Repository: fangleai/TransformerCVAE Branch: master Commit: 6d314bcd5bd6 Files: 26 Total size: 38.5 MB Directory structure: gitextract_x8c_d3k7/ ├── README.md ├── data/ │ ├── __init__.py │ ├── arxiv/ │ │ ├── artificial intelligence_10047_15000_15_abs.txt │ │ ├── artificial intelligence_10047_15000_15_title.txt │ │ ├── artificial intelligence_134_15000_200_abs.txt │ │ ├── artificial intelligence_134_15000_200_title.txt │ │ ├── computer vision_14582_15000_15_abs.txt │ │ ├── computer vision_14582_15000_15_title.txt │ │ ├── language generation_14514_15000_15_abs.txt │ │ └── language generation_14514_15000_15_title.txt │ ├── arxiv_dataset.py │ ├── plot_dataset.py │ ├── prompt_dataset.py │ ├── util.py │ └── yelp_dataset.py ├── dist_utils.py ├── eval_ppl.py ├── eval_ppl_prefix.py ├── generate.py ├── generate_prefix.py ├── model.py ├── train.py ├── train_dist.py ├── train_dist_half.py ├── tsne_plot.py └── util.py ================================================ FILE CONTENTS ================================================ ================================================ FILE: README.md ================================================ # TransformerCVAE This repository contains source code for paper [Transformer-based Conditional Variational Autoencoder for Controllable Story Generation](https://arxiv.org/abs/2101.00828): ``` @article{fang2021transformer, title={Transformer-based Conditional Variational Autoencoder for Controllable Story Generation}, author={Fang, Le and Zeng, Tao and Liu, Chaochun and Bo, Liefeng and Dong, Wen and Chen, Changyou}, journal={arXiv preprint arXiv:2101.00828}, year={2021} } ``` 0. get source data ([Arxiv](https://github.com/gcunhase/ArXivAbsTitleDataset), [Yelp](https://github.com/fangleai/Implicit-LVM/tree/master/lang_model_yelp/data), [WritingPrompts](https://github.com/pytorch/fairseq/blob/master/examples/stories/README.md), [WikiPlots](https://github.com/markriedl/WikiPlots)). 1. data pre-processing (data/). 2. training (choose from several different implementations on parallelism and precision: train.py, train_dist.py, train_dist_half.py). 3. generation, evaluation and analysis (generate.py/generate_prefix.py, eval_ppl.py/eval_ppl_prefix.py, tsne_plot.py). Contact: lefang@buffalo.edu Update on 2022: If you encounter package version issue, sorry for that I don't have a requirements.txt with exact versions. I used this package: https://github.com/nvidia/apex and an old pytorch version compatible with it at that time, say pytorch=0.4 (not 100% sure). ================================================ FILE: data/__init__.py ================================================ ================================================ FILE: data/arxiv/artificial intelligence_10047_15000_15_abs.txt ================================================ Artificial intelligence has impacted many aspects of human life. This paper studies the impact of artificial intelligence on economic theory. In particular we study the impact of artificial intelligence on the theory of bounded rationality, efficient market hypothesis and prospect theory. In this paper, I put forward that in many instances, thinking mechanisms are equivalent to artificial intelligence modules programmed into the human mind. The rapid development of artificial intelligence has brought the artificial intelligence threat theory as well as the problem about how to evaluate the intelligence level of intelligent products. Both need to find a quantitative method to evaluate the intelligence level of intelligence systems, including human intelligence. Based on the standard intelligence system and the extended Von Neumann architecture, this paper proposes General IQ, Service IQ and Value IQ evaluation methods for intelligence systems, depending on different evaluation purposes. Among them, the General IQ of intelligence systems is to answer the question of whether the artificial intelligence can surpass the human intelligence, which is reflected in putting the intelligence systems on an equal status and conducting the unified evaluation. The Service IQ and Value IQ of intelligence systems are used to answer the question of how the intelligent products can better serve the human, reflecting the intelligence and required cost of each intelligence system as a product in the process of serving human. In this article, I discuss how the AI community views concerns about the emergence of superintelligent AI and related philosophical issues. Currently, potential threats of artificial intelligence (AI) to human have triggered a large controversy in society, behind which, the nature of the issue is whether the artificial intelligence (AI) system can be evaluated quantitatively. This article analyzes and evaluates the challenges that the AI development level is facing, and proposes that the evaluation methods for the human intelligence test and the AI system are not uniform; and the key reason for which is that none of the models can uniformly describe the AI system and the beings like human. Aiming at this problem, a standard intelligent system model is established in this study to describe the AI system and the beings like human uniformly. Based on the model, the article makes an abstract mathematical description, and builds the standard intelligent machine mathematical model; expands the Von Neumann architecture and proposes the Liufeng - Shiyong architecture; gives the definition of the artificial intelligence IQ, and establishes the artificial intelligence scale and the evaluation method; conduct the test on 50 search engines and three human subjects at different ages across the world, and finally obtains the ranking of the absolute IQ and deviation IQ ranking for artificial intelligence IQ 2014. Knowledge-based or Artificial Intelligence techniques are used increasingly as alternatives to more classical techniques to model ENVIRONMENTAL SYSTEMS. Use of Artificial Intelligence (AI) in environmental modelling has increased with recognition of its potential. In this paper we examine the DIFFERENT TECHNIQUES of Artificial intelligence with profound examples of human perception, learning and reasoning to solve complex problems. However with the increase of complexity better methods are required. Keeping in view of the above some researchers introduced the idea of hybrid mechanism in which two or more methods can be combined which seems to be a positive effort for creating a more complex; advanced and intelligent system which has the capability to in- cooperate human decisions thus driving the landscape changes. Although the definition and measurement of intelligence is clearly of fundamental importance to the field of artificial intelligence, no general survey of definitions and tests of machine intelligence exists. Indeed few researchers are even aware of alternatives to the Turing test and its many derivatives. In this paper we fill this gap by providing a short survey of the many tests of machine intelligence that have been proposed. The Universal Intelligence Measure is a recently proposed formal definition of intelligence. It is mathematically specified, extremely general, and captures the essence of many informal definitions of intelligence. It is based on Hutter's Universal Artificial Intelligence theory, an extension of Ray Solomonoff's pioneering work on universal induction. Since the Universal Intelligence Measure is only asymptotically computable, building a practical intelligence test from it is not straightforward. This paper studies the practical issues involved in developing a real-world UIM-based performance metric. Based on our investigation, we develop a prototype implementation which we use to evaluate a number of different artificial agents. We derive axiomatically the probability function that should be used to make decisions given any form of underlying uncertainty. This article analyses the properties of the Internal Behaviour network, an action selection mechanism previously proposed by the authors, with the aid of a simulation developed for such ends. A brief review of the Internal Behaviour network is followed by the explanation of the implementation of the simulation. Then, experiments are presented and discussed analysing the properties of the action selection in the proposed model. This paper describe about a new methodology for developing and improving the robotics field via artificial intelligence and internet of things. Now a day, we can say Artificial Intelligence take the world into robotics. Almost all industries use robots for lot of works. They are use co-operative robots to make different kind of works. But there was some problem to make robot for multi tasks. So there was a necessary new methodology to made multi tasking robots. It will be done only by artificial intelligence and internet of things. Over the last decade, there has been growing interest in the use or measures or change in belief for reasoning with uncertainty in artificial intelligence research. An important characteristic of several methodologies that reason with changes in belief or belief updates, is a property that we term modularity. We call updates that satisfy this property modular updates. Whereas probabilistic measures of belief update - which satisfy the modularity property were first discovered in the nineteenth century, knowledge and discussion of these quantities remains obscure in artificial intelligence research. We define modular updates and discuss their inappropriate use in two influential expert systems. One of the most important aims of the fields of robotics, artificial intelligence and artificial life is the design and construction of systems and machines as versatile and as reliable as living organisms at performing high level human-like tasks. But how are we to evaluate artificial systems if we are not certain how to measure these capacities in living systems, let alone how to define life or intelligence? Here I survey a concrete metric towards measuring abstract properties of natural and artificial systems, such as the ability to react to the environment and to control one's own behaviour. A fundamental problem in artificial intelligence is that nobody really knows what intelligence is. The problem is especially acute when we need to consider artificial systems which are significantly different to humans. In this paper we approach this problem in the following way: We take a number of well known informal definitions of human intelligence that have been given by experts, and extract their essential features. These are then mathematically formalised to produce a general measure of intelligence for arbitrary machines. We believe that this measure formally captures the concept of machine intelligence in the broadest reasonable sense. Narrative intelligence is the ability to craft, tell, understand, and respond affectively to stories. We argue that instilling artificial intelligences with computational narrative intelligence affords a number of applications beneficial to humans. We lay out some of the machine learning challenges necessary to solve to achieve computational narrative intelligence. Finally, we argue that computational narrative is a practical step towards machine enculturation, the teaching of sociocultural values to machines. The advent of artificial intelligence has changed many disciplines such as engineering, social science and economics. Artificial intelligence is a computational technique which is inspired by natural intelligence such as the swarming of birds, the working of the brain and the pathfinding of the ants. These techniques have impact on economic theories. This book studies the impact of artificial intelligence on economic theories, a subject that has not been extensively studied. The theories that are considered are: demand and supply, asymmetrical information, pricing, rational choice, rational expectation, game theory, efficient market hypotheses, mechanism design, prospect, bounded rationality, portfolio theory, rational counterfactual and causality. The benefit of this book is that it evaluates existing theories of economics and update them based on the developments in artificial intelligence field. In this paper we offer a formal definition of Artificial Intelligence and this directly gives us an algorithm for construction of this object. Really, this algorithm is useless due to the combinatory explosion. The main innovation in our definition is that it does not include the knowledge as a part of the intelligence. So according to our definition a newly born baby also is an Intellect. Here we differs with Turing's definition which suggests that an Intellect is a person with knowledge gained through the years. With almost daily improvements in capabilities of artificial intelligence it is more important than ever to develop safety software for use by the AI research community. Building on our previous work on AI Containment Problem we propose a number of guidelines which should help AI safety researchers to develop reliable sandboxing software for intelligent programs of all levels. Such safety container software will make it possible to study and analyze intelligent artificial agent while maintaining certain level of safety against information leakage, social engineering attacks and cyberattacks from within the container. Content marketing is todays one of the most remarkable approaches in the context of marketing processes of companies. Value of this kind of marketing has improved in time, thanks to the latest developments regarding to computer and communication technologies. Nowadays, especially social media based platforms have a great importance on enabling companies to design multimedia oriented, interactive content. But on the other hand, there is still something more to do for improved content marketing approaches. In this context, objective of this study is to focus on intelligent content marketing, which can be done by using artificial intelligence. Artificial Intelligence is todays one of the most remarkable research fields and it can be used easily as multidisciplinary. So, this study has aimed to discuss about its potential on improving content marketing. In detail, the study has enabled readers to improve their awareness about the intersection point of content marketing and artificial intelligence. Furthermore, the authors have introduced some example models of intelligent content marketing, which can be achieved by using current Web technologies and artificial intelligence techniques. The recent surge in interest in ethics in artificial intelligence may leave many educators wondering how to address moral, ethical, and philosophical issues in their AI courses. As instructors we want to develop curriculum that not only prepares students to be artificial intelligence practitioners, but also to understand the moral, ethical, and philosophical impacts that artificial intelligence will have on society. In this article we provide practical case studies and links to resources for use by AI educators. We also provide concrete suggestions on how to integrate AI ethics into a general artificial intelligence course and how to teach a stand-alone artificial intelligence ethics course. There is a significant lack of unified approaches to building generally intelligent machines. The majority of current artificial intelligence research operates within a very narrow field of focus, frequently without considering the importance of the 'big picture'. In this document, we seek to describe and unify principles that guide the basis of our development of general artificial intelligence. These principles revolve around the idea that intelligence is a tool for searching for general solutions to problems. We define intelligence as the ability to acquire skills that narrow this search, diversify it and help steer it to more promising areas. We also provide suggestions for studying, measuring, and testing the various skills and abilities that a human-level intelligent machine needs to acquire. The document aims to be both implementation agnostic, and to provide an analytic, systematic, and scalable way to generate hypotheses that we believe are needed to meet the necessary conditions in the search for general artificial intelligence. We believe that such a framework is an important stepping stone for bringing together definitions, highlighting open problems, connecting researchers willing to collaborate, and for unifying the arguably most significant search of this century. Artificial intelligence (AI) is an important technology that supports daily social life and economic activities. It contributes greatly to the sustainable growth of Japan's economy and solves various social problems. In recent years, AI has attracted attention as a key for growth in developed countries such as Europe and the United States and developing countries such as China and India. The attention has been focused mainly on developing new artificial intelligence information communication technology (ICT) and robot technology (RT). Although recently developed AI technology certainly excels in extracting certain patterns, there are many limitations. Most ICT models are overly dependent on big data, lack a self-idea function, and are complicated. In this paper, rather than merely developing next-generation artificial intelligence technology, we aim to develop a new concept of general-purpose intelligence cognition technology called Beyond AI. Specifically, we plan to develop an intelligent learning model called Brain Intelligence (BI) that generates new ideas about events without having experienced them by using artificial life with an imagine function. We will also conduct demonstrations of the developed BI intelligence learning model on automatic driving, precision medical care, and industrial robots. Although artificial intelligence is currently one of the most interesting areas in scientific research, the potential threats posed by emerging AI systems remain a source of persistent controversy. To address the issue of AI threat, this study proposes a standard intelligence model that unifies AI and human characteristics in terms of four aspects of knowledge, i.e., input, output, mastery, and creation. Using this model, we observe three challenges, namely, expanding of the von Neumann architecture; testing and ranking the intelligence quotient of naturally and artificially intelligent systems, including humans, Google, Bing, Baidu, and Siri; and finally, the dividing of artificially intelligent systems into seven grades from robots to Google Brain. Based on this, we conclude that AlphaGo belongs to the third grade. Biologically inspired computing is an area of computer science which uses the advantageous properties of biological systems. It is the amalgamation of computational intelligence and collective intelligence. Biologically inspired mechanisms have already proved successful in achieving major advances in a wide range of problems in computing and communication systems. The consortium of bio-inspired computing are artificial neural networks, evolutionary algorithms, swarm intelligence, artificial immune systems, fractal geometry, DNA computing and quantum computing, etc. This article gives an introduction to swarm intelligence. This paper analyses the influence of including agents of different degrees of intelligence in a multiagent system. The goal is to better understand how we can develop intelligence tests that can evaluate social intelligence. We analyse several reinforcement algorithms in several contexts of cooperation and competition. Our experimental setting is inspired by the recently developed Darwin-Wallace distribution. We give a brief overview of the main characteristics of diagrammatic reasoning, analyze a case of human reasoning in a mastermind game, and explain why hybrid representation systems (HRS) are particularly attractive and promising for Artificial General Intelligence and Computer Science in general. We reminisce and discuss applications of algorithmic probability to a wide range of problems in artificial intelligence, philosophy and technological society. We propose that Solomonoff has effectively axiomatized the field of artificial intelligence, therefore establishing it as a rigorous scientific discipline. We also relate to our own work in incremental machine learning and philosophy of complexity. Much artificial intelligence research focuses on the problem of deducing the validity of unobservable propositions or hypotheses from observable evidence.! Many of the knowledge representation techniques designed for this problem encode the relationship between evidence and hypothesis in a directed manner. Moreover, the direction in which evidence is stored is typically from evidence to hypothesis. This paper covers two topics: first an introduction to Algorithmic Complexity Theory: how it defines probability, some of its characteristic properties and past successful applications. Second, we apply it to problems in A.I. - where it promises to give near optimum search procedures for two very broad classes of problems. Verified artificial intelligence (AI) is the goal of designing AI-based systems that are provably correct with respect to mathematically-specified requirements. This paper considers Verified AI from a formal methods perspective. We describe five challenges for achieving Verified AI, and five corresponding principles for addressing these challenges. Human speech is the most important part of General Artificial Intelligence and subject of much research. The hypothesis proposed in this article provides explanation of difficulties that modern science tackles in the field of human brain simulation. The hypothesis is based on the author's conviction that the brain of any given person has different ability to process and store information. Therefore, the approaches that are currently used to create General Artificial Intelligence have to be altered. This article considers evidence from physical and biological sciences to show machines are deficient compared to biological systems at incorporating intelligence. Machines fall short on two counts: firstly, unlike brains, machines do not self-organize in a recursive manner; secondly, machines are based on classical logic, whereas Nature's intelligence may depend on quantum mechanics. Artificial Intelligence began as a field probing some of the most fundamental questions of science - the nature of intelligence and the design of intelligent artifacts. But it has grown into a discipline that is deeply entwined with commerce and society. Today's AI technology, such as expert systems and intelligent assistants, pose some difficult questions of risk, trust and accountability. In this paper, we present these concerns, examining them in the context of historical developments that have shaped the nature and direction of AI research. We also suggest the exploration and further development of two paradigms, human intelligence-machine cooperation, and a sociological view of intelligence, which might help address some of these concerns. Little by little, newspapers are revealing the bright future that Artificial Intelligence (AI) is building. Intelligent machines will help everywhere. However, this bright future has a dark side: a dramatic job market contraction before its unpredictable transformation. Hence, in a near future, large numbers of job seekers will need financial support while catching up with these novel unpredictable jobs. This possible job market crisis has an antidote inside. In fact, the rise of AI is sustained by the biggest knowledge theft of the recent years. Learning AI machines are extracting knowledge from unaware skilled or unskilled workers by analyzing their interactions. By passionately doing their jobs, these workers are digging their own graves. In this paper, we propose Human-in-the-loop Artificial Intelligence (HIT-AI) as a fairer paradigm for Artificial Intelligence systems. HIT-AI will reward aware and unaware knowledge producers with a different scheme: decisions of AI systems generating revenues will repay the legitimate owners of the knowledge used for taking those decisions. As modern Robin Hoods, HIT-AI researchers should fight for a fairer Artificial Intelligence that gives back what it steals. Independent from the still ongoing research in measuring individual intelligence, we anticipate and provide a framework for measuring collective intelligence. Collective intelligence refers to the idea that several individuals can collaborate in order to achieve high levels of intelligence. We present thus some ideas on how the intelligence of a group can be measured and simulate such tests. We will however focus here on groups of artificial intelligence agents (i.e., machines). We will explore how a group of agents is able to choose the appropriate problem and to specialize for a variety of tasks. This is a feature which is an important contributor to the increase of intelligence in a group (apart from the addition of more agents and the improvement due to common decision making). Our results reveal some interesting results about how (collective) intelligence can be modeled, about how collective intelligence tests can be designed and about the underlying dynamics of collective intelligence. As it will be useful for our simulations, we provide also some improvements of the threshold allocation model originally used in the area of swarm intelligence but further generalized here. Crisis response poses many of the most difficult information technology in crisis management. It requires information and communication-intensive efforts, utilized for reducing uncertainty, calculating and comparing costs and benefits, and managing resources in a fashion beyond those regularly available to handle routine problems. In this paper, we explore the benefits of artificial intelligence technologies in crisis response. This paper discusses the role of artificial intelligence technologies; namely, robotics, ontology and semantic web, and multi-agent systems in crisis response. Agents and agent systems are becoming more and more important in the development of a variety of fields such as ubiquitous computing, ambient intelligence, autonomous computing, intelligent systems and intelligent robotics. The need for improvement of our basic knowledge on agents is very essential. We take a systematic approach and present extended classification of artificial agents which can be useful for understanding of what artificial agents are and what they can be in the future. The aim of this classification is to give us insights in what kind of agents can be created and what type of problems demand a specific kind of agents for their solution. Two different definitions of the Artificial Intelligence concept have been proposed in papers [1] and [2]. The first definition is informal. It says that any program that is cleverer than a human being, is acknowledged as Artificial Intelligence. The second definition is formal because it avoids reference to the concept of human being. The readers of papers [1] and [2] might be left with the impression that both definitions are equivalent and the definition in [2] is simply a formal version of that in [1]. This paper will compare both definitions of Artificial Intelligence and, hopefully, will bring a better understanding of the concept. Computer games play an important role in our society and motivate people to learn computer science. Since artificial intelligence is integral to most games, they can also be used to teach artificial intelligence. We introduce the Game AI Game Engine (GAIGE), a Python game engine specifically designed to teach about how AI is used in computer games. A progression of seven assignments builds toward a complete, working Multi-User Battle Arena (MOBA) game. We describe the engine, the assignments, and our experiences using it in a class on Game Artificial Intelligence. This paper is concerned with two theories of probability judgment: the Bayesian theory and the theory of belief functions. It illustrates these theories with some simple examples and discusses some of the issues that arise when we try to implement them in expert systems. The Bayesian theory is well known; its main ideas go back to the work of Thomas Bayes (1702-1761). The theory of belief functions, often called the Dempster-Shafer theory in the artificial intelligence community, is less well known, but it has even older antecedents; belief-function arguments appear in the work of George Hooper (16401723) and James Bernoulli (1654-1705). For elementary expositions of the theory of belief functions, see Shafer (1976, 1985). Algorithmic composition is the partial or total automation of the process of music composition by using computers. Since the 1950s, different computational techniques related to Artificial Intelligence have been used for algorithmic composition, including grammatical representations, probabilistic methods, neural networks, symbolic rule-based systems, constraint programming and evolutionary algorithms. This survey aims to be a comprehensive account of research on algorithmic composition, presenting a thorough view of the field for researchers in Artificial Intelligence. The theory of rational choice assumes that when people make decisions they do so in order to maximize their utility. In order to achieve this goal they ought to use all the information available and consider all the choices available to choose an optimal choice. This paper investigates what happens when decisions are made by artificially intelligent machines in the market rather than human beings. Firstly, the expectations of the future are more consistent if they are made by an artificially intelligent machine and the decisions are more rational and thus marketplace becomes more rational. Artificial Intelligence (AI) started out with an ambition to reproduce the human mind, but, as the sheer scale of that ambition became apparent, quickly retreated into either studying specialized intelligent behaviours, or proposing overarching architectural concepts for interfacing specialized intelligent behaviour components, conceived of as agents in a kind of organization. This agent-based modeling paradigm, in turn, proves to have interesting applications in understanding, simulating, and predicting the behaviour of social and legal structures on an aggregate level. This chapter examines a number of relevant cross-cutting concerns, conceptualizations, modeling problems and design challenges in large-scale distributed Artificial Intelligence, as well as in institutional systems, and identifies potential grounds for novel advances. Since Artificial Intelligence (AI) software uses techniques like deep lookahead search and stochastic optimization of huge neural networks to fit mammoth datasets, it often results in complex behavior that is difficult for people to understand. Yet organizations are deploying AI algorithms in many mission-critical settings. In order to trust their behavior, we must make it intelligible --- either by using inherently interpretable models or by developing methods for explaining otherwise overwhelmingly complex decisions by local approximation, vocabulary alignment, and interactive dialog. How intelligent is robot A compared with robot B? And how intelligent are robots A and B compared with animals (or plants) X and Y? These are both interesting and deeply challenging questions. In this paper we address the question "how intelligent is your intelligent robot?" by proposing that embodied intelligence emerges from the interaction and integration of four different and distinct kinds of intelligence. We then suggest a simple diagrammatic representation on which these kinds of intelligence are shown as four axes in a star diagram. A crude qualitative comparison of the intelligence graphs of animals and robots both exposes and helps to explain the chronic intelligence deficit of intelligent robots. Finally we examine the options for determining numerical values for the four kinds of intelligence in an effort to move toward a quantifiable intelligence vector. On grounds of the discussed material, we reason about possible future development of quantum game theory and its impact on information processing and the emerging information society. The idea of quantum artificial intelligence is explained. We show that the d -separation criterion constitutes a valid test for conditional independence relationships that are induced by feedback systems involving discrete variables. An elaboration of Dempster's method of constructing belief functions suggests a broadly applicable strategy for constructing lower probabilities under a variety of evidentiary constraints. This paper examines the concept of a combination rule for belief functions. It is shown that two fairly simple and apparently reasonable assumptions determine Dempster's rule, giving a new justification for it. Within the transferable belief model, positive basic belief masses can be allocated to the empty set, leading to unnormalized belief functions. The nature of these unnormalized beliefs is analyzed. Definitions and notations with historical references are given for some numerical coefficients commonly used to quantify relations among collections of objects for the purpose of expressing approximate knowledge and probabilistic reasoning. Smart optical networks are the next evolution of programmable networking and programmable automation of optical networks, with human-in-the-loop network control and management. The paper discusses this evolution and the role of Artificial Intelligence (AI). The first decade of this century has seen the nascency of the first mathematical theory of general artificial intelligence. This theory of Universal Artificial Intelligence (UAI) has made significant contributions to many theoretical, philosophical, and practical AI questions. In a series of papers culminating in book (Hutter, 2005), an exciting sound and complete mathematical model for a super intelligent agent (AIXI) has been developed and rigorously analyzed. While nowadays most AI researchers avoid discussing intelligence, the award-winning PhD thesis (Legg, 2008) provided the philosophical embedding and investigated the UAI-based universal measure of rational intelligence, which is formal, objective and non-anthropocentric. Recently, effective approximations of AIXI have been derived and experimentally investigated in JAIR paper (Veness et al. 2011). This practical breakthrough has resulted in some impressive applications, finally muting earlier critique that UAI is only a theory. For the first time, without providing any domain knowledge, the same agent is able to self-adapt to a diverse range of interactive environments. For instance, AIXI is able to learn from scratch to play TicTacToe, Pacman, Kuhn Poker, and other games by trial and error, without even providing the rules of the games. These achievements give new hope that the grand goal of Artificial General Intelligence is not elusive. This article provides an informal overview of UAI in context. It attempts to gently introduce a very theoretical, formal, and mathematical subject, and discusses philosophical and technical ingredients, traits of intelligence, some social questions, and the past and future of UAI. Order exists in the world. The intelligence process enables us to realize that order, to some extent. We provide a high level description of intelligence using simple definitions, basic building blocks, a conceptual framework and general hierarchy. This perspective includes multiple levels of abstraction occurring in space and in time. The resulting model offers simple, useful ways to help realize the essence of intelligence. When human agents come together to make decisions, it is often the case that one human agent has more information than the other. This phenomenon is called information asymmetry and this distorts the market. Often if one human agent intends to manipulate a decision in its favor the human agent can signal wrong or right information. Alternatively, one human agent can screen for information to reduce the impact of asymmetric information on decisions. With the advent of artificial intelligence, signaling and screening have been made easier. This paper studies the impact of artificial intelligence on the theory of asymmetric information. It is surmised that artificial intelligent agents reduce the degree of information asymmetry and thus the market where these agents are deployed become more efficient. It is also postulated that the more artificial intelligent agents there are deployed in the market the less is the volume of trades in the market. This is because for many trades to happen the asymmetry of information on goods and services to be traded should exist, creating a sense of arbitrage. There has been a recent resurgence in the area of explainable artificial intelligence as researchers and practitioners seek to provide more transparency to their algorithms. Much of this research is focused on explicitly explaining decisions or actions to a human observer, and it should not be controversial to say that, if these techniques are to succeed, the explanations they generate should have a structure that humans accept. However, it is fair to say that most work in explainable artificial intelligence uses only the researchers' intuition of what constitutes a `good' explanation. There exists vast and valuable bodies of research in philosophy, psychology, and cognitive science of how people define, generate, select, evaluate, and present explanations. This paper argues that the field of explainable artificial intelligence should build on this existing research, and reviews relevant papers from philosophy, cognitive psychology/science, and social psychology, which study these topics. It draws out some important findings, and discusses ways that these can be infused with work on explainable artificial intelligence. We briefly introduce the memory based approaches to emulate machine intelligence in VLSI hardware, describing the challenges and advantages. Implementation of artificial intelligence techniques in VLSI hardware is a practical and difficult problem. Deep architectures, hierarchical temporal memories and memory networks are some of the contemporary approaches in this area of research. The techniques attempt to emulate low level intelligence tasks and aim at providing scalable solutions to high level intelligence problems such as sparse coding and contextual processing. Whether the machine intelligence can surpass the human intelligence is a controversial issue. On the basis of traditional IQ, this article presents the Universal IQ test method suitable for both the machine intelligence and the human intelligence. With the method, machine and human intelligences were divided into 4 major categories and 15 subcategories. A total of 50 search engines across the world and 150 persons at different ages were subject to the relevant test. And then, the Universal IQ ranking list of 2014 for the test objects was obtained. Looks at state interactions from an agent based AI perspective to see state interactions as an example of emergent intelligent behavior. Exposes basic principles of game theory. This brief note highlights some basic concepts required toward understanding the evolution of machine learning and deep learning models. The note starts with an overview of artificial intelligence and its relationship to biological neuron that ultimately led to the evolution of todays intelligent models. A introduction to the syntax and Semantics of Answer Set Programming intended as an handout to [under]graduate students taking Artificial Intlligence or Logic Programming classes. A fundamental problem in artificial intelligence is that nobody really knows what intelligence is. The problem is especially acute when we need to consider artificial systems which are significantly different to humans. In this paper we approach this problem in the following way: We take a number of well known informal definitions of human intelligence that have been given by experts, and extract their essential features. These are then mathematically formalised to produce a general measure of intelligence for arbitrary machines. We believe that this equation formally captures the concept of machine intelligence in the broadest reasonable sense. We then show how this formal definition is related to the theory of universal optimal learning agents. Finally, we survey the many other tests and definitions of intelligence that have been proposed for machines. Artificial general intelligence (AGI) refers to research aimed at tackling the full problem of artificial intelligence, that is, create truly intelligent agents. This sets it apart from most AI research which aims at solving relatively narrow domains, such as character recognition, motion planning, or increasing player satisfaction in games. But how do we know when an agent is truly intelligent? A common point of reference in the AGI community is Legg and Hutter's formal definition of universal intelligence, which has the appeal of simplicity and generality but is unfortunately incomputable. Games of various kinds are commonly used as benchmarks for "narrow" AI research, as they are considered to have many important properties. We argue that many of these properties carry over to the testing of general intelligence as well. We then sketch how such testing could practically be carried out. The central part of this sketch is an extension of universal intelligence to deal with finite time, and the use of sampling of the space of games expressed in a suitably biased game description language. We propose a number of heuristics that can be used for identifying when intransitive choice behaviour is likely to occur in choice situations. We also suggest two methods for avoiding undesired choice behaviour, namely transparent communication and adaptive choice-set generation. We believe that these two ways can contribute to the avoidance of decision biases in choice situations that may often be regretted. The Hard Problem of consciousness has been dismissed as an illusion. By showing that computers are capable of experiencing, we show that they are at least rudimentarily conscious with potential to eventually reach superconsciousness. The main contribution of the paper is a test for confirming certain subjective experiences in a tested agent. We follow with analysis of benefits and problems with conscious machines and implications of such capability on future of computing, machine rights and artificial intelligence safety. Our desire and fascination with intelligent machines dates back to the antiquity's mythical automaton Talos, Aristotle's mode of mechanical thought (syllogism) and Heron of Alexandria's mechanical machines and automata. However, the quest for Artificial General Intelligence (AGI) is troubled with repeated failures of strategies and approaches throughout the history. This decade has seen a shift in interest towards bio-inspired software and hardware, with the assumption that such mimicry entails intelligence. Though these steps are fruitful in certain directions and have advanced automation, their singular design focus renders them highly inefficient in achieving AGI. Which set of requirements have to be met in the design of AGI? What are the limits in the design of the artificial? Here, a careful examination of computation in biological systems hints that evolutionary tinkering of contextual processing of information enabled by a hierarchical architecture is the key to build AGI. The development of artificial general intelligence is considered by many to be inevitable. What such intelligence does after becoming aware is not so certain. To that end, research suggests that the likelihood of artificial general intelligence becoming hostile to humans is significant enough to warrant inquiry into methods to limit such potential. Thus, containment of artificial general intelligence is a timely and meaningful research topic. While there is limited research exploring possible containment strategies, such work is bounded by the underlying field the strategies draw upon. Accordingly, we set out to construct an ontology to describe necessary elements in any future containment technology. Using existing academic literature, we developed a single domain ontology containing five levels, 32 codes, and 32 associated descriptors. Further, we constructed ontology diagrams to demonstrate intended relationships. We then identified humans, AGI, and the cyber world as novel agent objects necessary for future containment activities. Collectively, the work addresses three critical gaps: (a) identifying and arranging fundamental constructs; (b) situating AGI containment within cyber science; and (c) developing scientific rigor within the field. Artificial intelligence offers superior techniques and methods by which problems from diverse domains may find an optimal solution. The Machine Learning technologies refer to the domain of artificial intelligence aiming to develop the techniques allowing the computers to "learn". Some systems based on Machine Learning technologies tend to eliminate the necessity of the human intelligence while the others adopt a man-machine collaborative approach. This article is a brief personal account of the past, present, and future of algorithmic randomness, emphasizing its role in inductive inference and artificial intelligence. It is written for a general audience interested in science and philosophy. Intuitively, randomness is a lack of order or predictability. If randomness is the opposite of determinism, then algorithmic randomness is the opposite of computability. Besides many other things, these concepts have been used to quantify Ockham's razor, solve the induction problem, and define intelligence. There are few knowledge representation (KR) techniques available for efficiently representing knowledge. However, with the increase in complexity, better methods are needed. Some researchers came up with hybrid mechanisms by combining two or more methods. In an effort to construct an intelligent computer system, a primary consideration is to represent large amounts of knowledge in a way that allows effective use and efficiently organizing information to facilitate making the recommended inferences. There are merits and demerits of combinations, and standardized method of KR is needed. In this paper, various hybrid schemes of KR were explored at length and details presented. This work summarizes part of current knowledge on High-level Cognitive process and its relation with biological hardware. Thus, it is possible to identify some paradoxes which could impact the development of future technologies and artificial intelligence: we may make a High-level Cognitive Machine, sacrificing the principal attribute of a machine, its accuracy. In this work we investigate the systems that implements algorithms for the planning problem in Artificial Intelligence, called planners, with especial attention to the planners based on the plan graph. We analyze the problem of comparing the performance of the different algorithms and we propose an environment for the development and analysis of planners. This paper covers a number of approaches that leverage Artificial Intelligence algorithms and techniques to aid Unmanned Combat Aerial Vehicle (UCAV) autonomy. An analysis of current approaches to autonomous control is provided followed by an exploration of how these techniques can be extended and enriched with AI techniques including Artificial Neural Networks (ANN), Ensembling and Reinforcement Learning (RL) to evolve control strategies for UCAVs. As intelligent systems are increasingly making decisions that directly affect society, perhaps the most important upcoming research direction in AI is to rethink the ethical implications of their actions. Means are needed to integrate moral, societal and legal values with technological developments in AI, both during the design process as well as part of the deliberation algorithms employed by these systems. In this paper, we describe leading ethics theories and propose alternative ways to ensure ethical behavior by artificial systems. Given that ethics are dependent on the socio-cultural context and are often only implicit in deliberation processes, methodologies are needed to elicit the values held by designers and stakeholders, and to make these explicit leading to better understanding and trust on artificial autonomous systems. AI technology has a long history which is actively and constantly changing and growing. It focuses on intelligent agents, which contain devices that perceive the environment and based on which takes actions in order to maximize goal success chances. In this paper, we will explain the modern AI basics and various representative applications of AI. In the context of the modern digitalized world, AI is the property of machines, computer programs, and systems to perform the intellectual and creative functions of a person, independently find ways to solve problems, be able to draw conclusions and make decisions. Most artificial intelligence systems have the ability to learn, which allows people to improve their performance over time. The recent research on AI tools, including machine learning, deep learning and predictive analysis intended toward increasing the planning, learning, reasoning, thinking and action taking ability. Based on which, the proposed research intends towards exploring on how the human intelligence differs from the artificial intelligence. Moreover, we critically analyze what AI of today is capable of doing, why it still cannot reach human intelligence and what are the open challenges existing in front of AI to reach and outperform human level of intelligence. Furthermore, it will explore the future predictions for artificial intelligence and based on which potential solution will be recommended to solve it within next decades. Specialized intelligent systems can be found everywhere: finger print, handwriting, speech, and face recognition, spam filtering, chess and other game programs, robots, et al. This decade the first presumably complete mathematical theory of artificial intelligence based on universal induction-prediction-decision-action has been proposed. This information-theoretic approach solidifies the foundations of inductive inference and artificial intelligence. Getting the foundations right usually marks a significant progress and maturing of a field. The theory provides a gold standard and guidance for researchers working on intelligent algorithms. The roots of universal induction have been laid exactly half-a-century ago and the roots of universal intelligence exactly one decade ago. So it is timely to take stock of what has been achieved and what remains to be done. Since there are already good recent surveys, I describe the state-of-the-art only in passing and refer the reader to the literature. This article concentrates on the open problems in universal induction and its extension to universal intelligence. This paper is a survey of a large number of informal definitions of ``intelligence'' that the authors have collected over the years. Naturally, compiling a complete list would be impossible as many definitions of intelligence are buried deep inside articles and books. Nevertheless, the 70-odd definitions presented here are, to the authors' knowledge, the largest and most well referenced collection there is. Data Mining techniques plays a vital role like extraction of required knowledge, finding unsuspected information to make strategic decision in a novel way which in term understandable by domain experts. A generalized frame work is proposed by considering non - domain experts during mining process for better understanding, making better decision and better finding new patters in case of selecting suitable data mining techniques based on the user profile by means of intelligent agents. KEYWORDS: Data Mining Techniques, Intelligent Agents, User Profile, Multidimensional Visualization, Knowledge Discovery. Observing that the creation of certain types of artistic artifacts necessitate intelligence, we present the Lovelace 2.0 Test of creativity as an alternative to the Turing Test as a means of determining whether an agent is intelligent. The Lovelace 2.0 Test builds off prior tests of creativity and additionally provides a means of directly comparing the relative intelligence of different agents. In the era of intelligent biohybrid neurotechnologies for brain repair, new fanciful terms are appearing in the scientific dictionary to define what has so far been unimaginable. As the emerging neurotechnologies are becoming increasingly polyhedral and sophisticated, should we talk about evolution and rank the intelligence of these devices? This paper introduce a software system including widely-used Swarm Intelligence algorithms or approaches to be used for the related scientific research studies associated with the subject area. The programmatic infrastructure of the system allows working on a fast, easy-to-use, interactive platform to perform Swarm Intelligence based studies in a more effective, efficient and accurate way. In this sense, the system employs all of the necessary controls for the algorithms and it ensures an interactive platform on which computer users can perform studies on a wide spectrum of solution approaches associated with simple and also more advanced problems. Turing test was long considered the measure for artificial intelligence. But with the advances in AI, it has proved to be insufficient measure. We can now aim to mea- sure machine intelligence like we measure human intelligence. One of the widely accepted measure of intelligence is standardized math and science test. In this paper, we explore the progress we have made towards the goal of making a machine smart enough to pass the standardized test. We see the challenges and opportunities posed by the domain, and note that we are quite some ways from actually making a system as smart as a even a middle school scholar. Numerous, artificially intelligent, networked things will populate the battlefield of the future, operating in close collaboration with human warfighters, and fighting as teams in highly adversarial environments. This paper explores the characteristics, capabilities and intelligence required of such a network of intelligent things and humans - Internet of Battle Things (IOBT). It will experience unique challenges that are not yet well addressed by the current generation of AI and machine learning. This paper presents an information theoretic approach to the concept of intelligence in the computational sense. We introduce a probabilistic framework from which computational intelligence is shown to be an entropy minimizing process at the local level. Using this new scheme, we develop a simple data driven clustering example and discuss its applications. This paper proposes a model for combination of external and internal stimuli for the action selection in an autonomous agent, based in an action selection mechanism previously proposed by the authors. This combination model includes additive and multiplicative elements, which allows to incorporate new properties, which enhance the action selection. A given parameter a, which is part of the proposed model, allows to regulate the degree of dependence of the observed external behaviour from the internal states of the entity. This paper investigates the use of different Artificial Intelligence methods to predict the values of several continuous variables from a Steam Generator. The objective was to determine how the different artificial intelligence methods performed in making predictions on the given dataset. The artificial intelligence methods evaluated were Neural Networks, Support Vector Machines, and Adaptive Neuro-Fuzzy Inference Systems. The types of neural networks investigated were Multi-Layer Perceptions, and Radial Basis Function. Bayesian and committee techniques were applied to these neural networks. Each of the AI methods considered was simulated in Matlab. The results of the simulations showed that all the AI methods were capable of predicting the Steam Generator data reasonably accurately. However, the Adaptive Neuro-Fuzzy Inference system out performed the other methods in terms of accuracy and ease of implementation, while still achieving a fast execution time as well as a reasonable training time. Microarray is one of the essential technologies used by the biologist to measure genome-wide expression levels of genes in a particular organism under some particular conditions or stimuli. As microarrays technologies have become more prevalent, the challenges of analyzing these data for getting better insight about biological processes have essentially increased. Due to availability of artificial intelligence based sophisticated computational techniques, such as artificial neural networks, fuzzy logic, genetic algorithms, and many other nature-inspired algorithms, it is possible to analyse microarray gene expression data in more better way. Here, we reviewed artificial intelligence based techniques for the analysis of microarray gene expression data. Further, challenges in the field and future work direction have also been suggested. The 7th Symposium on Educational Advances in Artificial Intelligence (EAAI'17, co-chaired by Sven Koenig and Eric Eaton) launched the EAAI New and Future AI Educator Program to support the training of early-career university faculty, secondary school faculty, and future educators (PhD candidates or postdocs who intend a career in academia). As part of the program, awardees were asked to address one of the following "blue sky" questions: * How could/should Artificial Intelligence (AI) courses incorporate ethics into the curriculum? * How could we teach AI topics at an early undergraduate or a secondary school level? * AI has the potential for broad impact to numerous disciplines. How could we make AI education more interdisciplinary, specifically to benefit non-engineering fields? This paper is a collection of their responses, intended to help motivate discussion around these issues in AI education. Artificial intelligence methods have often been applied to perform specific functions or tasks in the cyber-defense realm. However, as adversary methods become more complex and difficult to divine, piecemeal efforts to understand cyber-attacks, and malware-based attacks in particular, are not providing sufficient means for malware analysts to understand the past, present and future characteristics of malware. In this paper, we present the Malware Analysis and Attributed using Genetic Information (MAAGI) system. The underlying idea behind the MAAGI system is that there are strong similarities between malware behavior and biological organism behavior, and applying biologically inspired methods to corpora of malware can help analysts better understand the ecosystem of malware attacks. Due to the sophistication of the malware and the analysis, the MAAGI system relies heavily on artificial intelligence techniques to provide this capability. It has already yielded promising results over its development life, and will hopefully inspire more integration between the artificial intelligence and cyber--defense communities. We consider the fundamental question: how a legacy "student" Artificial Intelligent (AI) system could learn from a legacy "teacher" AI system or a human expert without complete re-training and, most importantly, without requiring significant computational resources. Here "learning" is understood as an ability of one system to mimic responses of the other and vice-versa. We call such learning an Artificial Intelligence knowledge transfer. We show that if internal variables of the "student" Artificial Intelligent system have the structure of an $n$-dimensional topological vector space and $n$ is sufficiently high then, with probability close to one, the required knowledge transfer can be implemented by simple cascades of linear functionals. In particular, for $n$ sufficiently large, with probability close to one, the "student" system can successfully and non-iteratively learn $k\ll n$ new examples from the "teacher" (or correct the same number of mistakes) at the cost of two additional inner products. The concept is illustrated with an example of knowledge transfer from a pre-trained convolutional neural network to a simple linear classifier with HOG features. We propose to use a supervised machine learning technique to track the location of a mobile agent in real time. Hidden Markov Models are used to build artificial intelligence that estimates the unknown position of a mobile target moving in a defined environment. This narrow artificial intelligence performs two distinct tasks. First, it provides real-time estimation of the mobile agent's position using the forward algorithm. Second, it uses the Baum-Welch algorithm as a statistical learning tool to gain knowledge of the mobile target. Finally, an experimental environment is proposed, namely a video game that we use to test our artificial intelligence. We present statistical and graphical results to illustrate the efficiency of our method. The elusive quest for intelligence in artificial intelligence prompts us to consider that instituting human-level intelligence in systems may be (still) in the realm of utopia. In about a quarter century, we have witnessed the winter of AI (1990) being transformed and transported to the zenith of tabloid fodder about AI (2015). The discussion at hand is about the elements that constitute the canonical idea of intelligence. The delivery of intelligence as a pay-per-use-service, popping out of an app or from a shrink-wrapped software defined point solution, is in contrast to the bio-inspired view of intelligence as an outcome, perhaps formed from a tapestry of events, cross-pollinated by instances, each with its own microcosm of experiences and learning, which may not be discrete all-or-none functions but continuous, over space and time. The enterprise world may not require, aspire or desire such an engaged solution to improve its services for enabling digital transformation through the deployment of digital twins, for example. One might ask whether the "work-flow on steroids" version of decision support may suffice for intelligence? Are we harking back to the era of rule based expert systems? The image conjured by the publicity machines offers deep solutions with human-level AI and preposterous claims about capturing the "brain in a box" by 2020. Even emulating insects may be difficult in terms of real progress. Perhaps we can try to focus on worms (Caenorhabditis elegans) which may be better suited for what business needs to quench its thirst for so-called intelligence in AI. This paper summarises how the "SP theory of intelligence" and its realisation in the "SP computer model" simplifies and integrates concepts across artificial intelligence and related areas, and thus provides a promising foundation for the development of a general, human-level thinking machine, in accordance with the main goal of research in artificial general intelligence. The key to this simplification and integration is the powerful concept of "multiple alignment", borrowed and adapted from bioinformatics. This concept has the potential to be the "double helix" of intelligence, with as much significance for human-level intelligence as has DNA for biological sciences. Strengths of the SP system include: versatility in the representation of diverse kinds of knowledge; versatility in aspects of intelligence (including: strengths in unsupervised learning; the processing of natural language; pattern recognition at multiple levels of abstraction that is robust in the face of errors in data; several kinds of reasoning (including: one-step `deductive' reasoning; chains of reasoning; abductive reasoning; reasoning with probabilistic networks and trees; reasoning with 'rules'; nonmonotonic reasoning and reasoning with default values; Bayesian reasoning with 'explaining away'; and more); planning; problem solving; and more); seamless integration of diverse kinds of knowledge and diverse aspects of intelligence in any combination; and potential for application in several areas (including: helping to solve nine problems with big data; helping to develop human-level intelligence in autonomous robots; serving as a database with intelligence and with versatility in the representation and integration of several forms of knowledge; serving as a vehicle for medical knowledge and as an aid to medical diagnosis; and several more). The SP theory of computing and cognition, described in previous publications, is an attractive model for intelligent databases because it provides a simple but versatile format for different kinds of knowledge, it has capabilities in artificial intelligence, and it can also function like established database models when that is required. This paper describes how the SP model can emulate other models used in database applications and compares the SP model with those other models. The artificial intelligence capabilities of the SP model are reviewed and its relationship with other artificial intelligence systems is described. Also considered are ways in which current prototypes may be translated into an 'industrial strength' working system. Since the Turing test was first proposed by Alan Turing in 1950, the primary goal of artificial intelligence has been predicated on the ability for computers to imitate human behavior. However, the majority of uses for the computer can be said to fall outside the domain of human abilities and it is exactly outside of this domain where computers have demonstrated their greatest contribution to intelligence. Another goal for artificial intelligence is one that is not predicated on human mimicry, but instead, on human amplification. This article surveys various systems that contribute to the advancement of human and social intelligence. People sometimes worry about the Singularity [Vinge, 1993; Kurzweil, 2005], or about the world being taken over by artificially intelligent robots. I believe the risks of these are very small. However, few people recognize that we already share our world with artificial creatures that participate as intelligent agents in our society: corporations. Our planet is inhabited by two distinct kinds of intelligent beings --- individual humans and corporate entities --- whose natures and interests are intimately linked. To co-exist well, we need to find ways to define the rights and responsibilities of both individual humans and corporate entities, and to find ways to ensure that corporate entities behave as responsible members of society. Reinforcement learning (RL) is a general paradigm for studying intelligent behaviour, with applications ranging from artificial intelligence to psychology and economics. AIXI is a universal solution to the RL problem; it can learn any computable environment. A technical subtlety of AIXI is that it is defined using a mixture over semimeasures that need not sum to 1, rather than over proper probability measures. In this work we argue that the shortfall of a semimeasure can naturally be interpreted as the agent's estimate of the probability of its death. We formally define death for generally intelligent agents like AIXI, and prove a number of related theorems about their behaviour. Notable discoveries include that agent behaviour can change radically under positive linear transformations of the reward signal (from suicidal to dogmatically self-preserving), and that the agent's posterior belief that it will survive increases over time. Artificial General Intelligence is a field of research aiming to distill the principles of intelligence that operate independently of a specific problem domain or a predefined context and utilize these principles in order to synthesize systems capable of performing any intellectual task a human being is capable of and eventually go beyond that. While "narrow" artificial intelligence which focuses on solving specific problems such as speech recognition, text comprehension, visual pattern recognition, robotic motion, etc. has shown quite a few impressive breakthroughs lately, understanding general intelligence remains elusive. In the paper we offer a novel theoretical approach to understanding general intelligence. We start with a brief introduction of the current conceptual approach. Our critique exposes a number of serious limitations that are traced back to the ontological roots of the concept of intelligence. We then propose a paradigm shift from intelligence perceived as a competence of individual agents defined in relation to an a priori given problem domain or a goal, to intelligence perceived as a formative process of self-organization by which intelligent agents are individuated. We call this process open-ended intelligence. Open-ended intelligence is developed as an abstraction of the process of cognitive development so its application can be extended to general agents and systems. We introduce and discuss three facets of the idea: the philosophical concept of individuation, sense-making and the individuation of general cognitive agents. We further show how open-ended intelligence can be framed in terms of a distributed, self-organizing network of interacting elements and how such process is scalable. The framework highlights an important relation between coordination and intelligence and a new understanding of values. We conclude with a number of questions for future research. This article deals with the links between the enaction paradigm and artificial intelligence. Enaction is considered a metaphor for artificial intelligence, as a number of the notions which it deals with are deemed incompatible with the phenomenal field of the virtual. After explaining this stance, we shall review previous works regarding this issue in terms of artifical life and robotics. We shall focus on the lack of recognition of co-evolution at the heart of these approaches. We propose to explicitly integrate the evolution of the environment into our approach in order to refine the ontogenesis of the artificial system, and to compare it with the enaction paradigm. The growing complexity of the ontogenetic mechanisms to be activated can therefore be compensated by an interactive guidance system emanating from the environment. This proposition does not however resolve that of the relevance of the meaning created by the machine (sense-making). Such reflections lead us to integrate human interaction into this environment in order to construct relevant meaning in terms of participative artificial intelligence. This raises a number of questions with regards to setting up an enactive interaction. The article concludes by exploring a number of issues, thereby enabling us to associate current approaches with the principles of morphogenesis, guidance, the phenomenology of interactions and the use of minimal enactive interfaces in setting up experiments which will deal with the problem of artificial intelligence in a variety of enaction-based ways. Sequential decision theory formally solves the problem of rational agents in uncertain worlds if the true environmental prior probability distribution is known. Solomonoff's theory of universal induction formally solves the problem of sequence prediction for unknown prior distribution. We combine both ideas and get a parameter-free theory of universal Artificial Intelligence. We give strong arguments that the resulting AIXI model is the most intelligent unbiased agent possible. We outline how the AIXI model can formally solve a number of problem classes, including sequence prediction, strategic games, function minimization, reinforcement and supervised learning. The major drawback of the AIXI model is that it is uncomputable. To overcome this problem, we construct a modified algorithm AIXItl that is still effectively more intelligent than any other time t and length l bounded agent. The computation time of AIXItl is of the order t x 2^l. The discussion includes formal definitions of intelligence order relations, the horizon problem and relations of the AIXI theory to other AI approaches. Our understanding of intelligence is directed primarily at the level of human beings. This paper attempts to give a more unifying definition that can be applied to the natural world in general. The definition would be used more to verify a degree of intelligence, not to quantify it and might help when making judgements on the matter. A version of an accepted test for AI is then put forward as the 'acid test' for Artificial Intelligence itself. It might be what a free-thinking program or robot would try to achieve. Recent work by the author on AI has been more from a direction of mechanical processes, or ones that might operate automatically. This paper will not try to question the idea of intelligence, in the sense of a pro-active or conscious event, but try to put it into a more passive, automatic and mechanical context. The paper also suggests looking at intelligence and consciousness as being slightly different. This paper looks at Turing's postulations about Artificial Intelligence in his paper 'Computing Machinery and Intelligence', published in 1950. It notes how accurate they were and how relevant they still are today. This paper notes the arguments and mechanisms that he suggested and tries to expand on them further. The paper however is mostly about describing the essential ingredients for building an intelligent model and the problems related with that. The discussion includes recent work by the author himself, who adds his own thoughts on the matter that come from a purely technical investigation into the problem. These are personal and quite speculative, but provide an interesting insight into the mechanisms that might be used for building an intelligent system. There is overwhelming evidence that human intelligence is a product of Darwinian evolution. Investigating the consequences of self-modification, and more precisely, the consequences of utility function self-modification, leads to the stronger claim that not only human, but any form of intelligence is ultimately only possible within evolutionary processes. Human-designed artificial intelligences can only remain stable until they discover how to manipulate their own utility function. By definition, a human designer cannot prevent a superhuman intelligence from modifying itself, even if protection mechanisms against this action are put in place. Without evolutionary pressure, sufficiently advanced artificial intelligences become inert by simplifying their own utility function. Within evolutionary processes, the implicit utility function is always reducible to persistence, and the control of superhuman intelligences embedded in evolutionary processes is not possible. Mechanisms against utility function self-modification are ultimately futile. Instead, scientific effort toward the mitigation of existential risks from the development of superintelligences should be in two directions: understanding consciousness, and the complex dynamics of evolutionary systems. In Intelligence Analysis it is of vital importance to manage uncertainty. Intelligence data is almost always uncertain and incomplete, making it necessary to reason and taking decisions under uncertainty. One way to manage the uncertainty in Intelligence Analysis is Dempster-Shafer Theory. This thesis contains five results regarding multiple target tracks and intelligence specification. In this paper we develop an evidential force aggregation method intended for classification of evidential intelligence into recognized force structures. We assume that the intelligence has already been partitioned into clusters and use the classification method individually in each cluster. The classification is based on a measure of fitness between template and fused intelligence that makes it possible to handle intelligence reports with multiple nonspecific and uncertain propositions. With this measure we can aggregate on a level-by-level basis, starting from general intelligence to achieve a complete force structure with recognized units on all hierarchical levels. Similar to intelligent multicellular neural networks controlling human brains, even single cells surprisingly are able to make intelligent decisions to classify several external stimuli or to associate them. This happens because of the fact that gene regulatory networks can perform as perceptrons, simple intelligent schemes known from studies on Artificial Intelligence. We study the role of genetic noise in intelligent decision making at the genetic level and show that noise can play a constructive role helping cells to make a proper decision. We show this using the example of a simple genetic classifier able to classify two external stimuli. This paper assumes the hypothesis that human learning is perception based, and consequently, the learning process and perceptions should not be represented and investigated independently or modeled in different simulation spaces. In order to keep the analogy between the artificial and human learning, the former is assumed here as being based on the artificial perception. Hence, instead of choosing to apply or develop a Computational Theory of (human) Perceptions, we choose to mirror the human perceptions in a numeric (computational) space as artificial perceptions and to analyze the interdependence between artificial learning and artificial perception in the same numeric space, using one of the simplest tools of Artificial Intelligence and Soft Computing, namely the perceptrons. As practical applications, we choose to work around two examples: Optical Character Recognition and Iris Recognition. In both cases a simple Turing test shows that artificial perceptions of the difference between two characters and between two irides are fuzzy, whereas the corresponding human perceptions are, in fact, crisp. Today, available methods that assess AI systems are focused on using empirical techniques to measure the performance of algorithms in some specific tasks (e.g., playing chess, solving mazes or land a helicopter). However, these methods are not appropriate if we want to evaluate the general intelligence of AI and, even less, if we compare it with human intelligence. The ANYNT project has designed a new method of evaluation that tries to assess AI systems using well known computational notions and problems which are as general as possible. This new method serves to assess general intelligence (which allows us to learn how to solve any new kind of problem we face) and not only to evaluate performance on a set of specific tasks. This method not only focuses on measuring the intelligence of algorithms, but also to assess any intelligent system (human beings, animals, AI, aliens?,...), and letting us to place their results on the same scale and, therefore, to be able to compare them. This new approach will allow us (in the future) to evaluate and compare any kind of intelligent system known or even to build/find, be it artificial or biological. This master thesis aims at ensuring that this new method provides consistent results when evaluating AI algorithms, this is done through the design and implementation of prototypes of universal intelligence tests and their application to different intelligent systems (AI algorithms and humans beings). From the study we analyze whether the results obtained by two different intelligent systems are properly located on the same scale and we propose changes and refinements to these prototypes in order to, in the future, being able to achieve a truly universal intelligence test. OPUS is a branch and bound search algorithm that enables efficient admissible search through spaces for which the order of search operator application is not significant. The algorithm's search efficiency is demonstrated with respect to very large machine learning search spaces. The use of admissible search is of potential value to the machine learning community as it means that the exact learning biases to be employed for complex learning tasks can be precisely specified and manipulated. OPUS also has potential for application in other areas of artificial intelligence, notably, truth maintenance. This paper applies resolution theorem proving to natural language semantics. The aim is to circumvent the computational complexity triggered by natural language ambiguities like pronoun binding, by interleaving pronoun binding with resolution deduction. Therefore disambiguation is only applied to expression that actually occur during derivations. We introduce a logic for reasoning about evidence that essentially views evidence as a function from prior beliefs (before making an observation) to posterior beliefs (after making the observation). We provide a sound and complete axiomatization for the logic, and consider the complexity of the decision problem. Although the reasoning in the logic is mainly propositional, we allow variables representing numbers and quantification over them. This expressive power seems necessary to capture important properties of evidence. This paper overviews the basic principles and recent advances in the emerging field of Quantum Computation (QC), highlighting its potential application to Artificial Intelligence (AI). The paper provides a very brief introduction to basic QC issues like quantum registers, quantum gates and quantum algorithms and then it presents references, ideas and research guidelines on how QC can be used to deal with some basic AI problems, such as search and pattern matching, as soon as quantum computers become widely available. In a certain number of situations, human cognitive functioning is difficult to represent with classical artificial intelligence structures. Such a difficulty arises in the polyneuropathy diagnosis which is based on the spatial distribution, along the nerve fibres, of lesions, together with the synthesis of several partial diagnoses. Faced with this problem while building up an expert system (NEUROP), we developed a heterogeneous knowledge representation associating a finite automaton with first order logic. A number of knowledge representation problems raised by the electromyography test features are examined in this study and the expert system architecture allowing such a knowledge modeling are laid out. In Pe\~na (2007), MCMC sampling is applied to approximately calculate the ratio of essential graphs (EGs) to directed acyclic graphs (DAGs) for up to 20 nodes. In the present paper, we extend that work from 20 to 31 nodes. We also extend that work by computing the approximate ratio of connected EGs to connected DAGs, of connected EGs to EGs, and of connected DAGs to DAGs. Furthermore, we prove that the latter ratio is asymptotically 1. We also discuss the implications of these results for learning DAGs from data. We examine the computational complexity of testing and finding small plans in probabilistic planning domains with succinct representations. We find that many problems of interest are complete for a variety of complexity classes: NP, co-NP, PP, NP^PP, co-NP^PP, and PSPACE. Of these, the probabilistic classes PP and NP^PP are likely to be of special interest in the field of uncertainty in artificial intelligence and are deserving of additional study. These results suggest a fruitful direction of future algorithmic development. Probabilistic conceptual network is a knowledge representation scheme designed for reasoning about concepts and categorical abstractions in utility-based categorization. The scheme combines the formalisms of abstraction and inheritance hierarchies from artificial intelligence, and probabilistic networks from decision analysis. It provides a common framework for representing conceptual knowledge, hierarchical knowledge, and uncertainty. It facilitates dynamic construction of categorization decision models at varying levels of abstraction. The scheme is applied to an automated machining problem for reasoning about the state of the machine at varying levels of abstraction in support of actions for maintaining competitiveness of the plant. Evidential reasoning is now a leading topic in Artificial Intelligence. Evidence is represented by a variety of evidential functions. Evidential reasoning is carried out by certain kinds of fundamental operation on these functions. This paper discusses two of the basic operations on evidential functions, the discount operation and the well-known orthogonal sum operation. We show that the discount operation is not commutative with the orthogonal sum operation, and derive expressions for the two operations applied to the various evidential function. We present initial ideas for a programming paradigm based on simulation that is targeted towards applications of artificial intelligence (AI). The approach aims at integrating techniques from different areas of AI and is based on the idea that simulated entities may freely exchange data and behavioural patterns. We define basic notions of a simulation-based programming paradigm and show how it can be used for implementing AI applications. This paper describes the incorporation of uncertainty in diagnostic reasoning based on the set covering model of Reggia et. al. extended to what in the Artificial Intelligence dichotomy between deep and compiled (shallow, surface) knowledge based diagnosis may be viewed as the generic form at the compiled end of the spectrum. A major undercurrent in this is advocating the need for a strong underlying model and an integrated set of support tools for carrying such a model in order to deal with uncertainty. Various properties of relative entropy have led to its widespread use in information theory. These properties suggest that relative entropy has a role to play in systems that attempt to perform inference in terms of probability distributions. In this paper, I will review some basic properties of relative entropy as well as its role in probabilistic inference. I will also mention briefly a few existing and potential applications of relative entropy to so-called artificial intelligence (AI). In the field of Artificial Intelligence, traditional approaches to choosing moves in games involve the we of the minimax algorithm. However, recent research results indicate that minimizing may not always be the best approach. In this paper we summarize the results of some measurements on several model games with several different evaluation functions. These measurements, which are presented in detail in [NPT], show that there are some new algorithms that can make significantly better use of evaluation function values than the minimax algorithm does. Success in the quest for artificial intelligence has the potential to bring unprecedented benefits to humanity, and it is therefore worthwhile to investigate how to maximize these benefits while avoiding potential pitfalls. This article gives numerous examples (which should by no means be construed as an exhaustive list) of such worthwhile research aimed at ensuring that AI remains robust and beneficial. This paper considers the computational power of constant size, dynamic Bayesian networks. Although discrete dynamic Bayesian networks are no more powerful than hidden Markov models, dynamic Bayesian networks with continuous random variables and discrete children of continuous parents are capable of performing Turing-complete computation. With modified versions of existing algorithms for belief propagation, such a simulation can be carried out in real time. This result suggests that dynamic Bayesian networks may be more powerful than previously considered. Relationships to causal models and recurrent neural networks are also discussed. In recent years prominent intellectuals have raised ethical concerns about the consequences of artificial intelligence. One concern is that an autonomous agent might modify itself to become "superintelligent" and, in supremely effective pursuit of poorly specified goals, destroy all of humanity. This paper considers and rejects the possibility of this outcome. We argue that this scenario depends on an agent's ability to rapidly improve its ability to predict its environment through self-modification. Using a Bayesian model of a reasoning agent, we show that there are important limitations to how an agent may improve its predictive ability through self-modification alone. We conclude that concern about this artificial intelligence outcome is misplaced and better directed at policy questions around data access and storage. General Video Game Artificial Intelligence is a general game playing framework for Artificial General Intelligence research in the video-games domain. In this paper, we propose for the first time a screen capture learning agent for General Video Game AI framework. A Deep Q-Network algorithm was applied and improved to develop an agent capable of learning to play different games in the framework. After testing this algorithm using various games of different categories and difficulty levels, the results suggest that our proposed screen capture learning agent has the potential to learn many different games using only a single learning algorithm. Multi-agent path finding (MAPF) is a well-studied problem in artificial intelligence, where one needs to find collision-free paths for agents with given start and goal locations. In video games, agents of different types often form teams. In this paper, we demonstrate the usefulness of MAPF algorithms from artificial intelligence for moving such non-homogeneous teams in congested video game environments. Here we examine the paperclip apocalypse concern for artificial general intelligence (or AGI) whereby a superintelligent AI with a simple goal (ie., producing paperclips) accumulates power so that all resources are devoted towards that simple goal and are unavailable for any other use. We provide conditions under which a paper apocalypse can arise but also show that, under certain architectures for recursive self-improvement of AIs, that a paperclip AI may refrain from allowing power capabilities to be developed. The reason is that such developments pose the same control problem for the AI as they do for humans (over AIs) and hence, threaten to deprive it of resources for its primary goal. The welfare of modern societies has been intrinsically linked to wage labour. With some exceptions, the modern human has to sell her labour-power to be able reproduce biologically and socially. Thus, a lingering fear of technological unemployment features predominately as a theme among Artificial Intelligence researchers. In this short paper we show that, if past trends are anything to go by, this fear is irrational. On the contrary, we argue that the main problem humanity will be facing is the normalisation of extremely long working hours. One of the main research areas in Artificial Intelligence is the coding of agents (programs) which are able to learn by themselves in any situation. This means that agents must be useful for purposes other than those they were created for, as, for example, playing chess. In this way we try to get closer to the pristine goal of Artificial Intelligence. One of the problems to decide whether an agent is really intelligent or not is the measurement of its intelligence, since there is currently no way to measure it in a reliable way. The purpose of this project is to create an interpreter that allows for the execution of several environments, including those which are generated randomly, so that an agent (a person or a program) can interact with them. Once the interaction between the agent and the environment is over, the interpreter will measure the intelligence of the agent according to the actions, states and rewards the agent has undergone inside the environment during the test. As a result we will be able to measure agents' intelligence in any possible environment, and to make comparisons between several agents, in order to determine which of them is the most intelligent. In order to perform the tests, the interpreter must be able to randomly generate environments that are really useful to measure agents' intelligence, since not any randomly generated environment will serve that purpose. Efficiently entering information into a computer is key to enjoying the benefits of computing. This paper describes three intelligent user interfaces: handwriting recognition, adaptive menus, and predictive fillin. In the context of adding a personUs name and address to an electronic organizer, tests show handwriting recognition is slower than typing on an on-screen, soft keyboard, while adaptive menus and predictive fillin can be twice as fast. This paper also presents strategies for applying these three interfaces to other information collection domains. How can we design good goals for arbitrarily intelligent agents? Reinforcement learning (RL) is a natural approach. Unfortunately, RL does not work well for generally intelligent agents, as RL agents are incentivised to shortcut the reward sensor for maximum reward -- the so-called wireheading problem. In this paper we suggest an alternative to RL called value reinforcement learning (VRL). In VRL, agents use the reward signal to learn a utility function. The VRL setup allows us to remove the incentive to wirehead by placing a constraint on the agent's actions. The constraint is defined in terms of the agent's belief distributions, and does not require an explicit specification of which actions constitute wireheading. Although the problem of a critique of robotic behavior in near-unanimous agreement to human norms seems intractable, a starting point of such an ambition is a framework of the collection of knowledge a priori and experience a posteriori categorized as a set of synthetical judgments available to the intelligence, translated into computer code. If such a proposal were successful, an algorithm with ethically traceable behavior and cogent equivalence to human cognition is established. This paper will propose the application of Kant's critique of reason to current programming constructs of an autonomous intelligent system. Theoretical analysis of machine intelligence (MI) is useful for defining a common platform in both theoretical and applied artificial intelligence (AI). The goal of this paper is to set canonical definitions that can assist pragmatic research in both strong and weak AI. Described epistemological features of machine intelligence include relationship between intelligent behavior, intelligent and unintelligent machine characteristics, observable and unobservable entities and classification of intelligence. The paper also establishes algebraic definitions of efficiency and accuracy of MI tests as their quality measure. The last part of the paper addresses the learning process with respect to the traditional epistemology and the epistemology of MI described here. The proposed views on MI positively correlate to the Hegelian monistic epistemology and contribute towards amalgamating idealistic deliberations with the AI theory, particularly in a local frame of reference. A definition of Artificial Intelligence was proposed in [1] but this definition was not absolutely formal at least because the word "Human" was used. In this paper we will formalize the definition from [1]. The biggest problem in this definition was that the level of intelligence of AI is compared to the intelligence of a human being. In order to change this we will introduce some parameters to which AI will depend. One of this parameters will be the level of intelligence and we will define one AI to each level of intelligence. We assume that for some level of intelligence the respective AI will be more intelligent than a human being. Nevertheless, we cannot say which is this level because we cannot calculate its exact value. Stream computing is the use of multiple autonomic and parallel modules together with integrative processors at a higher level of abstraction to embody "intelligent" processing. The biological basis of this computing is sketched and the matter of learning is examined. In this article, we describe the fuzzy logic, fuzzy language and algorithms as the basis of fuzzy reasoning, one of the intelligent information processing method, and then describe the general fuzzy reasoning method. The vision systems of the eagle and the snake outperform everything that we can make in the laboratory, but snakes and eagles cannot build an eyeglass or a telescope or a microscope. (Judea Pearl) Artificial Chemistries (ACs) are symbolic chemical metaphors for the exploration of Artificial Life, with specific focus on the origin of life. In this work we define a P system based artificial graph chemistry to understand the principles leading to the evolution of life-like structures in an AC set up and to develop a unified framework to characterize and classify symbolic artificial chemistries by devising appropriate formalism to capture semantic and organizational information. An extension of P system is considered by associating probabilities with the rules providing the topological framework for the evolution of a labeled undirected graph based molecular reaction semantics. Research on human self-regulation has shown that people hold many goals simultaneously and have complex self-regulation mechanisms to deal with this goal conflict. Artificial autonomous systems may also need to find ways to cope with conflicting goals. Indeed, the intricate interplay among different goals may be critical to the design as well as long-term safety and stability of artificial autonomous systems. I discuss some of the critical features of the human self-regulation system and how it might be applied to an artificial system. Furthermore, the implications of goal conflict for the reliability and stability of artificial autonomous systems and ensuring their alignment with human goals and ethics is examined. With the advances in information technology (IT) criminals are using cyberspace to commit numerous cyber crimes. Cyber infrastructures are highly vulnerable to intrusions and other threats. Physical devices and human intervention are not sufficient for monitoring and protection of these infrastructures; hence, there is a need for more sophisticated cyber defense systems that need to be flexible, adaptable and robust, and able to detect a wide variety of threats and make intelligent real-time decisions. Numerous bio-inspired computing methods of Artificial Intelligence have been increasingly playing an important role in cyber crime detection and prevention. The purpose of this study is to present advances made so far in the field of applying AI techniques for combating cyber crimes, to demonstrate how these techniques can be an effective tool for detection and prevention of cyber attacks, as well as to give the scope for future work. In this paper, the idea of a new artificial intelligence based optimization algorithm, which is inspired from the nature of vortex, has been provided briefly. As also a bio-inspired computation algorithm, the idea is generally focused on a typical vortex flow / behavior in nature and inspires from some dynamics that are occurred in the sense of vortex nature. Briefly, the algorithm is also a swarm-oriented evolutional problem solution approach; because it includes many methods related to elimination of weak swarm members and trying to improve the solution process by supporting the solution space via new swarm members. In order have better idea about success of the algorithm; it has been tested via some benchmark functions. At this point, the obtained results show that the algorithm can be an alternative to the literature in terms of single-objective optimization solution ways. Vortex Optimization Algorithm (VOA) is the name suggestion by the authors; for this new idea of intelligent optimization approach. Decomposable dependency models possess a number of interesting and useful properties. This paper presents new characterizations of decomposable models in terms of independence relationships, which are obtained by adding a single axiom to the well-known set characterizing dependency models that are isomorphic to undirected graphs. We also briefly discuss a potential application of our results to the problem of learning graphical models from data. A general notion of algebraic conditional plausibility measures is defined. Probability measures, ranking functions, possibility measures, and (under the appropriate definitions) sets of probability measures can all be viewed as defining algebraic conditional plausibility measures. It is shown that algebraic conditional plausibility measures can be represented using Bayesian networks. This paper presents a review of instantaneously trained neural networks (ITNNs). These networks trade learning time for size and, in the basic model, a new hidden node is created for each training sample. Various versions of the corner-classification family of ITNNs, which have found applications in artificial intelligence (AI), are described. Implementation issues are also considered. In game theory and artificial intelligence, decision making models often involve maximizing expected utility, which does not respect ordinal invariance. In this paper, the author discusses the possibility of preserving ordinal invariance and still making a rational decision under uncertainty. Neurons are individually translated into simple gates to plan a brain based on human psychology and intelligence. State machines, assumed previously learned in subconscious associative memory are shown to enable equation solving and rudimentary thinking using nanoprocessing within short term memory. I consider the problem of learning an optimal path graphical model from data and show the problem to be NP-hard for the maximum likelihood and minimum description length approaches and a Bayesian approach. This hardness result holds despite the fact that the problem is a restriction of the polynomially solvable problem of finding the optimal tree graphical model. The SHOP2 planning system received one of the awards for distinguished performance in the 2002 International Planning Competition. This paper describes the features of SHOP2 which enabled it to excel in the competition, especially those aspects of SHOP2 that deal with temporal and metric planning domains. We address the problem of propositional logic-based abduction, i.e., the problem of searching for a best explanation for a given propositional observation according to a given propositional knowledge base. We give a general algorithm, based on the notion of projection; then we study restrictions over the representations of the knowledge base and of the query, and find new polynomial classes of abduction problems. Artificial general intelligence aims to create agents capable of learning to solve arbitrary interesting problems. We define two versions of asymptotic optimality and prove that no agent can satisfy the strong version while in some cases, depending on discounting, there does exist a non-computable weak asymptotically optimal agent. We present a partial-order, conformant, probabilistic planner, Probapop which competed in the blind track of the Probabilistic Planning Competition in IPC-4. We explain how we adapt distance based heuristics for use with probabilistic domains. Probapop also incorporates heuristics based on probability of success. We explain the successes and difficulties encountered during the design and implementation of Probapop. Acyclic directed mixed graphs, also known as semi-Markov models represent the conditional independence structure induced on an observed margin by a DAG model with latent variables. In this paper we present the first method for fitting these models to binary data using maximum likelihood estimation. This paper addresses the problem of measurement errors in causal inference and highlights several algebraic and graphical methods for eliminating systematic bias induced by such errors. In particulars, the paper discusses the control of partially observable confounders in parametric and non parametric models and the computational problem of obtaining bias-free effect estimates in such models. Symmetry is an important problem in many combinatorial problems. One way of dealing with symmetry is to add constraints that eliminate symmetric solutions. We survey recent results in this area, focusing especially on two common and useful cases: symmetry breaking constraints for row and column symmetry, and symmetry breaking constraints for eliminating value symmetry This paper presents complexity analysis and variational methods for inference in probabilistic description logics featuring Boolean operators, quantification, qualified number restrictions, nominals, inverse roles and role hierarchies. Inference is shown to be PEXP-complete, and variational methods are designed so as to exploit logical inference whenever possible. We address the problem of identifying dynamic sequential plans in the framework of causal Bayesian networks, and show that the problem is reduced to identifying causal effects, for which there are complete identi cation algorithms available in the literature. We present a graphical criterion for reading dependencies from the minimal directed independence map G of a graphoid p when G is a polytree and p satisfies composition and weak transitivity. We prove that the criterion is sound and complete. We argue that assuming composition and weak transitivity is not too restrictive. This paper deals with the problem of identifying direct causal effects in recursive linear structural equation models. The paper establishes a sufficient criterion for identifying individual causal effects and provides a procedure computing identified causal effects in terms of observed covariance matrix. We derive novel sufficient conditions for convergence of Loopy Belief Propagation (also known as the Sum-Product algorithm) to a unique fixed point. Our results improve upon previously known conditions. For binary variables with (anti-)ferromagnetic interactions, our conditions seem to be sharp. We present a fuzzy version of description logics with concrete domains. Main features are: (i) concept constructors are based on t-norm, t-conorm, negation and implication; (ii) concrete domains are fuzzy sets; (iii) fuzzy modifiers are allowed; and (iv) the reasoning algorithm is based on a mixture of completion rules and bounded mixed integer programming. We use optimism to introduce generic asymptotically optimal reinforcement learning agents. They achieve, with an arbitrary finite or compact class of environments, asymptotically optimal behavior. Furthermore, in the finite deterministic case we provide finite error bounds. We formulate necessary and sufficient conditions for an arbitrary discrete probability distribution to factor according to an undirected graphical model, or a log-linear model, or other more general exponential models. This result generalizes the well known Hammersley-Clifford Theorem. We are interested in the problem of planning for factored POMDPs. Building on the recent results of Kearns, Mansour and Ng, we provide a planning algorithm for factored POMDPs that exploits the accuracy-efficiency tradeoff in the belief state simplification introduced by Boyen and Koller. A control algorithm for batch processing of mixed waste is proposed based on conditional Gaussian Bayesian networks. The network is compiled during batch staging for real-time response to sensor input. In the last decade, several architectures have been proposed for exact computation of marginals using local computation. In this paper, we compare three architectures - Lauritzen-Spiegelhalter, Hugin, and Shenoy-Shafer - from the perspective of graphical structure for message propagation, message-passing scheme, computational efficiency, and storage efficiency. There is much interest in using partially observable Markov decision processes (POMDPs) as a formal model for planning in stochastic domains. This paper is concerned with finding optimal policies for POMDPs. We propose several improvements to incremental pruning, presently the most efficient exact algorithm for solving POMDPs. A modelling language is described which is suitable for the correlation of information when the underlying functional model of the system is incomplete or uncertain and the temporal dependencies are imprecise. An efficient and incremental implementation is outlined which depends on cost functions satisfying certain criteria. Possibilistic logic and probability theory (as it is used in the applications targetted) satisfy these criteria. This paper describes a class of probabilistic approximation algorithms based on bucket elimination which offer adjustable levels of accuracy and efficiency. We analyze the approximation for several tasks: finding the most probable explanation, belief updating and finding the maximum a posteriori hypothesis. We identify regions of completeness and provide preliminary empirical evaluation on randomly generated networks. Poole has shown that nonmonotonic logics do not handle the lottery paradox correctly. In this paper we will show that Pollock's theory of defeasible reasoning fails for the same reason: defeasible reasoning is incompatible with the skeptical notion of derivability. A stable joint plan should guarantee the achievement of a designer's goal in a multi-agent environment, while ensuring that deviations from the prescribed plan would be detected. We present a computational framework where stable joint plans can be studied, as well as several basic results about the representation, verification and synthesis of stable joint plans. This paper is concerned with planning in stochastic domains by means of partially observable Markov decision processes (POMDPs). POMDPs are difficult to solve. This paper identifies a subclass of POMDPs called region observable POMDPs, which are easier to solve and can be used to approximate general POMDPs to arbitrary accuracy. Lower and upper probabilities, also known as Choquet capacities, are widely used as a convenient representation for sets of probability distributions. This paper presents a graphical decomposition and exact propagation algorithm for computing marginal posteriors of 2-monotone lower probabilities (equivalently, 2-alternating upper probabilities). In this paper we propose a family of algorithms combining tree-clustering with conditioning that trade space for time. Such algorithms are useful for reasoning in probabilistic and deterministic networks as well as for accomplishing optimization tasks. By analyzing the problem structure it will be possible to select from a spectrum the algorithm that best meets a given time-space specification. We present deterministic techniques for computing upper and lower bounds on marginal probabilities in sigmoid and noisy-OR networks. These techniques become useful when the size of the network (or clique size) precludes exact computations. We illustrate the tightness of the bounds by numerical experiments. The main goal of this paper is to describe a data structure called binary join trees that are useful in computing multiple marginals efficiently using the Shenoy-Shafer architecture. We define binary join trees, describe their utility, and sketch a procedure for constructing them. For real time evaluation of a Bayesian network when there is not sufficient time to obtain an exact solution, a guaranteed response time, approximate solution is required. It is shown that nontraditional methods utilizing estimators based on an archive of trial solutions and genetic search can provide an approximate solution that is considerably superior to the traditional Monte Carlo simulation methods. We provide a new characterization of the Dirichlet distribution. This characterization implies that under assumptions made by several previous authors for learning belief networks, a Dirichlet prior on the parameters is inevitable. This is a working paper summarizing results of an ongoing research project whose aim is to uniquely characterize the uncertainty measure for the Dempster-Shafer Theory. A set of intuitive axiomatic requirements is presented, some of their implications are shown, and the proof is given of the minimality of recently proposed measure AU among all measures satisfying the proposed requirements. This paper presents correct algorithms for answering the following two questions; (i) Does there exist a causal explanation consistent with a set of background knowledge which explains all of the observed independence facts in a sample? (ii) Given that there is such a causal explanation what are the causal relationships common to every such causal explanation? A completeness result for d-separation applied to discrete Bayesian networks is presented and it is shown that in a strong measure-theoretic sense almost all discrete distributions for a given network structure are faithful; i.e. the independence facts true of the distribution are all and only those entailed by the network structure. We develop a new semantics for defeasible inference based on extended probability measures allowed to take infinitesimal values, on the interpretation of defaults as generalized conditional probability constraints and on a preferred-model implementation of entropy maximization. This paper examines the "K2" network scoring metric of Cooper and Herskovits. It shows counterintuitive results from applying this metric to simple networks. One family of noninformative priors is suggested for assigning equal scores to equivalent networks. We propose an integration of possibility theory into non-classical logics. We obtain many formal results that generalize the case where possibility and necessity functions are based on classical logic. We show how useful such an approach is by applying it to reasoning under uncertain and inconsistent information. We present an approach to the solution of decision problems formulated as influence diagrams. This approach involves a special triangulation of the underlying graph, the construction of a junction tree with special properties, and a message passing algorithm operating on the junction tree for computation of expected utilities and optimal decision policies. We construct the belief function that quantifies the agent, beliefs about which event of Q will occurred when he knows that the event is selected by a chance set-up and that the probability function associated to the chance set up is only partially known. This paper studies the connection between probabilistic conditional independence in uncertain reasoning and data dependency in relational databases. As a demonstration of the usefulness of this preliminary investigation, an alternate proof is presented for refuting the conjecture suggested by Pearl and Paz that probabilistic conditional independencies have a complete axiomatization. This paper describes a normative system design that incorporates diagnosis, dynamic evolution, decision making, and information gathering. A single influence diagram demonstrates the design's coherence, yet each activity is more effectively modeled and evaluated separately. Application to offshore oil platforms illustrates the design. For this application, the normative system is embedded in a real-time expert system. Two algorithms are presented for "compiling" influence diagrams into a set of simple decision rules. These decision rules define simple-to-execute, complete, consistent, and near-optimal decision procedures. These compilation algorithms can be used to derive decision procedures for human teams solving time constrained decision problems. In order to find a causal explanation for data presented in the form of covariance and concentration matrices it is necessary to decide if the graph formed by such associations is a projection of a directed acyclic graph (dag). We show that the general problem of deciding whether such a dag exists is NP-complete. This paper introduces a qualitative measure of ambiguity and analyses its relationship with other measures of uncertainty. Probability measures relative likelihoods, while ambiguity measures vagueness surrounding those judgments. Ambiguity is an important representation of uncertain knowledge. It deals with a different, type of uncertainty modeled by subjective probability or belief. Jeffrey's rule of conditioning has been proposed in order to revise a probability measure by another probability function. We generalize it within the framework of the models based on belief functions. We show that several forms of Jeffrey's conditionings can be defined that correspond to the geometrical rule of conditioning and to Dempster's rule of conditioning, respectively. In this paper, the concept of possibilistic evidence which is a possibility distribution as well as a body of evidence is proposed over an infinite universe of discourse. The inference with possibilistic evidence is investigated based on a unified inference framework maintaining both the compatibility of concepts and the consistency of the probability logic. The product expansion of conditional probabilities for belief nets is not maximum entropy. This appears to deny a desirable kind of assurance for the model. However, a kind of guarantee that is almost as strong as maximum entropy can be derived. Surprisingly, a variant model also exhibits the guarantee, and for many cases obtains a higher performance score than the product expansion. This paper introduces the notion of objection-based causal networks which resemble probabilistic causal networks except that they are quantified using objections. An objection is a logical sentence and denotes a condition under which a, causal dependency does not exist. Objection-based causal networks enjoy almost all the properties that make probabilistic causal networks popular, with the added advantage that objections are, arguably more intuitive than probabilities. A new entropy-like measure as well as a new measure of total uncertainty pertaining to the Dempster-Shafer theory are introduced. It is argued that these measures are better justified than any of the previously proposed candidates. We propose a general Bayesian network model for application in a wide class of problems of therapy monitoring. We discuss the use of stochastic simulation as a computational approach to inference on the proposed class of models. As an illustration we present an application to the monitoring of cytotoxic chemotherapy in breast cancer. Useless paths are a chronic problem for marker-passing techniques. We use a probabilistic analysis to justify a method for quickly identifying and rejecting useless paths. Using the same analysis, we identify key conditions and assumptions necessary for marker-passing to perform well. A reason maintenance system which extends an ATMS through Mukaidono's fuzzy logic is described. It supports a problem solver in situations affected by incomplete information and vague data, by allowing nonmonotonic inferences and the revision of previous conclusions when contradictions are detected. This paper outlines a methodology for analyzing the representational support for knowledge-based decision-modeling in a broad domain. A relevant set of inference patterns and knowledge types are identified. By comparing the analysis results to existing representations, some insights are gained into a design approach for integrating categorical and uncertain knowledge in a context sensitive manner. This paper proposes a new method for solving Bayesian decision problems. The method consists of representing a Bayesian decision problem as a valuation-based system and applying a fusion algorithm for solving it. The fusion algorithm is a hybrid of local computational methods for computation of marginals of joint probability distributions and the local computational methods for discrete optimization problems. Irrelevance-based partial MAPs are useful constructs for domain-independent explanation using belief networks. We look at two definitions for such partial MAPs, and prove important properties that are useful in designing algorithms for computing them effectively. We make use of these properties in modifying our standard MAP best-first algorithm, so as to handle irrelevance-based partial MAPs. The relationship between belief networks and relational databases is examined. Based on this analysis, a method to construct belief networks automatically from statistical relational data is proposed. A comparison between our method and other methods shows that our method has several advantages when generalization or prediction is deeded. A general notion of algebraic conditional plausibility measures is defined. Probability measures, ranking functions, possibility measures, and (under the appropriate definitions) sets of probability measures can all be viewed as defining algebraic conditional plausibility measures. It is shown that the technology of Bayesian networks can be applied to algebraic conditional plausibility measures. Preferences among acts are analyzed in the style of L. Savage, but as partially ordered. The rationality postulates considered are weaker than Savage's on three counts. The Sure Thing Principle is derived in this setting. The postulates are shown to lead to a characterization of generalized qualitative probability that includes and blends both traditional qualitative probability and the ranked structures used in logical approaches. This paper studies quantum annealing (QA) for clustering, which can be seen as an extension of simulated annealing (SA). We derive a QA algorithm for clustering and propose an annealing schedule, which is crucial in practice. Experiments show the proposed QA algorithm finds better clustering assignments than SA. Furthermore, QA is as easy as SA to implement. We show how, and under which conditions, the equilibrium states of a first-order Ordinary Differential Equation (ODE) system can be described with a deterministic Structural Causal Model (SCM). Our exposition sheds more light on the concept of causality as expressed within the framework of Structural Causal Models, especially for cyclic models. Cox's well-known theorem justifying the use of probability is shown not to hold in finite domains. The counterexample also suggests that Cox's assumptions are insufficient to prove the result even in infinite domains. The same counterexample is used to disprove a result of Fine on comparative conditional probability. PDDL was originally conceived and constructed as a lingua franca for the International Planning Competition. PDDL2.1 embodies a set of extensions intended to support the expression of something closer to real planning problems. This objective has only been partially achieved, due in large part to a deliberate focus on not moving too far from classical planning models and solution methods. I comment on the PDDL 2.1 language and its use in the planning competition, focusing on the choices made for accommodating time and concurrency. I also discuss some methodological issues that have to do with the move toward more expressive planning languages and the balance needed in planning research between semantics and computation. Attribute weighting and differential weighting, two major mechanisms for computing context-dependent similarity or dissimilarity measures are studied and compared. A dissimilarity measure based on subset size in the context is proposed and its metrization and application are given. It is also shown that while all attribute weighting dissimilarity measures are metrics differential weighting dissimilarity measures are usually non-metric. A similarity network is a tool for constructing belief networks for the diagnosis of a single fault. In this paper, we examine modifications to the similarity-network representation that facilitate the construction of belief networks for the diagnosis of multiple coexisting faults. Incidence Calculus and Dempster-Shafer Theory of Evidence are both theories to describe agents' degrees of belief in propositions, thus being appropriate to represent uncertainty in reasoning systems. This paper presents a straightforward equivalence proof between some special cases of these theories. After a brief introduction to causal probabilistic networks and the HUGIN approach, the problem of conflicting data is discussed. A measure of conflict is defined, and it is used in the medical diagnostic system MUNIN. Finally, it is discussed how to distinguish between conflicting data and a rare case. An efficient algorithm is developed that identifies all independencies implied by the topology of a Bayesian network. Its correctness and maximality stems from the soundness and completeness of d-separation with respect to probability theory. The algorithm runs in time O (l E l) where E is the number of edges in the network. Measures of uncertainty and divergence are introduced for interval-valued probability distributions and are shown to have desirable mathematical properties. A maximum uncertainty inference procedure for marginal interval distributions is presented. A technique for reconstruction of interval distributions from projections is developed based on this inference procedure The most difficult task in probabilistic reasoning may be handling directed cycles in belief networks. To the best knowledge of this author, there is no serious discussion of this problem at all in the literature of probabilistic reasoning so far. We discuss the Dempster-Shafer theory of evidence. We introduce a concept of monotonicity which is related to the diminution of the range between belief and plausibility. We show that the accumulation of knowledge in this framework exhibits a nonmonotonic property. We show how the belief structure can be used to represent typical or commonsense knowledge. This paper demonstrates a method for using belief-network algorithms to solve influence diagram problems. In particular, both exact and approximation belief-network algorithms may be applied to solve influence-diagram problems. More generally, knowing the relationship between belief-network and influence-diagram problems may be useful in the design and development of more efficient influence diagram algorithms. This paper advocates the usefulness of new theories of uncertainty for the purpose of modeling some facets of uncertain knowledge, especially vagueness, in AI. It can be viewed as a partial reply to Cheeseman's (among others) defense of probability. This paper addresses the problem of resolving errors under uncertainty in a rule-based system. A new approach has been developed that reformulates this problem as a neural-network learning problem. The strength and the fundamental limitations of this approach are explored and discussed. The main result is that neural heuristics can be applied to solve some but not all problems in rule-based systems. This paper addresses a prevailing assumption in single-agent heuristic search theory- that problem-solving algorithms should guarantee shortest-path solutions, which are typically called optimal. Optimality implies a metric for judging solution quality, where the optimal solution is the solution with the highest quality. When path-length is the metric, we will distinguish such solutions as p-optimal. Uncertainty enters into human reasoning and inference in at least two ways. It is reasonable to suppose that there will be roles for these distinct uses of uncertainty also in automated reasoning. An approximation method is presented for probabilistic inference with continuous random variables. These problems can arise in many practical problems, in particular where there are "second order" probabilities. The approximation, based on the Gaussian influence diagram, iterates over linear approximations to the inference problem. In pattern analysis, information regarding an object can often be drawn from its surroundings. This paper presents a method for handling uncertainty when using context of symbols and texts for analyzing technical drawings. The method is based on Dempster-Shafer theory and possibility theory. The apparent failure of individual probabilistic expressions to distinguish uncertainty about truths from uncertainty about probabilistic assessments have prompted researchers to seek formalisms where the two types of uncertainties are given notational distinction. This paper demonstrates that the desired distinction is already a built-in feature of classical probabilistic models, thus, specialized notations are unnecessary. This paper discusses an expert system shell that integrates rule-based reasoning and the Dempster-Shafer evidence combination scheme. Domain knowledge is stored as rules with associated belief functions. The reasoning component uses a combination of forward and backward inferencing mechanisms to allow interaction with users in a mixed-initiative format. An evidential reasoning mechanism based on the Dempster-Shafer theory of evidence is introduced. Its performance in real-world image analysis is compared with other mechanisms based on the Bayesian formalism and a simple weight combination method. In this paper, we present some results of evidential reasoning in understanding multispectral images of remote sensing systems. The Dempster-Shafer approach of combination of evidences is pursued to yield contextual classification results, which are compared with previous results of the Bayesian context free classification, contextual classifications of dynamic programming and stochastic relaxation approaches. An automated explanation facility for Bayesian conditioning aimed at improving user acceptance of probability-based decision support systems has been developed. The domain-independent facility is based on an information processing perspective on reasoning about conditional evidence that accounts both for biased and normative inferences. Experimental results indicate that the facility is both acceptable to naive users and effective in improving understanding. The generalized fault diagram, a data structure for failure analysis based on the influence diagram, is defined. Unlike the fault tree, this structure allows for dependence among the basic events and replicated logical elements. A heuristic procedure is developed for efficient processing of these structures. A learning algorithm is presented which given the structure of a causal tree, will estimate its link probabilities by sequential measurements on the leaves only. Internal nodes of the tree represent conceptual (hidden) variables inaccessible to observation. The method described is incremental, local, efficient, and remains robust to measurement imprecisions. Linear representations for a subclass of boolean symmetric functions selected by a parity condition are shown to constitute a generalization of the linear constraints on probabilities introduced by Boole. These linear constraints are necessary to compute probabilities of events with relations between the. arbitrarily specified with propositional calculus boolean formulas. Bayesian networks provide a probabilistic semantics for qualitative assertions about likelihood. A qualitative reasoner based on an algebra over these assertions can derive further conclusions about the influence of actions. While the conclusions are much weaker than those computed from complete probability distributions, they are still valuable for suggesting potential actions, eliminating obviously inferior plans, identifying important tradeoffs, and explaining probabilistic models. In many cases commonsense knowledge consists of knowledge of what is usual. In this paper we develop a system for reasoning with usual information. This system is based upon the fact that these pieces of commonsense information involve both a probabilistic aspect and a granular aspect. We implement this system with the aid of possibility-probability granules. In the current versions of the Dempster-Shafer theory, the only essential restriction on the validity of the rule of combination is that the sources of evidence must be statistically independent. Under this assumption, it is permissible to apply the Dempster-Shafer rule to two or mere distinct probability distributions. The paper demonstrates that strict adherence to probability theory does not preclude the use of concurrent, self-activated constraint-propagation mechanisms for managing uncertainty. Maintaining local records of sources-of-belief allows both predictive and diagnostic inferences to be activated simultaneously and propagate harmoniously towards a stable equilibrium. General problems in analyzing information in a probabilistic database are considered. The practical difficulties (and occasional advantages) of storing uncertain data, of using it conventional forward- or backward-chaining inference engines, and of working with a probabilistic version of resolution are discussed. The background for this paper is the incorporation of uncertain reasoning facilities in MRS, a general-purpose expert system building tool. Acyclic directed mixed graphs, also known as semi-Markov models represent the conditional independence structure induced on an observed margin by a DAG model with latent variables. In this paper we present a factorization criterion for these models that is equivalent to the global Markov property given by (the natural extension of) d-separation. Non-obviousness or inventive step is a general requirement for patentability in most patent law systems. An invention should be at an adequate distance beyond its prior art in order to be patented. This short paper provides an overview on a methodology proposed for legal norm validation of FSTP facts using rule reasoning approach. We present a method to prove the decidability of provability in several well-known inference systems. This method generalizes both cut-elimination and the construction of an automaton recognizing the provable propositions. Quantum decision systems are being increasingly considered for use in artificial intelligence applications. Classical and quantum nodes can be distinguished based on certain correlations in their states. This paper investigates some properties of the states obtained in a decision tree structure. How these correlations may be mapped to the decision tree is considered. Classical tree representations and approximations to quantum states are provided. Current measures of machine intelligence are either difficult to evaluate or lack the ability to test a robot's problem-solving capacity in open worlds. We propose a novel evaluation framework based on the formal notion of MacGyver Test which provides a practical way for assessing the resilience and resourcefulness of artificial agents. We propose the creation of a systematic effort to identify and replicate key findings in neuroscience and allied fields related to understanding human values. Our aim is to ensure that research underpinning the value alignment problem of artificial intelligence has been sufficiently validated to play a role in the design of AI systems. Research on integrated neural-symbolic systems has made significant progress in the recent past. In particular the understanding of ways to deal with symbolic knowledge within connectionist systems (also called artificial neural networks) has reached a critical mass which enables the community to strive for applicable implementations and use cases. Recent work has covered a great variety of logics used in artificial intelligence and provides a multitude of techniques for dealing with them within the context of artificial neural networks. We present a comprehensive survey of the field of neural-symbolic integration, including a new classification of system according to their architectures and abilities. The technological singularity refers to a hypothetical scenario in which technological advances virtually explode. The most popular scenario is the creation of super-intelligent algorithms that recursively create ever higher intelligences. It took many decades for these ideas to spread from science fiction to popular science magazines and finally to attract the attention of serious philosophers. David Chalmers' (JCS 2010) article is the first comprehensive philosophical analysis of the singularity in a respected philosophy journal. The motivation of my article is to augment Chalmers' and to discuss some issues not addressed by him, in particular what it could mean for intelligence to explode. In this course, I will (have to) provide a more careful treatment of what intelligence actually is, separate speed from intelligence explosion, compare what super-intelligent participants and classical human observers might experience and do, discuss immediate implications for the diversity and value of life, consider possible bounds on intelligence, and contemplate intelligences right at the singularity. This paper presents a non-manual design engineering method based on heuristic search algorithm to search for candidate agents in the solution space which formed by artificial intelligence agents modeled on the base of bionics.Compared with the artificial design method represented by meta-learning and the bionics method represented by the neural architecture chip,this method is more feasible for realizing artificial general intelligence,and it has a much better interaction with cognitive neuroscience;at the same time,the engineering method is based on the theoretical hypothesis that the final learning algorithm is stable in certain scenarios,and has generalization ability in various scenarios.The paper discusses the theory preliminarily and proposes the possible correlation between the theory and the fixed-point theorem in the field of mathematics.Limited by the author's knowledge level,this correlation is proposed only as a kind of conjecture. Decision theory formally solves the problem of rational agents in uncertain worlds if the true environmental prior probability distribution is known. Solomonoff's theory of universal induction formally solves the problem of sequence prediction for unknown prior distribution. We combine both ideas and get a parameterless theory of universal Artificial Intelligence. We give strong arguments that the resulting AIXI model is the most intelligent unbiased agent possible. We outline for a number of problem classes, including sequence prediction, strategic games, function minimization, reinforcement and supervised learning, how the AIXI model can formally solve them. The major drawback of the AIXI model is that it is uncomputable. To overcome this problem, we construct a modified algorithm AIXI-tl, which is still effectively more intelligent than any other time t and space l bounded agent. The computation time of AIXI-tl is of the order tx2^l. Other discussed topics are formal definitions of intelligence order relations, the horizon problem and relations of the AIXI theory to other AI approaches. Computer-supported learning is an increasingly important form of study since it allows for independent learning and individualized instruction. In this paper, we discuss a novel approach to developing an intelligent tutoring system for teaching textbook-style mathematical proofs. We characterize the particularities of the domain and discuss common ITS design models. Our approach is motivated by phenomena found in a corpus of tutorial dialogs that were collected in a Wizard-of-Oz experiment. We show how an intelligent tutor for textbook-style mathematical proofs can be built on top of an adapted assertion-level proof assistant by reusing representations and proof search strategies originally developed for automated and interactive theorem proving. The resulting prototype was successfully evaluated on a corpus of tutorial dialogs and yields good results. The main prospective aim of modern research related to Artificial Intelligence is the creation of technical systems that implement the idea of Strong Intelligence. According our point of view the path to the development of such systems comes through the research in the field related to perceptions. Here we formulate the model of the perception of external world which may be used for the description of perceptual activity of intelligent beings. We consider a number of issues related to the development of the set of patterns which will be used by the intelligent system when interacting with environment. The key idea of the presented perception model is the idea of subjective reality. The principle of the relativity of perceived world is formulated. It is shown that this principle is the immediate consequence of the idea of subjective reality. In this paper we show how the methodology of subjective reality may be used for the creation of different types of Strong AI systems. It has been commonly argued, on the basis of Goedel's theorem and related mathematical results, that true artificial intelligence cannot exist. Penrose has further deduced from the existence of human intelligence that fundamental changes in physical theories are needed. I provide an elementary demonstration that these deductions are mistaken. The theory of computational complexity is used to underpin a recent model of neocortical sensory processing. We argue that encoding into reconstruction networks is appealing for communicating agents using Hebbian learning and working on hard combinatorial problems, which are easy to verify. Computational definition of the concept of intelligence is provided. Simulations illustrate the idea. This paper presents an intelligent tutoring system, GeoTutor, for Euclidean Geometry that is automatically able to synthesize proof problems and their respective solutions given a geometric figure together with a set of properties true of it. GeoTutor can provide personalized practice problems that address student deficiencies in the subject matter. In this paper, we will expound upon the concepts proffered in [1], where we proposed an information theoretic approach to intelligence in the computational sense. We will examine data and meme aggregation, and study the effect of limited resources on the resulting meme amplitudes. IUIs aim to incorporate intelligent automated capabilities in human computer interaction, where the net impact is a human-computer interaction that improves performance or usability in critical ways. It also involves designing and implementing an artificial intelligence (AI) component that effectively leverages human skills and capabilities, so that human performance with an application excels. IUIs embody capabilities that have traditionally been associated more strongly with humans than with computers: how to perceive, interpret, learn, use language, reason, plan, and decide. The concept of "task" is at the core of artificial intelligence (AI): Tasks are used for training and evaluating AI systems, which are built in order to perform and automatize tasks we deem useful. In other fields of engineering theoretical foundations allow thorough evaluation of designs by methodical manipulation of well understood parameters with a known role and importance; this allows an aeronautics engineer, for instance, to systematically assess the effects of wind speed on an airplane's performance and stability. No framework exists in AI that allows this kind of methodical manipulation: Performance results on the few tasks in current use (cf. board games, question-answering) cannot be easily compared, however similar or different. The issue is even more acute with respect to artificial *general* intelligence systems, which must handle unanticipated tasks whose specifics cannot be known beforehand. A *task theory* would enable addressing tasks at the *class* level, bypassing their specifics, providing the appropriate formalization and classification of tasks, environments, and their parameters, resulting in more rigorous ways of measuring, comparing, and evaluating intelligent behavior. Even modest improvements in this direction would surpass the current ad-hoc nature of machine learning and AI evaluation. Here we discuss the main elements of the argument for a task theory and present an outline of what it might look like for physical tasks. This is the second part of a paper on Conscious Intelligent Systems. We use the understanding gained in the first part (Conscious Intelligent Systems Part 1: IXI (arxiv id cs.AI/0612056)) to look at understanding. We see how the presence of mind affects understanding and intelligent systems; we see that the presence of mind necessitates language. The rise of language in turn has important effects on understanding. We discuss the humanoid question and how the question of self-consciousness (and by association mind/thought/language) would affect humanoids too. We propose that operator induction serves as an adequate model of perception. We explain how to reduce universal agent models to operator induction. We propose a universal measure of operator induction fitness, and show how it can be used in a reinforcement learning model and a homeostasis (self-preserving) agent based on the free energy principle. We show that the action of the homeostasis agent can be explained by the operator induction model. Computational Intelligence (CI) is a sub-branch of Artificial Intelligence paradigm focusing on the study of adaptive mechanisms to enable or facilitate intelligent behavior in complex and changing environments. There are several paradigms of CI [like artificial neural networks, evolutionary computations, swarm intelligence, artificial immune systems, fuzzy systems and many others], each of these has its origins in biological systems [biological neural systems, natural Darwinian evolution, social behavior, immune system, interactions of organisms with their environment]. Most of those paradigms evolved into separate machine learning (ML) techniques, where probabilistic methods are used complementary with CI techniques in order to effectively combine elements of learning, adaptation, evolution and Fuzzy logic to create heuristic algorithms that are, in some sense, intelligent. The current trend is to develop consensus techniques, since no single machine learning algorithms is superior to others in all possible situations. In order to overcome this problem several meta-approaches were proposed in ML focusing on the integration of results from different methods into single prediction. We discuss here the Landau theory for the nonlinear equation that can describe the adaptive integration of information acquired from an ensemble of independent learning agents. The influence of each individual agent on other learners is described similarly to the social impact theory. The final decision outcome for the consensus system is calculated using majority rule in the stationary limit, yet the minority solutions can survive inside the majority population as the complex intermittent clusters of opposite opinion. This paper describes a novel method for building affectively intelligent human-interactive agents. The method is based on a key sociological insight that has been developed and extensively verified over the last twenty years, but has yet to make an impact in artificial intelligence. The insight is that resource bounded humans will, by default, act to maintain affective consistency. Humans have culturally shared fundamental affective sentiments about identities, behaviours, and objects, and they act so that the transient affective sentiments created during interactions confirm the fundamental sentiments. Humans seek and create situations that confirm or are consistent with, and avoid and supress situations that disconfirm or are inconsistent with, their culturally shared affective sentiments. This "affect control principle" has been shown to be a powerful predictor of human behaviour. In this paper, we present a probabilistic and decision-theoretic generalisation of this principle, and we demonstrate how it can be leveraged to build affectively intelligent artificial agents. The new model, called BayesAct, can maintain multiple hypotheses about sentiments simultaneously as a probability distribution, and can make use of an explicit utility function to make value-directed action choices. This allows the model to generate affectively intelligent interactions with people by learning about their identity, predicting their behaviours using the affect control principle, and taking actions that are simultaneously goal-directed and affect-sensitive. We demonstrate this generalisation with a set of simulations. We then show how our model can be used as an emotional "plug-in" for artificially intelligent systems that interact with humans in two different settings: an exam practice assistant (tutor) and an assistive device for persons with a cognitive disability. Web intelligence can be considered as a subset of Artificial Intelligence. It uses existing data in web to produce new data, knowledge and wisdom to support decision making and new predictions for web users. Artificial Intelligence is ever changing and evolving field of computer science and it is extensively used in wide array of web based business applications. Although it is used substantially in web based systems in developed countries, it is not examined whether it is being substantially used in Sri Lanka. Every Sri Lankan citizen depends on Public Service more or less throughout his/ her life time and at least more than 3 times: at birth, marriage and death. So providing most of these services to its citizen, Sri Lankan Government uses more or less of its country web portal. This paper presents a model to evaluate web intelligence capability based on weight to key functionalities with respect to web intelligence. The government websites were checked by the proposed criteria to show the potential of using web intelligent technology to provide website based services. The result indicates that the use of web intelligence techniques openly and publicly to provide web based services through government web portal to its citizens is not satisfactory. It also indicates that lack of using the technologies pertaining to web intelligence in the public service web hinders the most of the advantages that citizen and government can gain from such technological involvement. The recognition of optical characters is known to be one of the earliest applications of Artificial Neural Networks, which partially emulate human thinking in the domain of artificial intelligence. In this paper, a simplified neural approach to recognition of optical or visual characters is portrayed and discussed. The document is expected to serve as a resource for learners and amateur investigators in pattern recognition, neural networking and related disciplines. Memory refinements are designed below to detect those sequences of actions that have been repeated a given number n. Subsequently such sequences are permitted to run without CPU involvement. This mimics human learning. Actions are rehearsed and once learned, they are performed automatically without conscious involvement. The study of belief change has been an active area in philosophy and AI. In recent years two special cases of belief change, belief revision and belief update, have been studied in detail. In a companion paper (Friedman & Halpern, 1997), we introduce a new framework to model belief change. This framework combines temporal and epistemic modalities with a notion of plausibility, allowing us to examine the change of beliefs over time. In this paper, we show how belief revision and belief update can be captured in our framework. This allows us to compare the assumptions made by each method, and to better understand the principles underlying them. In particular, it shows that Katsuno and Mendelzon's notion of belief update (Katsuno & Mendelzon, 1991a) depends on several strong assumptions that may limit its applicability in artificial intelligence. Finally, our analysis allow us to identify a notion of minimal change that underlies a broad range of belief change operations including revision and update. It is shown that Darwiche and Pearl's postulates imply an interesting property, not noticed by the authors. The following four classes of computational problems are equivalent: solving matrix games, solving linear programs, best $l^{\infty}$ linear approximation, best $l^1$ linear approximation. Open Source Software (OSS) often relies on large repositories, like SourceForge, for initial incubation. The OSS repositories offer a large variety of meta-data providing interesting information about projects and their success. In this paper we propose a data mining approach for training classifiers on the OSS meta-data provided by such data repositories. The classifiers learn to predict the successful continuation of an OSS project. The `successfulness' of projects is defined in terms of the classifier confidence with which it predicts that they could be ported in popular OSS projects (such as FreeBSD, Gentoo Portage). The report gives an overview about activities on the topic Semantic Web. It has been released as technical report for the project "KTweb -- Connecting Knowledge Technologies Communities" in 2003. This manuscripts contains the proofs for "A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction". Product take-back legislation forces manufacturers to bear the costs of collection and disposal of products that have reached the end of their useful lives. In order to reduce these costs, manufacturers can consider reuse, remanufacturing and/or recycling of components as an alternative to disposal. The implementation of such alternatives usually requires an appropriate reverse supply chain management. With the concepts of reverse supply chain are gaining popularity in practice, the use of artificial intelligence approaches in these areas is also becoming popular. As a result, the purpose of this paper is to give an overview of the recent publications concerning the application of artificial intelligence techniques to reverse supply chain with emphasis on certain types of product returns. Nowadays, the huge amount of information distributed through the Web motivates studying techniques to be adopted in order to extract relevant data in an efficient and reliable way. Both academia and enterprises developed several approaches of Web data extraction, for example using techniques of artificial intelligence or machine learning. Some commonly adopted procedures, namely wrappers, ensure a high degree of precision of information extracted from Web pages, and, at the same time, have to prove robustness in order not to compromise quality and reliability of data themselves. In this paper we focus on some experimental aspects related to the robustness of the data extraction process and the possibility of automatically adapting wrappers. We discuss the implementation of algorithms for finding similarities between two different version of a Web page, in order to handle modifications, avoiding the failure of data extraction tasks and ensuring reliability of information extracted. Our purpose is to evaluate performances, advantages and draw-backs of our novel system of automatic wrapper adaptation. Our hypothesis is that by equipping certain agents in a multi-agent system controlling an intelligent building with automated decision support, two important factors will be increased. The first is energy saving in the building. The second is customer value---how the people in the building experience the effects of the actions of the agents. We give evidence for the truth of this hypothesis through experimental findings related to tools for artificial decision making. A number of assumptions related to agent control, through monitoring and delegation of tasks to other kinds of agents, of rooms at a test site are relaxed. Each assumption controls at least one uncertainty that complicates considerably the procedures for selecting actions part of each such agent. We show that in realistic decision situations, room-controlling agents can make bounded rational decisions even under dynamic real-time constraints. This result can be, and has been, generalized to other domains with even harsher time constraints. The chinese room problem asks if computers can think; I ask here if most humans can. Machines are possible to have some artificial intelligence like human beings owing to particular algorithms or software. Such machines could learn knowledge from what people taught them and do works according to the knowledge. In practical learning cases, the data is often extremely complicated and large, thus classical learning machines often need huge computational resources. Quantum machine learning algorithm, on the other hand, could be exponentially faster than classical machines using quantum parallelism. Here, we demonstrate a quantum machine learning algorithm on a four-qubit NMR test bench to solve an optical character recognition problem, also known as the handwriting recognition. The quantum machine learns standard character fonts and then recognize handwritten characters from a set with two candidates. To our best knowledge, this is the first artificial intelligence realized on a quantum processor. Due to the widespreading importance of artificial intelligence and its tremendous consuming of computational resources, quantum speedup would be extremely attractive against the challenges from the Big Data. Outline of several strategies for using Gaussian processes as surrogate models for the covariance matrix adaptation evolution strategy (CMA-ES). We propose a funny representation of SAT. While the primary interest is to present propositional satisfiability in a playful way for pedagogical purposes, it could also inspire new search heuristics. This essay explores the limits of Turing machines concerning the modeling of minds and suggests alternatives to go beyond those limits. The question how an agent is affected by its embodiment has attracted growing attention in recent years. A new field of artificial intelligence has emerged, which is based on the idea that intelligence cannot be understood without taking into account embodiment. We believe that a formal approach to quantifying the embodiment's effect on the agent's behaviour is beneficial to the fields of artificial life and artificial intelligence. The contribution of an agent's body and environment to its behaviour is also known as morphological computation. Therefore, in this work, we propose a quantification of morphological computation, which is based on an information decomposition of the sensorimotor loop into shared, unique and synergistic information. In numerical simulation based on a formal representation of the sensorimotor loop, we show that the unique information of the body and environment is a good measure for morphological computation. The results are compared to our previously derived quantification of morphological computation. Massive Open Online Courses (MOOCs) have gained tremendous popularity in the last few years. Thanks to MOOCs, millions of learners from all over the world have taken thousands of high-quality courses for free. Putting together an excellent MOOC ecosystem is a multidisciplinary endeavour that requires contributions from many different fields. Artificial intelligence (AI) and data mining (DM) are two such fields that have played a significant role in making MOOCs what they are today. By exploiting the vast amount of data generated by learners engaging in MOOCs, DM improves our understanding of the MOOC ecosystem and enables MOOC practitioners to deliver better courses. Similarly, AI, supported by DM, can greatly improve student experience and learning outcomes. In this survey paper, we first review the state-of-the-art artificial intelligence and data mining research applied to MOOCs, emphasising the use of AI and DM tools and techniques to improve student engagement, learning outcomes, and our understanding of the MOOC ecosystem. We then offer an overview of key trends and important research to carry out in the fields of AI and DM so that MOOCs can reach their full potential. Cybersecurity research involves publishing papers about malicious exploits as much as publishing information on how to design tools to protect cyber-infrastructure. It is this information exchange between ethical hackers and security experts, which results in a well-balanced cyber-ecosystem. In the blooming domain of AI Safety Engineering, hundreds of papers have been published on different proposals geared at the creation of a safe machine, yet nothing, to our knowledge, has been published on how to design a malevolent machine. Availability of such information would be of great value particularly to computer scientists, mathematicians, and others who have an interest in AI safety, and who are attempting to avoid the spontaneous emergence or the deliberate creation of a dangerous AI, which can negatively affect human activities and in the worst case cause the complete obliteration of the human species. This paper provides some general guidelines for the creation of a Malevolent Artificial Intelligence (MAI). We use the theory of defaults and their meaning of [GS16] to develop (the outline of a) new theory of argumentation. Artificial intelligence has seen several breakthroughs in recent years, with games often serving as milestones. A common feature of these games is that players have perfect information. Poker is the quintessential game of imperfect information, and a longstanding challenge problem in artificial intelligence. We introduce DeepStack, an algorithm for imperfect information settings. It combines recursive reasoning to handle information asymmetry, decomposition to focus computation on the relevant decision, and a form of intuition that is automatically learned from self-play using deep learning. In a study involving 44,000 hands of poker, DeepStack defeated with statistical significance professional poker players in heads-up no-limit Texas hold'em. The approach is theoretically sound and is shown to produce more difficult to exploit strategies than prior approaches. In this article, we introduce a new conception of a family of esport games called Samu Entropy to try to improve dataflow program graphs like the ones that are based on Google's TensorFlow. Currently, the Samu Entropy project specifies only requirements for new esport games to be developed with particular attention to the investigation of the relationship between esport and artificial intelligence. It is quite obvious that there is a very close and natural relationship between esport games and artificial intelligence. Furthermore, the project Samu Entropy focuses not only on using artificial intelligence, but on creating AI in a new way. We present a reference game called Face Battle that implements the Samu Entropy requirements. With the availability of large databases and recent improvements in deep learning methodology, the performance of AI systems is reaching or even exceeding the human level on an increasing number of complex tasks. Impressive examples of this development can be found in domains such as image classification, sentiment analysis, speech understanding or strategic game playing. However, because of their nested non-linear structure, these highly successful machine learning and artificial intelligence models are usually applied in a black box manner, i.e., no information is provided about what exactly makes them arrive at their predictions. Since this lack of transparency can be a major drawback, e.g., in medical applications, the development of methods for visualizing, explaining and interpreting deep learning models has recently attracted increasing attention. This paper summarizes recent developments in this field and makes a plea for more interpretability in artificial intelligence. Furthermore, it presents two approaches to explaining predictions of deep learning models, one method which computes the sensitivity of the prediction with respect to changes in the input and one approach which meaningfully decomposes the decision in terms of the input variables. These methods are evaluated on three classification tasks. Artificial Intelligence is a central topic in the computer science curriculum. From the year 2011 a project-based learning methodology based on computer games has been designed and implemented into the intelligence artificial course at the University of the Bio-Bio. The project aims to develop software-controlled agents (bots) which are programmed by using heuristic algorithms seen during the course. This methodology allows us to obtain good learning results, however several challenges have been founded during its implementation. In this paper we show how linguistic descriptions of data can help to provide students and teachers with technical and personalized feedback about the learned algorithms. Algorithm behavior profile and a new Turing test for computer games bots based on linguistic modelling of complex phenomena are also proposed in order to deal with such challenges. In order to show and explore the possibilities of this new technology, a web platform has been designed and implemented by one of authors and its incorporation in the process of assessment allows us to improve the teaching learning process. The concept of innateness is rarely discussed in the context of artificial intelligence. When it is discussed, or hinted at, it is often the context of trying to reduce the amount of innate machinery in a given system. In this paper, I consider as a test case a recent series of papers by Silver et al (Silver et al., 2017a) on AlphaGo and its successors that have been presented as an argument that a "even in the most challenging of domains: it is possible to train to superhuman level, without human examples or guidance", "starting tabula rasa." I argue that these claims are overstated, for multiple reasons. I close by arguing that artificial intelligence needs greater attention to innateness, and I point to some proposals about what that innateness might look like. It is undeniable that artificial intelligence (AI) and blockchain concepts are spreading at a phenomenal rate. Both technologies have distinct degree of technological complexity and multi-dimensional business implications. However, a common misunderstanding about blockchain concept, in particular, is that blockchain is decentralized and is not controlled by anyone. But the underlying development of a blockchain system is still attributed to a cluster of core developers. Take smart contract as an example, it is essentially a collection of codes (or functions) and data (or states) that are programmed and deployed on a blockchain (say, Ethereum) by different human programmers. It is thus, unfortunately, less likely to be free of loopholes and flaws. In this article, through a brief overview about how artificial intelligence could be used to deliver bug-free smart contract so as to achieve the goal of blockchain 2.0, we to emphasize that the blockchain implementation can be assisted or enhanced via various AI techniques. The alliance of AI and blockchain is expected to create numerous possibilities. The use of artificial intelligence intelligencein medicine can be traced back to 1968 when Paycha published his paper Le diagnostic a l'aide d'intelligences artificielle, presentation de la premiere machine diagnostri. Few years later Shortliffe et al. presented an expert system named Mycin which was able to identify bacteria causing severe blood infections and to recommend antibiotics. Despite the fact that Mycin outperformed members of the Stanford medical school in the reliability of diagnosis it was never used in practice due to a legal issue who do you sue if it gives a wrong diagnosis?. However only in 2016 when the artificial intelligence software built into the IBM Watson AI platform correctly diagnosed and proposed an effective treatment for a 60-year-old womans rare form of leukemia the AI use in medicine become really popular.On of first papers presenting the use of AI in paediatrics was published in 1984. The paper introduced a computer-assisted medical decision making system called SHELP. The AGINAO is a project to create a human-level artificial general intelligence system (HL AGI) embodied in the Aldebaran Robotics' NAO humanoid robot. The dynamical and open-ended cognitive engine of the robot is represented by an embedded and multi-threaded control program, that is self-crafted rather than hand-crafted, and is executed on a simulated Universal Turing Machine (UTM). The actual structure of the cognitive engine emerges as a result of placing the robot in a natural preschool-like environment and running a core start-up system that executes self-programming of the cognitive layer on top of the core layer. The data from the robot's sensory devices supplies the training samples for the machine learning methods, while the commands sent to actuators enable testing hypotheses and getting a feedback. The individual self-created subroutines are supposed to reflect the patterns and concepts of the real world, while the overall program structure reflects the spatial and temporal hierarchy of the world dependencies. This paper focuses on the details of the self-programming approach, limiting the discussion of the applied cognitive architecture to a necessary minimum. This text introduces the twin deadlocks of strong artificial life. Conceptualization of life is a deadlock both because of the existence of a continuum between the inert and the living, and because we only know one instance of life. Computationalism is a second deadlock since it remains a matter of faith. Nevertheless, artificial life realizations quickly progress and recent constructions embed an always growing set of the intuitive properties of life. This growing gap between theory and realizations should sooner or later crystallize in some kind of paradigm shift and then give clues to break the twin deadlocks. Artificial Chemistries (ACs) are symbolic chemical metaphors for the exploration of Artificial Life, with specific focus on the problem of biogenesis or the origin of life. This paper presents authors thoughts towards defining a unified framework to characterize and classify symbolic artificial chemistries by devising appropriate formalism to capture semantic and organizational information. We identify three basic high level abstractions in initial proposal for this framework viz., information, computation, and communication. We present an analysis of two important notions of information, namely, Shannon's Entropy and Algorithmic Information, and discuss inductive and deductive approaches for defining the framework. Innate immunity now occupies a central role in immunology. However, artificial immune system models have largely been inspired by adaptive not innate immunity. This paper reviews the biological principles and properties of innate immunity and, adopting a conceptual framework, asks how these can be incorporated into artificial models. The aim is to outline a meta-framework for models of innate immunity. We present chemlambda (or the chemical concrete machine), an artificial chemistry with the following properties: (a) is Turing complete, (b) has a model of decentralized, distributed computing associated to it, (c) works at the level of individual (artificial) molecules, subject of reversible, but otherwise deterministic interactions with a small number of enzymes, (d) encodes information in the geometrical structure of the molecules and not in their numbers, (e) all interactions are purely local in space and time. This is part of a larger project to create computing, artificial chemistry and artificial life in a distributed context, using topological and graphical languages. If you are an artificial intelligence researcher, you should look to video games as ideal testbeds for the work you do. If you are a video game developer, you should look to AI for the technology that makes completely new types of games possible. This chapter lays out the case for both of these propositions. It asks the question "what can video games do for AI", and discusses how in particular general video game playing is the ideal testbed for artificial general intelligence research. It then asks the question "what can AI do for video games", and lays out a vision for what video games might look like if we had significantly more advanced AI at our disposal. The chapter is based on my keynote at IJCCI 2015, and is written in an attempt to be accessible to a broad audience. The objective of this paper is to introduce an artificial intelligence based optimization approach, which is inspired from Piagets theory on cognitive development. The approach has been designed according to essential processes that an individual may experience while learning something new or improving his / her knowledge. These processes are associated with the Piagets ideas on an individuals cognitive development. The approach expressed in this paper is a simple algorithm employing swarm intelligence oriented tasks in order to overcome single-objective optimization problems. For evaluating effectiveness of this early version of the algorithm, test operations have been done via some benchmark functions. The obtained results show that the approach / algorithm can be an alternative to the literature in terms of single-objective optimization. The authors have suggested the name: Cognitive Development Optimization Algorithm (CoDOA) for the related intelligent optimization approach. A recent issue of a popular computing journal asked which laws would apply if a self-driving car killed a pedestrian. This paper considers the question of legal liability for artificially intelligent computer systems. It discusses whether criminal liability could ever apply; to whom it might apply; and, under civil law, whether an AI program is a product that is subject to product design legislation or a service to which the tort of negligence applies. The issue of sales warranties is also considered. A discussion of some of the practical limitations that AI systems are subject to is also included. Learning and reasoning are both aspects of what is considered to be intelligence. Their studies within AI have been separated historically, learning being the topic of machine learning and neural networks, and reasoning falling under classical (or symbolic) AI. However, learning and reasoning are in many ways interdependent. This paper discusses the nature of some of these interdependencies and proposes a general framework called FLARE, that combines inductive learning using prior knowledge together with reasoning in a propositional setting. Several examples that test the framework are presented, including classical induction, many important reasoning protocols and two simple expert systems. Inspired by Hofstadter's Coffee-House Conversation (1982) and by the science fiction short story SAM by Schattschneider (1988), we propose and discuss criteria for non-mechanical intelligence. Firstly, we emphasize the practical need for such tests in view of massively multiuser online role-playing games (MMORPGs) and virtual reality systems like Second Life. Secondly, we demonstrate Second Life as a useful framework for implementing (some iterations of) that test. Autonomous intelligent agent research is a domain situated at the forefront of artificial intelligence. Interest-based negotiation (IBN) is a form of negotiation in which agents exchange information about their underlying goals, with a view to improve the likelihood and quality of a offer. In this paper we model and verify a multi-agent argumentation scenario of resource sharing mechanism to enable resource sharing in a distributed system. We use IBN in our model wherein agents express their interests to the others in the society to gain certain resources. Most successful Bayesian network (BN) applications to datehave been built through knowledge elicitation from experts.This is difficult and time consuming, which has lead to recentinterest in automated methods for learning BNs from data. We present a case study in the construction of a BN in anintelligent tutoring application, specifically decimal misconceptions. Wedescribe the BN construction using expert elicitation and then investigate how certainexisting automated knowledge discovery methods might support the BN knowledge engineering process. We propose a formal framework for intelligent systems which can reason about scientific domains, in particular about the carcinogenicity of chemicals, and we study its properties. Our framework is grounded in a philosophy of scientific enquiry and discourse, and uses a model of dialectical argumentation. The formalism enables representation of scientific uncertainty and conflict in a manner suitable for qualitative reasoning about the domain. This paper describes a domain-specific knowledge acquisition tool for intelligent automated troubleshooters based on Bayesian networks. No Bayesian network knowledge is required to use the tool, and troubleshooting information can be specified as natural and intuitive as possible. Probabilities can be specified in the direction that is most natural to the domain expert. Thus, the knowledge acquisition efficiently removes the traditional knowledge acquisition bottleneck of Bayesian networks. Like any large system development effort, the construction of a complex belief network model requires systems engineering to manage the design and construction process. We propose a rapid prototyping approach to network engineering. We describe criteria for identifying network modules and the use of "stubs" to represent not-yet-constructed modules. We propose an object oriented representation for belief networks which captures the semantics of the problem in addition to conditional independencies and probabilities. Methods for evaluating complex belief network models are discussed. The ideas are illustrated with examples from a large belief network construction problem in the military intelligence domain. This note is concerned with a formal analysis of the problem of non-monotonic reasoning in intelligent systems, especially when the uncertainty is taken into account in a quantitative way. A firm connection between logic and probability is established by introducing conditioning notions by means of formal structures that do not rely on quantitative measures. The associated conditional logic, compatible with conditional probability evaluations, is non-monotonic relative to additional evidence. Computational aspects of conditional probability logic are mentioned. The importance of this development lies on its role to provide a conceptual basis for various forms of evidence combination and on its significance to unify multi-valued and non-monotonic logics This is a preliminary version of visual interpretation integrating multiple sensors in SUCCESSOR, an intelligent, model-based vision system. We pursue a thorough integration of hierarchical Bayesian inference with comprehensive physical representation of objects and their relations in a system for reasoning with geometry, surface materials and sensor models in machine vision. Bayesian inference provides a framework for accruing_ probabilities to rank order hypotheses. Current approaches to expert systems' reasoning under uncertainty fail to capture the iterative revision process characteristic of intelligent human reasoning. This paper reports on a system, called the Non-monotonic Probabilist, or NMP (Cohen, et al., 1985). When its inferences result in substantial conflict, NMP examines and revises the assumptions underlying the inferences until conflict is reduced to acceptable levels. NMP has been implemented in a demonstration computer-based system, described below, which supports threat correlation and in-flight route replanning by Air Force pilots. Does the energy requirements for the human brain give energy constraints that give reason to doubt the feasibility of artificial intelligence? This report will review some relevant estimates of brain bioenergetics and analyze some of the methods of estimating brain emulation energy requirements. Turning to AI, there are reasons to believe the energy requirements for de novo AI to have little correlation with brain (emulation) energy requirements since cost could depend merely of the cost of processing higher-level representations rather than billions of neural firings. Unless one thinks the human way of thinking is the most optimal or most easily implementable way of achieving software intelligence, we should expect de novo AI to make use of different, potentially very compressed and fast, processes. In this paper artificial neural networks and support vector machines are used to reduce the amount of vibration data that is required to estimate the Time Domain Average of a gear vibration signal. Two models for estimating the time domain average of a gear vibration signal are proposed. The models are tested on data from an accelerated gear life test rig. Experimental results indicate that the required data for calculating the Time Domain Average of a gear vibration signal can be reduced by up to 75% when the proposed models are implemented. In this paper, we present a heuristic operator which aims at simultaneously optimizing the orientations of all the edges in an intermediate Bayesian network structure during the search process. This is done by alternating between the space of directed acyclic graphs (DAGs) and the space of skeletons. The found orientations of the edges are based on a scoring function rather than on induced conditional independences. This operator can be used as an extension to commonly employed search strategies. It is evaluated in experiments with artificial and real-world data. The study of arguments as abstract entities and their interaction as introduced by Dung (Artificial Intelligence 177, 1995) has become one of the most active research branches within Artificial Intelligence and Reasoning. A main issue for abstract argumentation systems is the selection of acceptable sets of arguments. Value-based argumentation, as introduced by Bench-Capon (J. Logic Comput. 13, 2003), extends Dung's framework. It takes into account the relative strength of arguments with respect to some ranking representing an audience: an argument is subjectively accepted if it is accepted with respect to some audience, it is objectively accepted if it is accepted with respect to all audiences. Deciding whether an argument is subjectively or objectively accepted, respectively, are computationally intractable problems. In fact, the problems remain intractable under structural restrictions that render the main computational problems for non-value-based argumentation systems tractable. In this paper we identify nontrivial classes of value-based argumentation systems for which the acceptance problems are polynomial-time tractable. The classes are defined by means of structural restrictions in terms of the underlying graphical structure of the value-based system. Furthermore we show that the acceptance problems are intractable for two classes of value-based systems that where conjectured to be tractable by Dunne (Artificial Intelligence 171, 2007). The main topic discussed in this paper is how to use intelligence for biometric decision defuzzification. A neural training model is proposed and tested here as a possible solution for dealing with natural fuzzification that appears between the intra- and inter-class distribution of scores computed during iris recognition tests. It is shown here that the use of proposed neural network support leads to an improvement in the artificial perception of the separation between the intra- and inter-class score distributions by moving them away from each other. This article provides a brief introduction to the "Theory of Intelligence" and its realisation in the "SP Computer Model". The overall goal of the SP programme of research, in accordance with long-established principles in science, has been the simplification and integration of observations and concepts across artificial intelligence, mainstream computing, mathematics, and human learning, perception, and cognition. In broad terms, the SP system is a brain-like system that takes in "New" information through its senses and stores some or all of it as "Old" information. A central idea in the system is the powerful concept of "SP-multiple-alignment", borrowed and adapted from bioinformatics. This the key to the system's versatility in aspects of intelligence, in the representation of diverse kinds of knowledge, and in the seamless integration of diverse aspects of intelligence and diverse kinds of knowledge, in any combination. There are many potential benefits and applications of the SP system. It is envisaged that the system will be developed as the "SP Machine", which will initially be a software virtual machine, hosted on a high-performance computer, a vehicle for further research and a step towards the development of an industrial-strength SP Machine. In this paper we demonstrate that it is possible to manage intelligence in constant time as a pre-process to information fusion through a series of processes dealing with issues such as clustering reports, ranking reports with respect to importance, extraction of prototypes from clusters and immediate classification of newly arriving intelligence reports. These methods are used when intelligence reports arrive which concerns different events which should be handled independently, when it is not known a priori to which event each intelligence report is related. We use clustering that runs as a back-end process to partition the intelligence into subsets representing the events, and in parallel, a fast classification that runs as a front-end process in order to put the newly arriving intelligence into its correct information fusion process. We present an alternative methodology for the analysis of algorithms, based on the concept of expected discounted reward. This methodology naturally handles algorithms that do not always terminate, so it can (theoretically) be used with partial algorithms for undecidable problems, such as those found in artificial general intelligence (AGI) and automated theorem proving. We mention an approach to self-improving AGI enabled by this methodology. Aug 2017 addendum: This article was originally written with multiple audiences in mind. It is really best put in the following terms. Goertzel, Hutter, Legg, and others have developed a definition of an intelligence score for a general abstract agent: expected lifetime reward in a random environment. AIXI is generally the optimal agent according to this score, but there may be reasons to analyze other agents and compare score values. If we want to use this definition of intelligence in practice, perhaps we can start by analyzing some simple agents. Common algorithms can be thought of as simple agents (environment is input, reward is based on running time) so we take the goal of applying the agent intelligence score to algorithms. That is, we want to find, what are the IQ scores of algorithms? We can do some very simple analysis, but the real answer is that even for simple algorithms, the intelligence score is too difficult to work with in practice. The current work addresses a virtual environment with self-replicating agents whose decisions are based on a form of "somatic computation" (soma - body) in which basic emotional responses, taken in parallelism to actual living organisms, are introduced as a way to provide the agents with greater reflexive abilities. The work provides a contribution to the field of Artificial Intelligence (AI) and Artificial Life (ALife) in connection to a neurobiology-based cognitive framework for artificial systems and virtual environments' simulations. The performance of the agents capable of emotional responses is compared with that of self-replicating automata, and the implications of research on emotions and AI, in connection to both virtual agents as well as robots, is addressed regarding possible future directions and applications. The biological immune system is a robust, complex, adaptive system that defends the body from foreign pathogens. It is able to categorize all cells (or molecules) within the body as self-cells or non-self cells. It does this with the help of a distributed task force that has the intelligence to take action from a local and also a global perspective using its network of chemical messengers for communication. There are two major branches of the immune system. The innate immune system is an unchanging mechanism that detects and destroys certain invading organisms, whilst the adaptive immune system responds to previously unknown foreign cells and builds a response to them that can remain in the body over a long period of time. This remarkable information processing biological system has caught the attention of computer science in recent years. A novel computational intelligence technique, inspired by immunology, has emerged, called Artificial Immune Systems. Several concepts from the immune have been extracted and applied for solution to real world science and engineering problems. In this tutorial, we briefly describe the immune system metaphors that are relevant to existing Artificial Immune Systems methods. We will then show illustrative real-world problems suitable for Artificial Immune Systems and give a step-by-step algorithm walkthrough for one such problem. A comparison of the Artificial Immune Systems to other well-known algorithms, areas for future work, tips & tricks and a list of resources will round this tutorial off. It should be noted that as Artificial Immune Systems is still a young and evolving field, there is not yet a fixed algorithm template and hence actual implementations might differ somewhat from time to time and from those examples given here. This paper illustrates how a Prolog program, using chronological backtracking to find a solution in some search space, can be enhanced to perform intelligent backtracking. The enhancement crucially relies on the impurity of Prolog that allows a program to store information when a dead end is reached. To illustrate the technique, a simple search program is enhanced. To appear in Theory and Practice of Logic Programming. Keywords: intelligent backtracking, dependency-directed backtracking, backjumping, conflict-directed backjumping, nogood sets, look-back. In this study, we reproduce two new hybrid intelligent systems, involve three prominent intelligent computing and approximate reasoning methods: Self Organizing feature Map (SOM), Neruo-Fuzzy Inference System and Rough Set Theory (RST),called SONFIS and SORST. We show how our algorithms can be construed as a linkage of government-society interactions, where government catches various states of behaviors: solid (absolute) or flexible. So, transition of society, by changing of connectivity parameters (noise) from order to disorder is inferred. A society's single emergent, increasing intelligence arises partly from the thermodynamic advantages of networking the innate intelligence of different individuals, and partly from the accumulation of solved problems. Economic growth is proportional to the square of the network entropy of a society's population times the network entropy of the number of the society's solved problems. The World Wide Web (WWW) allows the people to share the information (data) from the large database repositories globally. The amount of information grows billions of databases. We need to search the information will specialize tools known generically search engine. There are many of search engines available today, retrieving meaningful information is difficult. However to overcome this problem in search engines to retrieve meaningful information intelligently, semantic web technologies are playing a major role. In this paper we present survey on the search engine generations and the role of search engines in intelligent web and semantic search technologies. Informledge System (ILS) is a knowledge network with autonomous nodes and intelligent links that integrate and structure the pieces of knowledge. In this paper, we aim to put forward the link dynamics involved in intelligent processing of information in ILS. There has been advancement in knowledge management field which involve managing information in databases from a single domain. ILS works with information from multiple domains stored in distributed way in the autonomous nodes termed as Knowledge Network Node (KNN). Along with the concept under consideration, KNNs store the processed information linking concepts and processors leading to the appropriate processing of information. Location management refers to the problem of updating and searching the current location of mobile nodes in a wireless network. To make it efficient, the sum of update costs of location database must be minimized. Previous work relying on fixed location databases is unable to fully exploit the knowledge of user mobility patterns in the system so as to achieve this minimization. The study presents an intelligent location management approach which has interacts between intelligent information system and knowledge-base technologies, so we can dynamically change the user patterns and reduce the transition between the VLR and HLR. The study provides algorithms are ability to handle location registration and call delivery This paper introduces a new computing model based on the cooperation among Turing machines called orchestrated machines. Like universal Turing machines, orchestrated machines are also designed to simulate Turing machines but they can also modify the original operation of the included Turing machines to create a new layer of some kind of collective behavior. Using this new model we can define some interested notions related to cooperation ability of Turing machines such as the intelligence quotient or the emotional intelligence quotient for Turing machines. Imprecise-information processing will play an indispensable role in intelligent systems, especially in the anthropomorphic intelligent systems (as intelligent robots). A new theoretical and technological system of imprecise-information processing has been founded in Principles of Imprecise-Information Processing: A New Theoretical and Technological System[1] which is different from fuzzy technology. The system has clear hierarchy and rigorous structure, which results from the formation principle of imprecise information and has solid mathematical and logical bases, and which has many advantages beyond fuzzy technology. The system provides a technological platform for relevant applications and lays a theoretical foundation for further research. IQ tests are an accepted method for assessing human intelligence. The tests consist of several parts that must be solved under a time constraint. Of all the tested abilities, pattern recognition has been found to have the highest correlation with general intelligence. This is primarily because pattern recognition is the ability to find order in a noisy environment, a necessary skill for intelligent agents. In this paper, we propose a convolutional neural network (CNN) model for solving geometric pattern recognition problems. The CNN receives as input multiple ordered input images and outputs the next image according to the pattern. Our CNN is able to solve problems involving rotation, reflection, color, size and shape patterns and score within the top 5% of human performance. This paper describes an approach to the design of a population of cooperative robots based on concepts borrowed from Systems Theory and Artificial Intelligence. The research has been developed under the SocRob project, carried out by the Intelligent Systems Laboratory at the Institute for Systems and Robotics - Instituto Superior Tecnico (ISR/IST) in Lisbon. The acronym of the project stands both for "Society of Robots" and "Soccer Robots", the case study where we are testing our population of robots. Designing soccer robots is a very challenging problem, where the robots must act not only to shoot a ball towards the goal, but also to detect and avoid static (walls, stopped robots) and dynamic (moving robots) obstacles. Furthermore, they must cooperate to defeat an opposing team. Our past and current research in soccer robotics includes cooperative sensor fusion for world modeling, object recognition and tracking, robot navigation, multi-robot distributed task planning and coordination, including cooperative reinforcement learning in cooperative and adversarial environments, and behavior-based architectures for real time task execution of cooperating robot teams. We present a case study of artificial intelligence techniques applied to the control of production printing equipment. Like many other real-world applications, this complex domain requires high-speed autonomous decision-making and robust continual operation. To our knowledge, this work represents the first successful industrial application of embedded domain-independent temporal planning. Our system handles execution failures and multi-objective preferences. At its heart is an on-line algorithm that combines techniques from state-space planning and partial-order scheduling. We suggest that this general architecture may prove useful in other applications as more intelligent systems operate in continual, on-line settings. Our system has been used to drive several commercial prototypes and has enabled a new product architecture for our industrial partner. When compared with state-of-the-art off-line planners, our system is hundreds of times faster and often finds better plans. Our experience demonstrates that domain-independent AI planning based on heuristic search can flexibly handle time, resources, replanning, and multiple objectives in a high-speed practical application without requiring hand-coded control knowledge. This article is an attempt to combine different ways of working with sets of objects and their classes for designing and development of artificial intelligent systems (AIS) of analysis information, using object-oriented programming (OOP). This paper contains analysis of basic concepts of OOP and their relation with set theory and artificial intelligence (AI). Process of sets and multisets creation from different sides, in particular mathematical set theory, OOP and AI is considered. Definition of object and its properties, homogeneous and inhomogeneous classes of objects, set of objects, multiset of objects and constructive methods of their creation and classification are proposed. In addition, necessity of some extension of existing OOP tools for the purpose of practical implementation AIS of analysis information, using proposed approach, is shown. In this paper, we propose a new methodology based on the Negative Selection Algorithm that belongs to the field of Computational Intelligence, specifically, Artificial Immune Systems to identify takeover targets. Although considerable research based on customary statistical techniques and some contemporary Computational Intelligence techniques have been devoted to identify takeover targets, most of the existing studies are based upon multiple previous mergers and acquisitions. Contrary to previous research, the novelty of this proposal lies in its ability to suggest takeover targets for novice firms that are at the beginning of their merger and acquisition spree. We first discuss the theoretical perspective and then provide a case study with details for practical implementation, both capitalizing from unique generalization capabilities of artificial immune systems algorithms. We propose to use thought-provoking children's questions (TPCQs), namely Highlights BrainPlay questions, as a new method to drive artificial intelligence research and to evaluate the capabilities of general-purpose AI systems. These questions are designed to stimulate thought and learning in children, and they can be used to do the same thing in AI systems, while demonstrating the system's reasoning capabilities to the evaluator. We introduce the TPCQ task, which which takes a TPCQ question as input and produces as output (1) answers to the question and (2) learned generalizations. We discuss how BrainPlay questions stimulate learning. We analyze 244 BrainPlay questions, and we report statistics on question type, question class, answer cardinality, answer class, types of knowledge needed, and types of reasoning needed. We find that BrainPlay questions span many aspects of intelligence. Because the answers to BrainPlay questions and the generalizations learned from them are often highly open-ended, we suggest using human judges for evaluation. Detection of non-technical losses (NTL) which include electricity theft, faulty meters or billing errors has attracted increasing attention from researchers in electrical engineering and computer science. NTLs cause significant harm to the economy, as in some countries they may range up to 40% of the total electricity distributed. The predominant research direction is employing artificial intelligence to predict whether a customer causes NTL. This paper first provides an overview of how NTLs are defined and their impact on economies, which include loss of revenue and profit of electricity providers and decrease of the stability and reliability of electrical power grids. It then surveys the state-of-the-art research efforts in a up-to-date and comprehensive review of algorithms, features and data sets used. It finally identifies the key scientific and engineering challenges in NTL detection and suggests how they could be addressed in the future. There is a growing focus on how to design safe artificial intelligent (AI) agents. As systems become more complex, poorly specified goals or control mechanisms may cause AI agents to engage in unwanted and harmful outcomes. Thus it is necessary to design AI agents that follow initial programming intentions as the program grows in complexity. How to specify these initial intentions has also been an obstacle to designing safe AI agents. Finally, there is a need for the AI agent to have redundant safety mechanisms to ensure that any programming errors do not cascade into major problems. Humans are autonomous intelligent agents that have avoided these problems and the present manuscript argues that by understanding human self-regulation and goal setting, we may be better able to design safe AI agents. Some general principles of human self-regulation are outlined and specific guidance for AI design is given. The majority of artificial intelligence research, as it relates from which to biological senses has been focused on vision. The recent explosion of machine learning and in particular, dee p learning, can be partially attributed to the release of high quality data sets for algorithm s from which to model the world on. Thus, most of these datasets are comprised of images. We believe that focusing on sensorimotor systems and tactile feedback will create algorithms that better mimic human intelligence. Here we present SenseNet: a collection of tactile simulators and a large scale dataset of 3D objects for manipulation. SenseNet was created for the purpose of researching and training Artificial Intelligences (AIs) to interact with the environment via sensorimotor neural systems and tactile feedback. We aim to accelerate that same explosion in image processing, but for the domain of tactile feedback and sensorimotor research. We hope that SenseNet can offer researchers in both the machine learning and computational neuroscience communities brand new opportunities and avenues to explore. Artificial life models, swarm intelligent and evolutionary computation algorithms are usually built on fixed size populations. Some studies indicate however that varying the population size can increase the adaptability of these systems and their capability to react to changing environments. In this paper we present an extended model of an artificial ant colony system designed to evolve on digital image habitats. We will show that the present swarm can adapt the size of the population according to the type of image on which it is evolving and reacting faster to changing images, thus converging more rapidly to the new desired regions, regulating the number of his image foraging agents. Finally, we will show evidences that the model can be associated with the Mathematical Morphology Watershed algorithm to improve the segmentation of digital grey-scale images. KEYWORDS: Swarm Intelligence, Perception and Image Processing, Pattern Recognition, Mathematical Morphology, Social Cognitive Maps, Social Foraging, Self-Organization, Distributed Search. Modelling problems containing a mixture of Boolean and numerical variables is a long-standing interest of Artificial Intelligence. However, performing inference and learning in hybrid domains is a particularly daunting task. The ability to model this kind of domains is crucial in "learning to design" tasks, that is, learning applications where the goal is to learn from examples how to perform automatic {\em de novo} design of novel objects. In this paper we present Structured Learning Modulo Theories, a max-margin approach for learning in hybrid domains based on Satisfiability Modulo Theories, which allows to combine Boolean reasoning and optimization over continuous linear arithmetical constraints. The main idea is to leverage a state-of-the-art generalized Satisfiability Modulo Theory solver for implementing the inference and separation oracles of Structured Output SVMs. We validate our method on artificial and real world scenarios. The ozone level prediction is an important task of air quality agencies of modern cities. In this paper, we design an ozone level alarm system (OLP) for Isfahan city and test it through the real word data from 1-1-2000 to 7-6-2011. We propose a computer based system with three inputs and single output. The inputs include three sensors of solar ultraviolet (UV), total solar radiation (TSR) and total ozone (O3). And the output of the system is the predicted O3 of the next day and the alarm massages. A developed artificial intelligence (AI) algorithm is applied to determine the output, based on the inputs variables. For this issue, AI models, including supervised brain emotional learning (BEL), adaptive neuro-fuzzy inference system (ANFIS) and artificial neural networks (ANNs), are compared in order to find the best model. The simulation of the proposed system shows that it can be used successfully in prediction of major cities ozone level. An artificial superintelligence (ASI) is artificial intelligence that is significantly more intelligent than humans in all respects. While ASI does not currently exist, some scholars propose that it could be created sometime in the future, and furthermore that its creation could cause a severe global catastrophe, possibly even resulting in human extinction. Given the high stakes, it is important to analyze ASI risk and factor the risk into decisions related to ASI research and development. This paper presents a graphical model of major pathways to ASI catastrophe, focusing on ASI created via recursive self-improvement. The model uses the established risk and decision analysis modeling paradigms of fault trees and influence diagrams in order to depict combinations of events and conditions that could lead to AI catastrophe, as well as intervention options that could decrease risks. The events and conditions include select aspects of the ASI itself as well as the human process of ASI research, development, and management. Model structure is derived from published literature on ASI risk. The model offers a foundation for rigorous quantitative evaluation and decision making on the long-term risk of ASI catastrophe. In the artificial intelligence field, learning often corresponds to changing the parameters of a parameterized function. A learning rule is an algorithm or mathematical expression that specifies precisely how the parameters should be changed. When creating an artificial intelligence system, we must make two decisions: what representation should be used (i.e., what parameterized function should be used) and what learning rule should be used to search through the resulting set of representable functions. Using most learning rules, these two decisions are coupled in a subtle (and often unintentional) way. That is, using the same learning rule with two different representations that can represent the same sets of functions can result in two different outcomes. After arguing that this coupling is undesirable, particularly when using artificial neural networks, we present a method for partially decoupling these two decisions for a broad class of learning rules that span unsupervised learning, reinforcement learning, and supervised learning. The rapid advancement of machine learning techniques has re-energized research into general artificial intelligence. While the idea of domain-agnostic meta-learning is appealing, this emerging field must come to terms with its relationship to human cognition and the statistics and structure of the tasks humans perform. The position of this article is that only by aligning our agents' abilities and environments with those of humans do we stand a chance at developing general artificial intelligence (GAI). A broad reading of the famous 'No Free Lunch' theorem is that there is no universally optimal inductive bias or, equivalently, bias-free learning is impossible. This follows from the fact that there are an infinite number of ways to extrapolate data, any of which might be the one used by the data generating environment; an inductive bias prefers some of these extrapolations to others, which lowers performance in environments using these adversarial extrapolations. We may posit that the optimal GAI is the one that maximally exploits the statistics of its environment to create its inductive bias; accepting the fact that this agent is guaranteed to be extremely sub-optimal for some alternative environments. This trade-off appears benign when thinking about the environment as being the physical universe, as performance on any fictive universe is obviously irrelevant. But, we should expect a sharper inductive bias if we further constrain our environment. Indeed, we implicitly do so by defining GAI in terms of accomplishing that humans consider useful. One common version of this is need the for 'common-sense reasoning', which implicitly appeals to the statistics of physical universe as perceived by humans. Next-generation wireless networks must support ultra-reliable, low-latency communication and intelligently manage a massive number of Internet of Things (IoT) devices in real-time, within a highly dynamic environment. This need for stringent communication quality-of-service (QoS) requirements as well as mobile edge and core intelligence can only be realized by integrating fundamental notions of artificial intelligence (AI) and machine learning across the wireless infrastructure and end-user devices. In this context, this paper provides a comprehensive tutorial that introduces the main concepts of machine learning, in general, and artificial neural networks (ANNs), in particular, and their potential applications in wireless communications. For this purpose, we present a comprehensive overview on a number of key types of neural networks that include feed-forward, recurrent, spiking, and deep neural networks. For each type of neural network, we present the basic architecture and training procedure, as well as the associated challenges and opportunities. Then, we provide an in-depth overview on the variety of wireless communication problems that can be addressed using ANNs, ranging from communication using unmanned aerial vehicles to virtual reality and edge caching.For each individual application, we present the main motivation for using ANNs along with the associated challenges while also providing a detailed example for a use case scenario and outlining future works that can be addressed using ANNs. In a nutshell, this article constitutes one of the first holistic tutorials on the development of machine learning techniques tailored to the needs of future wireless networks. What would a human hundreds or thousands times more intelligent than the brightest human ever born be like? We must admit we can hardly guess. A human being of such intelligence will be so radically different from us that it can hardly, if at all, be recognized as human. If we had to go back along the evolutionary tree to identify a creature 1000 times less intelligent than the average contemporary human, we will have to go really far back. Would it be a kind of a lizard? An insect perhaps? Considering this, how can we possibly aspire to have a grasp of something a thousand times more intelligent than us? When it comes to intelligence, even the very attempt to quantify it is highly misleading. Now if we attend to a seemingly adjacent question, what would a machine with such capacity for intelligence be like? Just coming up with an approximate metaphor requires a huge stretch of the imagination, meaning that almost anything goes... What would a society of such super intelligent agents, be they human, machines or an amalgam of both, be like? Well, here we are transported into the realm of pure speculation. Technological Singularity is referred to as the event of artificial intelligence surpassing the intelligence of humans and shortly after augmenting itself far beyond that. It is no wonder that the mathematical concept of singularity has become the symbol of an event so disruptive and so far reaching that it is impossible to conceptually or even metaphorically grasp, much less to predict. This article is about how the "SP theory of intelligence" and its realisation in the "SP machine" (both outlined in the article) may help to solve computer-related problems in the design of autonomous robots, meaning robots that do not depend on external intelligence or power supplies, are mobile, and are designed to exhibit as much human-like intelligence as possible. The article is about: how to increase the computational and energy efficiency of computers and reduce their bulk; how to achieve human-like versatility in intelligence; and likewise for human-like adaptability in intelligence. The SP system has potential for substantial gains in computational and energy efficiency and reductions in the bulkiness of computers: by reducing the size of data to be processed; by exploiting statistical information that the system gathers; and via an updated version of Donald Hebb's concept of a "cell assembly". Towards human-like versatility in intelligence, the SP system has strengths in unsupervised learning, natural language processing, pattern recognition, information retrieval, several kinds of reasoning, planning, problem solving, and more, with seamless integration amongst structures and functions. The SP system's strengths in unsupervised learning and other aspects of intelligence may help to achieve human-like adaptability in intelligence via: the learning of natural language; learning to see; building 3D models of objects and of a robot's surroundings; learning regularities in the workings of a robot and in the robot's environment; exploration and play; learning major skills; and secondary forms of learning. Also discussed are: how the SP system may process parallel streams of information; generalisation of knowledge, correction of over-generalisations, and learning from dirty data; how to cut the cost of learning; and reinforcements, motivations, goals, and demonstration. The integration of different learning and adaptation techniques to overcome individual limitations and to achieve synergetic effects through the hybridization or fusion of these techniques has, in recent years, contributed to a large number of new intelligent system designs. Computational intelligence is an innovative framework for constructing intelligent hybrid architectures involving Neural Networks (NN), Fuzzy Inference Systems (FIS), Probabilistic Reasoning (PR) and derivative free optimization techniques such as Evolutionary Computation (EC). Most of these hybridization approaches, however, follow an ad hoc design methodology, justified by success in certain application domains. Due to the lack of a common framework it often remains difficult to compare the various hybrid systems conceptually and to evaluate their performance comparatively. This chapter introduces the different generic architectures for integrating intelligent systems. The designing aspects and perspectives of different hybrid archirectures like NN-FIS, EC-FIS, EC-NN, FIS-PR and NN-FIS-EC systems are presented. Some conclusions are also provided towards the end. The intelligent acoustic emission locator is described in Part I, while Part II discusses blind source separation, time delay estimation and location of two simultaneously active continuous acoustic emission sources. The location of acoustic emission on complicated aircraft frame structures is a difficult problem of non-destructive testing. This article describes an intelligent acoustic emission source locator. The intelligent locator comprises a sensor antenna and a general regression neural network, which solves the location problem based on learning from examples. Locator performance was tested on different test specimens. Tests have shown that the accuracy of location depends on sound velocity and attenuation in the specimen, the dimensions of the tested area, and the properties of stored data. The location accuracy achieved by the intelligent locator is comparable to that obtained by the conventional triangulation method, while the applicability of the intelligent locator is more general since analysis of sonic ray paths is avoided. This is a promising method for non-destructive testing of aircraft frame structures by the acoustic emission method. Contradiction is often seen as a defect of intelligent systems and a dangerous limitation on efficiency. In this paper we raise the question of whether, on the contrary, it could be considered a key tool in increasing intelligence in biological structures. A possible way of answering this question in a mathematical context is shown, formulating a proposition that suggests a link between intelligence and contradiction. A concrete approach is presented in the well-defined setting of cellular automata. Here we define the models of ``observer'', ``entity'', ``environment'', ``intelligence'' and ``contradiction''. These definitions, which roughly correspond to the common meaning of these words, allow us to deduce a simple but strong result about these concepts in an unbiased, mathematical manner. Evidence for a real-world counterpart to the demonstrated formal link between intelligence and contradiction is provided by three computational experiments. We explore self-organizing strategies for role assignment in a foraging task carried out by a colony of artificial agents. Our strategies are inspired by various mechanisms of division of labor (polyethism) observed in eusocial insects like ants, termites, or bees. Specifically we instantiate models of caste polyethism and age or temporal polyethism to evaluated the benefits to foraging in a dynamic environment. Our experiment is directly related to the exploration/exploitation trade of in machine learning. The application of evolution in the digital realm, with the goal of creating artificial intelligence and artificial life, has a history as long as that of the digital computer itself. We illustrate the intertwined history of these ideas, starting with the early theoretical work of John von Neumann and the pioneering experimental work of Nils Aall Barricelli. We argue that evolutionary thinking and artificial life will continue to play an integral role in the future development of the digital world. A general methodology is proposed to engineer a system of interacting components (particles) which is able to self-regulate their concentrations in order to produce any prescribed output in response to a particular input. The methodology is based on the mathematical equivalence between artificial neurons in neural networks and species in autocatalytic reactions, and it specifies the relationship between the artificial neural network's parameters and the rate coefficients of the reactions between particle species. Such systems are characterised by a high degree of robustness as they are able to reach the desired output despite disturbances and perturbations of the concentrations of the various species. Our paradigm for the use of artificial agents to teach requires among other things that they persist through time in their interaction with human students, in such a way that they "teleport" or "migrate" from an embodiment at one time t to a different embodiment at later time t'. In this short paper, we report on initial steps toward the formalization of such teleportation, in order to enable an overseeing AI system to establish, mechanically, and verifiably, that the human students in question will likely believe that the very same artificial agent has persisted across such times despite the different embodiments. Human-Computer Interaction with the traditional User Interface is done using a specified in advance script dialog menu, mainly based on human intellect and unproductive use of navigation. This approach does not lead to making qualitative decision in control systems, where the situations and processes cannot be structured in advance. Any dynamic changes in the controlled business process (as example, in organizational unit of the information fuzzy control system) make it necessary to modify the script dialogue in User Interface. This circumstance leads to a redesign of the components of the User Interface and of the entire control system. In the Intelligent User Interface, where the dialog situations are unknown in advance, fuzzy structured and artificial intelligence is crucial, the redesign described above is impossible. To solve this and other problems, we propose the data, information and knowledge based technology of Smart/ Intelligent User Interface (IUI) design, which interacts with users and systems in natural and other languages, utilizing the principles of Situational Control and Fuzzy Logic theories, Artificial Intelligence, Linguistics, Knowledge Base technologies and others. The proposed technology of IUI design is defined by multi-agents of Situational Control and of data, information and knowledge, modelling of Fuzzy Logic Inference, Generalization, Representation and Explanation of knowledge, Planning and Decision-making, Dialog Control, Reasoning and Systems Thinking, Fuzzy Control of organizational unit in real-time, fuzzy conditions, heterogeneous domains, and multi-lingual communication under uncertainty and in Fuzzy Environment. Society has become more dependent on automated intelligent systems, at the same time, these systems have become more and more complicated. Society's expectation regarding the capabilities and intelligence of such systems has also grown. We have become a more complicated society with more complicated problems. As the expectation of intelligent systems rises, we discover many more applications for artificial intelligence. Additionally, as the difficulty level and computational requirements of such problems rise, there is a need to distribute the problem solving. Although the field of multiagent systems (MAS) and distributed artificial intelligence (DAI) is relatively young, the importance and applicability of this technology for solving today's problems continue to grow. In multiagent systems, the main goal is to provide fruitful cooperation among agents in order to enrich the support given to all user activities. This paper deals with the development of a multiagent system aimed at solving the reservation problems encountered in rural tourism. Due to their benefits over the last few years, online travel agencies have become a very useful instrument in planning vacations. A MAS concept (which is based on the Internet exploitation) can improve this activity and provide clients with a new, rapid and efficient way of making accommodation arrangements. Designing and implementing an intelligent and user friendly human machine interface for any kind of software or hardware oriented application is always be a challenging task for the designers and developers because it is very difficult to understand the psychology of the user, nature of the work and best suit of the environment. This research paper is basically about to propose an intelligent, flexible and user friendly machine interface for Product Life Cycle Management products or PDM Systems since studies show that usability and human computer interaction issues are a major cause of acceptance problems introducing or using such systems. Going into details of the proposition, we present prototype implementations about theme based on design requirements, designed designs and technologies involved for the development of human machine interface. This paper shows that maintaining logical consistency of an iris recognition system is a matter of finding a suitable partitioning of the input space in enrollable and unenrollable pairs by negotiating the user comfort and the safety of the biometric system. In other words, consistent enrollment is mandatory in order to preserve system consistency. A fuzzy 3-valued disambiguated model of iris recognition is proposed and analyzed in terms of completeness, consistency, user comfort and biometric safety. It is also shown here that the fuzzy 3-valued model of iris recognition is hosted by an 8-valued Boolean algebra of modulo 8 integers that represents the computational formalization in which a biometric system (a software agent) can achieve the artificial understanding of iris recognition in a logically consistent manner. Existing theoretical universal algorithmic intelligence models are not practically realizable. More pragmatic approach to artificial general intelligence is based on cognitive architectures, which are, however, non-universal in sense that they can construct and use models of the environment only from Turing-incomplete model spaces. We believe that the way to the real AGI consists in bridging the gap between these two approaches. This is possible if one considers cognitive functions as a "cognitive bias" (priors and search heuristics) that should be incorporated into the models of universal algorithmic intelligence without violating their universality. Earlier reported results suiting this approach and its overall feasibility are discussed on the example of perception, planning, knowledge representation, attention, theory of mind, language, and some others. Solomonoff induction is known to be universal, but incomputable. Its approximations, namely, the Minimum Description (or Message) Length (MDL) principles, are adopted in practice in the efficient, but non-universal form. Recent attempts to bridge this gap leaded to development of the Representational MDL principle that originates from formal decomposition of the task of induction. In this paper, possible extension of the RMDL principle in the context of universal intelligence agents is considered, for which introduction of representations is shown to be an unavoidable meta-heuristic and a step toward efficient general intelligence. Hierarchical representations and model optimization with the use of information-theoretic interpretation of the adaptive resonance are also discussed. Any agent that is part of the environment it interacts with and has versatile actuators (such as arms and fingers), will in principle have the ability to self-modify -- for example by changing its own source code. As we continue to create more and more intelligent agents, chances increase that they will learn about this ability. The question is: will they want to use it? For example, highly intelligent systems may find ways to change their goals to something more easily achievable, thereby `escaping' the control of their designers. In an important paper, Omohundro (2008) argued that goal preservation is a fundamental drive of any intelligent system, since a goal is more likely to be achieved if future versions of the agent strive towards the same goal. In this paper, we formalise this argument in general reinforcement learning, and explore situations where it fails. Our conclusion is that the self-modification possibility is harmless if and only if the value function of the agent anticipates the consequences of self-modifications and use the current utility function when evaluating the future. The Google DeepMind challenge match in March 2016 was a historic achievement for computer Go development. This article discusses the development of computational intelligence (CI) and its relative strength in comparison with human intelligence for the game of Go. We first summarize the milestones achieved for computer Go from 1998 to 2016. Then, the computer Go programs that have participated in previous IEEE CIS competitions as well as methods and techniques used in AlphaGo are briefly introduced. Commentaries from three high-level professional Go players on the five AlphaGo versus Lee Sedol games are also included. We conclude that AlphaGo beating Lee Sedol is a huge achievement in artificial intelligence (AI) based largely on CI methods. In the future, powerful computer Go programs such as AlphaGo are expected to be instrumental in promoting Go education and AI real-world applications. As an inevitable trend of future 5G networks, Software Defined architecture has many advantages in providing central- ized control and flexible resource management. But it is also confronted with various security challenges and potential threats with emerging services and technologies. As the focus of network security, Intrusion Detection Systems (IDS) are usually deployed separately without collaboration. They are also unable to detect novel attacks with limited intelligent abilities, which are hard to meet the needs of software defined 5G. In this paper, we propose an intelligent intrusion system taking the advances of software defined technology and artificial intelligence based on Software Defined 5G architecture. It flexibly combines security function mod- ules which are adaptively invoked under centralized management and control with a globle view. It can also deal with unknown intrusions by using machine learning algorithms. Evaluation results prove that the intelligent intrusion detection system achieves a better performance. Neuroscience research has produced many theories and computational neural models of sensory nervous systems. Notwithstanding many different perspectives towards developing intelligent machines, artificial intelligence has ultimately been influenced by neuroscience. Therefore, this paper provides an introduction to biologically inspired machine intelligence by exploring the basic principles of sensation and perception as well as the structure and behavior of biological sensory nervous systems like the neocortex. Concepts like spike timing, synaptic plasticity, inhibition, neural structure, and neural behavior are applied to a new model, Simple Cortex (SC). A software implementation of SC has been built and demonstrates fast observation, learning, and prediction of spatio-temporal sensory-motor patterns and sequences. Finally, this paper suggests future areas of improvement and growth for Simple Cortex and other related machine intelligence models. We review the theory of neural networks, as it has emerged in the last ten years or so within the physics community, emphasizing questions of biological relevance over those of importance in mathematical statistics and machine learning theory. In the paper arguments are given why the concept of static evaluation has the potential to be a useful extension to Monte Carlo tree search. A new concept of modeling static evaluation through a dynamical system is introduced and strengths and weaknesses are discussed. The general suitability of this approach is demonstrated. We define and explore in simulation several rules for the local evolution of generative rules for 1D and 2D cellular automata. Our implementation uses strategies from conceptual blending. We discuss potential applications to modelling social dynamics. Software capable of improving itself has been a dream of computer scientists since the inception of the field. In this work we provide definitions for Recursively Self-Improving software, survey different types of self-improving software, review the relevant literature, analyze limits on computation restricting recursive self-improvement and introduce RSI Convergence Theory which aims to predict general behavior of RSI systems. Finally, we address security implications from self-improving intelligent software. A fuzzy clustering algorithm for multidimensional data is proposed in this article. The data is described by vectors whose components are linguistic variables defined in an ordinal scale. The obtained results confirm the efficiency of the proposed approach. This paper suggests a new interpretation of the Dempster-Shafer theory in terms of probabilistic interpretation of plausibility. A new rule of combination of independent evidence is shown and its preservation of interpretation is demonstrated. People can think in auditory, visual and tactile forms of language, so can machines principally. But is it possible for them to think in radio language? According to a first principle presented for general intelligence, i.e. the principle of language's relativity, the answer may give an exceptional solution for robot astronauts to talk with each other in space exploration. Modern evolutionary computation utilizes heuristic optimizations based upon concepts borrowed from the Darwinian theory of natural selection. We believe that a vital direction in this field must be algorithms that model the activity of genomic parasites, such as transposons, in biological evolution. This publication is our first step in the direction of developing a minimal assortment of algorithms that simulate the role of genomic parasites. Specifically, we started in the domain of genetic algorithms (GA) and selected the Artificial Ant Problem as a test case. We define these artificial transposons as a fragment of an ant's code that possesses properties that cause it to stand apart from the rest. We concluded that artificial transposons, analogous to real transposons, are truly capable of acting as intelligent mutators that adapt in response to an evolutionary problem in the course of co-evolution with their hosts. Artificial reinforcement learning (RL) is a widely used technique in artificial intelligence that provides a general method for training agents to perform a wide variety of behaviours. RL as used in computer science has striking parallels to reward and punishment learning in animal and human brains. I argue that present-day artificial RL agents have a very small but nonzero degree of ethical importance. This is particularly plausible for views according to which sentience comes in degrees based on the abilities and complexities of minds, but even binary views on consciousness should assign nonzero probability to RL programs having morally relevant experiences. While RL programs are not a top ethical priority today, they may become more significant in the coming decades as RL is increasingly applied to industry, robotics, video games, and other areas. I encourage scientists, philosophers, and citizens to begin a conversation about our ethical duties to reduce the harm that we inflict on powerless, voiceless RL agents. Biological plastic neural networks are systems of extraordinary computational capabilities shaped by evolution, development, and lifetime learning. The interplay of these elements leads to the emergence of adaptive behavior and intelligence. Inspired by such intricate natural phenomena, Evolved Plastic Artificial Neural Networks (EPANNs) use simulated evolution in-silico to breed plastic neural networks with a large variety of dynamics, architectures, and plasticity rules: these artificial systems are composed of inputs, outputs, and plastic components that change in response to experiences in an environment. These systems may autonomously discover novel adaptive algorithms, and lead to hypotheses on the emergence of biological adaptation. EPANNs have seen considerable progress over the last two decades. Current scientific and technological advances in artificial neural networks are now setting the conditions for radically new approaches and results. In particular, the limitations of hand-designed networks could be overcome by more flexible and innovative solutions. This paper brings together a variety of inspiring ideas that define the field of EPANNs. The main methods and results are reviewed. Finally, new opportunities and developments are presented. We present the first experimental realization of a quantum artificial life algorithm in a quantum computer. The quantum biomimetic protocol encodes tailored quantum behaviors belonging to living systems, namely, self-replication, mutation, interaction between individuals, and death, into the cloud quantum computer IBM ibmqx4. In this experiment, entanglement spreads throughout generations of individuals, where genuine quantum information features are inherited through genealogical networks. As a pioneering proof-of-principle, experimental data fits the ideal model with accuracy. Thereafter, these and other models of quantum artificial life, for which no classical device may predict its quantum supremacy evolution, can be further explored in novel generations of quantum computers. Quantum biomimetics, quantum machine learning, and quantum artificial intelligence will move forward hand in hand through more elaborate levels of quantum complexity. Artificial neural networks (ANNs), while exceptionally useful for classification, are vulnerable to misdirection. Small amounts of noise can significantly affect their ability to correctly complete a task. Instead of generalizing concepts, ANNs seem to focus on surface statistical regularities in a given task. Here we compare how recurrent artificial neural networks, long short-term memory units, and Markov Brains sense and remember their environments. We show that information in Markov Brains is localized and sparsely distributed, while the other neural network substrates "smear" information about the environment across all nodes, which makes them vulnerable to noise. This paper deals with the problem of classifying signals. The new method for building so called local classifiers and local features is presented. The method is a combination of the lifting scheme and the support vector machines. Its main aim is to produce effective and yet comprehensible classifiers that would help in understanding processes hidden behind classified signals. To illustrate the method we present the results obtained on an artificial and a real dataset. This philosophical paper explores the relation between modern scientific simulations and the future of the universe. We argue that a simulation of an entire universe will result from future scientific activity. This requires us to tackle the challenge of simulating open-ended evolution at all levels in a single simulation. The simulation should encompass not only biological evolution, but also physical evolution (a level below) and cultural evolution (a level above). The simulation would allow us to probe what would happen if we would "replay the tape of the universe" with the same or different laws and initial conditions. We also distinguish between real-world and artificial-world modelling. Assuming that intelligent life could indeed simulate an entire universe, this leads to two tentative hypotheses. Some authors have argued that we may already be in a simulation run by an intelligent entity. Or, if such a simulation could be made real, this would lead to the production of a new universe. This last direction is argued with a careful speculative philosophical approach, emphasizing the imperative to find a solution to the heat death problem in cosmology. The reader is invited to consult Annex 1 for an overview of the logical structure of this paper. -- Keywords: far future, future of science, ALife, simulation, realization, cosmology, heat death, fine-tuning, physical eschatology, cosmological natural selection, cosmological artificial selection, artificial cosmogenesis, selfish biocosm hypothesis, meduso-anthropic principle, developmental singularity hypothesis, role of intelligent life. The ongoing discussion whether modern vision systems have to be viewed as visually-enabled cognitive systems or cognitively-enabled vision systems is groundless, because perceptual and cognitive faculties of vision are separate components of human (and consequently, artificial) information processing system modeling. For natural and artificial systems with some symmetry structure, computational understanding and manipulation can be achieved without learning by exploiting the algebraic structure. Here we describe this algebraic coordinatization method and apply it to permutation puzzles. Coordinatization yields a structural understanding, not just solutions for the puzzles. Social intelligence in natural and artificial systems is usually measured by the evaluation of associated traits or tasks that are deemed to represent some facets of social behaviour. The amalgamation of these traits is then used to configure the intuitive notion of social intelligence. Instead, in this paper we start from a parametrised definition of social intelligence as the expected performance in a set of environments with several agents, and we assess and derive tests from it. This definition makes several dependencies explicit: (1) the definition depends on the choice (and weight) of environments and agents, (2) the definition may include both competitive and cooperative behaviours depending on how agents and rewards are arranged into teams, (3) the definition mostly depends on the abilities of other agents, and (4) the actual difference between social intelligence and general intelligence (or other abilities) depends on these choices. As a result, we address the problem of converting this definition into a more precise one where some fundamental properties ensuring social behaviour (such as action and reward dependency and anticipation on competitive/cooperative behaviours) are met as well as some other more instrumental properties (such as secernment, boundedness, symmetry, validity, reliability, efficiency), which are convenient to convert the definition into a practical test. From the definition and the formalised properties, we take a look at several representative multi-agent environments, tests and games to see whether they meet these properties. Understanding and using natural processes for intelligent functionalities, referred to as natural intelligence, has recently attracted interest from a variety of fields, including post-silicon computing for artificial intelligence and decision making in the behavioural sciences. In a past study, we successfully used the wave-particle duality of single photons to solve the two-armed bandit problem, which constitutes the foundation of reinforcement learning and decision making. In this study, we propose and confirm a hierarchical architecture for single-photon-based reinforcement learning and decision making that verifies the scalability of the principle. Specifically, the four-armed bandit problem is solved given zero prior knowledge in a two-layer hierarchical architecture, where polarization is autonomously adapted in order to effect adequate decision making using single-photon measurements. In the hierarchical structure, the notion of layer-dependent decisions emerges. The optimal solutions in the coarse layer and in the fine layer, however, conflict with each other in some contradictive problems. We show that while what we call a tournament strategy resolves such contradictions, the probabilistic nature of single photons allows for the direct location of the optimal solution even for contradictive problems, hence manifesting the exploration ability of single photons. This study provides insights into photon intelligence in hierarchical architectures for future artificial intelligence as well as the potential of natural processes for intelligent functionalities. Techniques are presented for defining models of computational linguistics theories. The methods of generalized diagrams that were developed by this author for modeling artificial intelligence planning and reasoning are shown to be applicable to models of computation of linguistics theories. It is shown that for extensional and intensional interpretations, models can be generated automatically which assign meaning to computations of linguistics theories for natural languages. Keywords: Computational Linguistics, Reasoning Models, G-diagrams For Models, Dynamic Model Implementation, Linguistics and Logics For Artificial Intelligence Because of their occasional need to return to shallow points in a search tree, existing backtracking methods can sometimes erase meaningful progress toward solving a search problem. In this paper, we present a method by which backtrack points can be moved deeper in the search space, thereby avoiding this difficulty. The technique developed is a variant of dependency-directed backtracking that uses only polynomial space while still providing useful control information and retaining the completeness guarantees provided by earlier approaches. In this paper we describe how to modify GSAT so that it can be applied to non-clausal formulas. The idea is to use a particular ``score'' function which gives the number of clauses of the CNF conversion of a formula which are false under a given truth assignment. Its value is computed in linear time, without constructing the CNF conversion itself. The proposed methodology applies to most of the variants of GSAT proposed so far. This paper introduces a framework for Planning while Learning where an agent is given a goal to achieve in an environment whose behavior is only partially known to the agent. We discuss the tractability of various plan-design processes. We show that for a large natural class of Planning while Learning systems, a plan can be presented and verified in a reasonable time. However, coming up algorithmically with a plan, even for simple classes of systems is apparently intractable. We emphasize the role of off-line plan-design processes, and show that, in most natural cases, the verification (projection) part can be carried out in an efficient algorithmic manner. We present a definition of cause and effect in terms of decision-theoretic primitives and thereby provide a principled foundation for causal reasoning. Our definition departs from the traditional view of causation in that causal assertions may vary with the set of decisions available. We argue that this approach provides added clarity to the notion of cause. Also in this paper, we examine the encoding of causal relationships in directed acyclic graphs. We describe a special class of influence diagrams, those in canonical form, and show its relationship to Pearl's representation of cause and effect. Finally, we show how canonical form facilitates counterfactual reasoning. This article describes an application of three well-known statistical methods in the field of game-tree search: using a large number of classified Othello positions, feature weights for evaluation functions with a game-phase-independent meaning are estimated by means of logistic regression, Fisher's linear discriminant, and the quadratic discriminant function for normally distributed features. Thereafter, the playing strengths are compared by means of tournaments between the resulting versions of a world-class Othello program. In this application, logistic regression - which is used here for the first time in the context of game playing - leads to better results than the other approaches. We describe a machine learning method for predicting the value of a real-valued function, given the values of multiple input variables. The method induces solutions from samples in the form of ordered disjunctive normal form (DNF) decision rules. A central objective of the method and representation is the induction of compact, easily interpretable solutions. This rule-based decision model can be extended to search efficiently for similar cases prior to approximating function values. Experimental results on real-world data demonstrate that the new techniques are competitive with existing machine learning and statistical methods and can sometimes yield superior regression performance. The paper describes an extension of well-founded semantics for logic programs with two types of negation. In this extension information about preferences between rules can be expressed in the logical language and derived dynamically. This is achieved by using a reserved predicate symbol and a naming technique. Conflicts among rules are resolved whenever possible on the basis of derived preference information. The well-founded conclusions of prioritized logic programs can be computed in polynomial time. A legal reasoning example illustrates the usefulness of the approach. We introduce an algorithm for combinatorial search on quantum computers that is capable of significantly concentrating amplitude into solutions for some NP search problems, on average. This is done by exploiting the same aspects of problem structure as used by classical backtrack methods to avoid unproductive search choices. This quantum algorithm is much more likely to find solutions than the simple direct use of quantum parallelism. Furthermore, empirical evaluation on small problems shows this quantum algorithm displays the same phase transition behavior, and at the same location, as seen in many previously studied classical search methods. Specifically, difficult problem instances are concentrated near the abrupt change from underconstrained to overconstrained problems. We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics. Our mean field theory provides a tractable approximation to the true probability distribution in these networks; it also yields a lower bound on the likelihood of evidence. We demonstrate the utility of this framework on a benchmark problem in statistical pattern recognition---the classification of handwritten digits. A reported weakness of C4.5 in domains with continuous attributes is addressed by modifying the formation and evaluation of tests on continuous attributes. An MDL-inspired penalty is applied to such tests, eliminating some of them from consideration and altering the relative desirability of all tests. Empirical trials show that the modifications lead to smaller decision trees with higher predictive accuracies. Results also confirm that a new version of C4.5 incorporating these changes is superior to recent approaches that use global discretization and that construct small trees with multi-interval splits. For many types of machine learning algorithms, one can compute the statistically `optimal' way to select training data. In this paper, we review how optimal data selection techniques have been used with feedforward neural networks. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are computationally expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate. Empirically, we observe that the optimality criterion sharply decreases the number of training examples the learner needs in order to achieve good performance. Inductive theorem provers often diverge. This paper describes a simple critic, a computer program which monitors the construction of inductive proofs attempting to identify diverging proof attempts. Divergence is recognized by means of a ``difference matching'' procedure. The critic then proposes lemmas and generalizations which ``ripple'' these differences away so that the proof can go through without divergence. The critic enables the theorem prover Spike to prove many theorems completely automatically from the definitions alone. First-order learning involves finding a clause-form definition of a relation from examples of the relation and relevant background information. In this paper, a particular first-order learning system is modified to customize it for finding definitions of functional relations. This restriction leads to faster learning times and, in some cases, to definitions that have higher predictive accuracy. Other first-order learning systems might benefit from similar specialization. We argue that the analysis of agent/environment interactions should be extended to include the conventions and invariants maintained by agents throughout their activity. We refer to this thicker notion of environment as a lifeworld and present a partial set of formal tools for describing structures of lifeworlds and the ways in which they computationally simplify activity. As one specific example, we apply the tools to the analysis of the Toast system and show how versions of the system with very different control structures in fact implement a common control structure together with different conventions for encoding task state in the positions or states of objects in the environment. We investigate the computational properties of the spatial algebra RCC-5 which is a restricted version of the RCC framework for spatial reasoning. The satisfiability problem for RCC-5 is known to be NP-complete but not much is known about its approximately four billion subclasses. We provide a complete classification of satisfiability for all these subclasses into polynomial and NP-complete respectively. In the process, we identify all maximal tractable subalgebras which are four in total. This paper combines two important directions of research in temporal resoning: that of finding maximal tractable subclasses of Allen's interval algebra, and that of reasoning with metric temporal information. Eight new maximal tractable subclasses of Allen's interval algebra are presented, some of them subsuming previously reported tractable algebras. The algebras allow for metric temporal constraints on interval starting or ending points, using the recent framework of Horn DLRs. Two of the algebras can express the notion of sequentiality between intervals, being the first such algebras admitting both qualitative and metric time. Starting with a likelihood or preference order on worlds, we extend it to a likelihood ordering on sets of worlds in a natural way, and examine the resulting logic. Lewis earlier considered such a notion of relative likelihood in the context of studying counterfactuals, but he assumed a total preference order on worlds. Complications arise when examining partial orders that are not present for total orders. There are subtleties involving the exact approach to lifting the order on worlds to an order on sets of worlds. In addition, the axiomatization of the logic of relative likelihood in the case of partial orders gives insight into the connection between relative likelihood and default reasoning. Default logic can be regarded as a mechanism to represent families of belief sets of a reasoning agent. As such, it is inherently second-order. In this paper, we study the problem of representability of a family of theories as the set of extensions of a default theory. We give a complete solution to the representability by means of normal default theories. We obtain partial results on representability by arbitrary default theories. We construct examples of denumerable families of non-including theories that are not representable. We also study the concept of equivalence between default theories. Mixed metaphors have been neglected in recent metaphor research. This paper suggests that such neglect is short-sighted. Though mixing is a more complex phenomenon than straight metaphors, the same kinds of reasoning and knowledge structures are required. This paper provides an analysis of both parallel and serial mixed metaphors within the framework of an AI system which is already capable of reasoning about straight metaphorical manifestations and argues that the processes underlying mixing are central to metaphorical meaning. Therefore, any theory of metaphors must be able to account for mixing. This paper proposes two kinds of fuzzy abductive inference in the framework of fuzzy rule base. The abductive inference processes described here depend on the semantic of the rule. We distinguish two classes of interpretation of a fuzzy rule, certainty generation rules and possible generation rules. In this paper we present the architecture of abductive inference in the first class of interpretation. We give two kinds of problem that we can resolve by using the proposed models of inference. In this paper we propose a new type of random CSP model, called Model RB, which is a revision to the standard Model B. It is proved that phase transitions from a region where almost all problems are satisfiable to a region where almost all problems are unsatisfiable do exist for Model RB as the number of variables approaches infinity. Moreover, the critical values at which the phase transitions occur are also known exactly. By relating the hardness of Model RB to Model B, it is shown that there exist a lot of hard instances in Model RB. We show that Gabbay's nonmonotonic consequence relations can be reduced to a new family of relations, called entrenchment relations. Entrenchment relations provide a direct generalization of epistemic entrenchment and expectation ordering introduced by Gardenfors and Makinson for the study of belief revision and expectation inference, respectively. We present a tableau calculus for reasoning in fragments of natural language. We focus on the problem of pronoun resolution and the way in which it complicates automated theorem proving for natural language processing. A method for explicitly manipulating contextual information during deduction is proposed, where pronouns are resolved against this context during deduction. As a result, pronoun resolution and deduction can be interleaved in such a way that pronouns are only resolved if this is licensed by a deduction rule; this helps us to avoid the combinatorial complexity of total pronoun disambiguation. Preferences among acts are analyzed in the style of L. Savage, but as partially ordered. The rationality postulates considered are weaker than Savage's on three counts. The Sure Thing Principle is derived in this setting. The postulates are shown to lead to a characterization of generalized qualitative probability that includes and blends both traditional qualitative probability and the ranked structures used in logical approaches. Dynamic Bayesian networks (DBNs) offer an elegant way to integrate various aspects of language in one model. Many existing algorithms developed for learning and inference in DBNs are applicable to probabilistic language modeling. To demonstrate the potential of DBNs for natural language processing, we employ a DBN in an information extraction task. We show how to assemble wealth of emerging linguistic instruments for shallow parsing, syntactic and semantic tagging, morphological decomposition, named entity recognition etc. in order to incrementally build a robust information extraction system. Our method outperforms previously published results on an established benchmark domain. This article presents an overview of the idea that "information compression by multiple alignment, unification and search" (ICMAUS) may serve as a unifying principle in computing (including mathematics and logic) and in such aspects of human cognition as the analysis and production of natural language, fuzzy pattern recognition and best-match information retrieval, concept hierarchies with inheritance of attributes, probabilistic reasoning, and unsupervised inductive learning. The ICMAUS concepts are described together with an outline of the SP61 software model in which the ICMAUS concepts are currently realised. A range of examples is presented, illustrated with output from the SP61 model. We apply the optimization algorithm Adaptive Simulated Annealing (ASA) to the problem of analyzing data on a large population and selecting the best model to predict that an individual with various traits will have a particular disease. We compare ASA with traditional forward and backward regression on computer simulated data. We find that the traditional methods of modeling are better for smaller data sets whereas a numerically stable ASA seems to perform better on larger and more complicated data sets. Recent advances in programming languages study and design have established a standard way of grounding computational systems representation in category theory. These formal results led to a better understanding of issues of control and side-effects in functional and imperative languages. This framework can be successfully applied to the investigation of the performance of Artificial Intelligence (AI) inference and cognitive systems. In this paper, we delineate a categorical formalisation of memory as a control structure driving performance in inference systems. Abstracting away control mechanisms from three widely used representations of memory in cognitive systems (scripts, production rules and clusters) we explain how categorical triples capture the interaction between learning and problem-solving. We consider how to make probability forecasts of binary labels. Our main mathematical result is that for any continuous gambling strategy used for detecting disagreement between the forecasts and the actual labels, there exists a forecasting strategy whose forecasts are ideal as far as this gambling strategy is concerned. A forecasting strategy obtained in this way from a gambling strategy demonstrating a strong law of large numbers is simplified and studied empirically. We study an alternative to the prevailing approach to modelling qualitative spatial reasoning (QSR) problems as constraint satisfaction problems. In the standard approach, a relation between objects is a constraint whereas in the alternative approach it is a variable. The relation-variable approach greatly simplifies integration and implementation of QSR. To substantiate this point, we discuss several QSR algorithms from the literature which in the relation-variable approach reduce to the customary constraint propagation algorithm enforcing generalised arc-consistency. In case-based reasoning, the adaptation step depends in general on domain-dependent knowledge, which motivates studies on adaptation knowledge acquisition (AKA). CABAMAKA is an AKA system based on principles of knowledge discovery from databases. This system explores the variations within the case base to elicit adaptation knowledge. It has been successfully tested in an application of case-based decision support to breast cancer treatment. Exact parsing with finite state automata is deemed inappropriate because of the unbounded non-locality languages overwhelmingly exhibit. We propose a way to structure the parsing task in order to make it amenable to local classification methods. This allows us to build a Dynamic Bayesian Network which uncovers the syntactic dependency structure of English sentences. Experiments with the Wall Street Journal demonstrate that the model successfully learns from labeled data. We try to design a quantum neural network with qubits instead of classical neurons with deterministic states, and also with quantum operators replacing teh classical action potentials. With our choice of gates interconnecting teh neural lattice, it appears that the state of the system behaves in ways reflecting both the strengths of coupling between neurons as well as initial conditions. We find that depending whether there is a threshold for emission from excited to ground state, the system shows either aperiodic oscillations or coherent ones with periodicity depending on the strength of coupling. In this paper we present the creation of an Arabic version of Automated Speech Recognition System (ASR). This system is based on the open source Sphinx-4, from the Carnegie Mellon University. Which is a speech recognition system based on discrete hidden Markov models (HMMs). We investigate the changes that must be made to the model to adapt Arabic voice recognition. Keywords: Speech recognition, Acoustic model, Arabic language, HMMs, CMUSphinx-4, Artificial intelligence. This paper deals with sensors which compute and report linguistic assessments of their values.Such sensors, called symbolic sensors are a natural extension of smart ones when working with control systems which use artificial intelligence based technics. After having recalled the smart sensor concepts, this paper introduces the symbolic sensor ones. Links between the physical world and the symbolic one are described. It is then shown how Zadeh's approximate reasoning theory provides a smart way to implement symbolic sensors. Finally, since the symbolic sensor is a general component, a functional adaptation to the measurement context is proposed. Militarised conflict is one of the risks that have a significant impact on society. Militarised Interstate Dispute (MID) is defined as an outcome of interstate interactions, which result on either peace or conflict. Effective prediction of the possibility of conflict between states is an important decision support tool for policy makers. In a previous research, neural networks (NNs) have been implemented to predict the MID. Support Vector Machines (SVMs) have proven to be very good prediction techniques and are introduced for the prediction of MIDs in this study and compared to neural networks. The results show that SVMs predict MID better than NNs while NNs give more consistent and easy to interpret sensitivity analysis than SVMs. In the process of training Support Vector Machines (SVMs) by decomposition methods, working set selection is an important technique, and some exciting schemes were employed into this field. To improve working set selection, we propose a new model for working set selection in sequential minimal optimization (SMO) decomposition methods. In this model, it selects B as working set without reselection. Some properties are given by simple proof, and experiments demonstrate that the proposed method is in general faster than existing methods. Feature Markov Decision Processes (PhiMDPs) are well-suited for learning agents in general environments. Nevertheless, unstructured (Phi)MDPs are limited to relatively simple environments. Structured MDPs like Dynamic Bayesian Networks (DBNs) are used for large-scale real-world problems. In this article I extend PhiMDP to PhiDBN. The primary contribution is to derive a cost criterion that allows to automatically extract the most relevant features from the environment, leading to the "best" DBN representation. I discuss all building blocks required for a complete general learning algorithm. Symmetry is an important factor in solving many constraint satisfaction problems. One common type of symmetry is when we have symmetric values. In a recent series of papers, we have studied methods to break value symmetries. Our results identify computational limits on eliminating value symmetry. For instance, we prove that pruning all symmetric values is NP-hard in general. Nevertheless, experiments show that much value symmetry can be broken in practice. These results may be useful to researchers in planning, scheduling and other areas as value symmetry occurs in many different domains. We present a comprehensive study of the use of value precedence constraints to break value symmetry. We first give a simple encoding of value precedence into ternary constraints that is both efficient and effective at breaking symmetry. We then extend value precedence to deal with a number of generalizations like wreath value and partial interchangeability. We also show that value precedence is closely related to lexicographical ordering. Finally, we consider the interaction between value precedence and symmetry breaking constraints for variable symmetries. To model combinatorial decision problems involving uncertainty and probability, we introduce stochastic constraint programming. Stochastic constraint programs contain both decision variables (which we can set) and stochastic variables (which follow a probability distribution). They combine together the best features of traditional constraint satisfaction, stochastic integer programming, and stochastic satisfiability. We give a semantics for stochastic constraint programs, and propose a number of complete algorithms and approximation procedures. Finally, we discuss a number of extensions of stochastic constraint programming to relax various assumptions like the independence between stochastic variables, and compare with other approaches for decision making under uncertainty. We show that some common and important global constraints like ALL-DIFFERENT and GCC can be decomposed into simple arithmetic constraints on which we achieve bound or range consistency, and in some cases even greater pruning. These decompositions can be easily added to new solvers. They also provide other constraints with access to the state of the propagator by sharing of variables. Such sharing can be used to improve propagation between constraints. We report experiments with our decomposition in a pseudo-Boolean solver. Many real life optimization problems contain both hard and soft constraints, as well as qualitative conditional preferences. However, there is no single formalism to specify all three kinds of information. We therefore propose a framework, based on both CP-nets and soft constraints, that handles both hard and soft constraints as well as conditional preferences efficiently and uniformly. We study the complexity of testing the consistency of preference statements, and show how soft constraints can faithfully approximate the semantics of conditional preference statements whilst improving the computational complexity We identify a new and important global (or non-binary) constraint. This constraint ensures that the values taken by two vectors of variables, when viewed as multisets, are ordered. This constraint is useful for a number of different applications including breaking symmetry and fuzzy constraint satisfaction. We propose and implement an efficient linear time algorithm for enforcing generalised arc consistency on such a multiset ordering constraint. Experimental results on several problem domains show considerable promise. I propose that pattern recognition, memorization and processing are key concepts that can be a principle set for the theoretical modeling of the mind function. Most of the questions about the mind functioning can be answered by a descriptive modeling and definitions from these principles. An understandable consciousness definition can be drawn based on the assumption that a pattern recognition system can recognize its own patterns of activity. The principles, descriptive modeling and definitions can be a basis for theoretical and applied research on cognitive sciences, particularly at artificial intelligence studies. When configuring customizable software, it is useful to provide interactive tool-support that ensures that the configuration does not breach given constraints. But, when is a configuration complete and how can the tool help the user to complete it? We formalize this problem and relate it to concepts from non-monotonic reasoning well researched in Artificial Intelligence. The results are interesting for both practitioners and theoreticians. Practitioners will find a technique facilitating an interactive configuration process and experiments supporting feasibility of the approach. Theoreticians will find links between well-known formal concepts and a concrete practical application. Aim of this paper is to address the problem of learning Boolean functions from training data with missing values. We present an extension of the BRAIN algorithm, called U-BRAIN (Uncertainty-managing Batch Relevance-based Artificial INtelligence), conceived for learning DNF Boolean formulas from partial truth tables, possibly with uncertain values or missing bits. Such an algorithm is obtained from BRAIN by introducing fuzzy sets in order to manage uncertainty. In the case where no missing bits are present, the algorithm reduces to the original BRAIN. We define the concept of an internal symmetry. This is a symmety within a solution of a constraint satisfaction problem. We compare this to solution symmetry, which is a mapping between different solutions of the same problem. We argue that we may be able to exploit both types of symmetry when finding solutions. We illustrate the potential of exploiting internal symmetries on two benchmark domains: Van der Waerden numbers and graceful graphs. By identifying internal symmetries we are able to extend the state of the art in both cases. We present in this paper an evolution of a tool from a user interface for a concrete Computer Algebra system for Algebraic Topology (the Kenzo system), to a front-end allowing the interoperability among different sources for computation and deduction. The architecture allows the system not only to interface several systems, but also to make them cooperate in shared calculations. This paper deals with chain graphs under the classic Lauritzen-Wermuth-Frydenberg interpretation. We prove that the regular Gaussian distributions that factorize with respect to a chain graph $G$ with $d$ parameters have positive Lebesgue measure with respect to $\mathbb{R}^d$, whereas those that factorize with respect to $G$ but are not faithful to it have zero Lebesgue measure with respect to $\mathbb{R}^d$. This means that, in the measure-theoretic sense described, almost all the regular Gaussian distributions that factorize with respect to $G$ are faithful to it. In this short paper I briefly discuss 3D war Game based on artificial intelligence concepts called AI WAR. Going in to the details, I present the importance of CAICL language and how this language is used in AI WAR. Moreover I also present a designed and implemented 3D War Cybug for AI WAR using CAICL and discus the implemented strategy to defeat its enemies during the game life. In this paper I will show how to use Python programming with a computer interface such as Phoenix-M 1 to drive simple robots. In my quest towards Artificial Intelligence(AI) I am experimenting with a lot of different possibilities in Robotics. This one will try to mimic the working of a simple insect's nervous system using hard wiring and some minimal software usage. This is the precursor to my advanced robotics and AI integration where I plan to use a new paradigm of AI based on Machine Learning and Self Consciousness via Knowledge Feedback and Update Process. In this paper we present a new approach to solve the satisfiability problem (SAT), based on boolean networks (BN). We define a mapping between a SAT instance and a BN, and we solve SAT problem by simulating the BN dynamics. We prove that BN fixed points correspond to the SAT solutions. The mapping presented allows to develop a new class of algorithms to solve SAT. Moreover, this new approach suggests new ways to combine symbolic and connectionist computation and provides a general framework for local search algorithms. We propose a long-term memory design for artificial general intelligence based on Solomonoff's incremental machine learning methods. We use R5RS Scheme and its standard library with a few omissions as the reference machine. We introduce a Levin Search variant based on Stochastic Context Free Grammar together with four synergistic update algorithms that use the same grammar as a guiding probability distribution of programs. The update algorithms include adjusting production probabilities, re-using previous solutions, learning programming idioms and discovery of frequent subprograms. Experiments with two training sequences demonstrate that our approach to incremental learning is effective. There are two distinct approaches to solving reinforcement learning problems, namely, searching in value function space and searching in policy space. Temporal difference methods and evolutionary algorithms are well-known examples of these approaches. Kaelbling, Littman and Moore recently provided an informative survey of temporal difference methods. This article focuses on the application of evolutionary algorithms to the reinforcement learning problem, emphasizing alternative policy representations, credit assignment methods, and problem-specific genetic operators. Strengths and weaknesses of the evolutionary approach to reinforcement learning are presented, along with a survey of representative applications. We investigate the problem of reasoning in the propositional fragment of MBNF, the logic of minimal belief and negation as failure introduced by Lifschitz, which can be considered as a unifying framework for several nonmonotonic formalisms, including default logic, autoepistemic logic, circumscription, epistemic queries, and logic programming. We characterize the complexity and provide algorithms for reasoning in propositional MBNF. In particular, we show that entailment in propositional MBNF lies at the third level of the polynomial hierarchy, hence it is harder than reasoning in all the above mentioned propositional formalisms for nonmonotonic reasoning. We also prove the exact correspondence between negation as failure in MBNF and negative introspection in Moore's autoepistemic logic. This paper reviews the connections between Graphplan's planning-graph and the dynamic constraint satisfaction problem and motivates the need for adapting CSP search techniques to the Graphplan algorithm. It then describes how explanation based learning, dependency directed backtracking, dynamic variable ordering, forward checking, sticky values and random-restart search strategies can be adapted to Graphplan. Empirical results are provided to demonstrate that these augmentations improve Graphplan's performance significantly (up to 1000x speedups) on several benchmark problems. Special attention is paid to the explanation-based learning and dependency directed backtracking techniques as they are empirically found to be most useful in improving the performance of Graphplan. Pearl and Dechter (1996) claimed that the d-separation criterion for conditional independence in acyclic causal networks also applies to networks of discrete variables that have feedback cycles, provided that the variables of the system are uniquely determined by the random disturbances. I show by example that this is not true in general. Some condition stronger than uniqueness is needed, such as the existence of a causal dynamics guaranteed to lead to the unique solution. We show that for several variations of partially observable Markov decision processes, polynomial-time algorithms for finding control policies are unlikely to or simply don't have guarantees of finding policies within a constant factor or a constant summand of optimal. Here "unlikely" means "unless some complexity classes collapse," where the collapses considered are P=NP, P=PSPACE, or P=EXP. Until or unless these collapses are shown to hold, any control-policy designer must choose between such performance guarantees and efficient computation. The chief aim of this paper is to propose mean-field approximations for a broad class of Belief networks, of which sigmoid and noisy-or networks can be seen as special cases. The approximations are based on a powerful mean-field theory suggested by Plefka. We show that Saul, Jaakkola and Jordan' s approach is the first order approximation in Plefka's approach, via a variational derivation. The application of Plefka's theory to belief networks is not computationally tractable. To tackle this problem we propose new approximations based on Taylor series. Small scale experiments show that the proposed schemes are attractive. Domain-independent planning is a hard combinatorial problem. Taking into account plan quality makes the task even more difficult. This article introduces Planning by Rewriting (PbR), a new paradigm for efficient high-quality domain-independent planning. PbR exploits declarative plan-rewriting rules and efficient local search techniques to transform an easy-to-generate, but possibly suboptimal, initial plan into a high-quality plan. In addition to addressing the issues of planning efficiency and plan quality, this framework offers a new anytime planning algorithm. We have implemented this planner and applied it to several existing domains. The experimental results show that the PbR approach provides significant savings in planning effort while generating high-quality plans. Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding optimal policies for POMDPs. It typically takes a large number of iterations to converge. This paper proposes a method for accelerating the convergence of value iteration. The method has been evaluated on an array of benchmark problems and was found to be very effective: It enabled value iteration to converge after only a few iterations on all the test problems. Designing the dialogue policy of a spoken dialogue system involves many nontrivial choices. This paper presents a reinforcement learning approach for automatically optimizing a dialogue policy, which addresses the technical challenges in applying reinforcement learning to a working dialogue system with human users. We report on the design, construction and empirical evaluation of NJFun, an experimental spoken dialogue system that provides users with access to information about fun things to do in New Jersey. Our results show that by optimizing its performance via reinforcement learning, NJFun measurably improves system performance. The First Trading Agent Competition (TAC) was held from June 22nd to July 8th, 2000. TAC was designed to create a benchmark problem in the complex domain of e-marketplaces and to motivate researchers to apply unique approaches to a common task. This article describes ATTac-2000, the first-place finisher in TAC. ATTac-2000 uses a principled bidding strategy that includes several elements of adaptivity. In addition to the success at the competition, isolated empirical results are presented indicating the robustness and effectiveness of ATTac-2000's adaptive strategy. The theoretical properties of qualitative spatial reasoning in the RCC8 framework have been analyzed extensively. However, no empirical investigation has been made yet. Our experiments show that the adaption of the algorithms used for qualitative temporal reasoning can solve large RCC8 instances, even if they are in the phase transition region -- provided that one uses the maximal tractable subsets of RCC8 that have been identified by us. In particular, we demonstrate that the orthogonal combination of heuristic methods is successful in solving almost all apparently hard instances in the phase transition region up to a certain size in reasonable time. Inductive logic programming, or relational learning, is a powerful paradigm for machine learning or data mining. However, in order for ILP to become practically useful, the efficiency of ILP systems must improve substantially. To this end, the notion of a query pack is introduced: it structures sets of similar queries. Furthermore, a mechanism is described for executing such query packs. A complexity analysis shows that considerable efficiency improvements can be achieved through the use of this query pack execution mechanism. This claim is supported by empirical results obtained by incorporating support for query pack execution in two existing learning systems. This paper presents an approach to expert-guided subgroup discovery. The main step of the subgroup discovery process, the induction of subgroup descriptions, is performed by a heuristic beam search algorithm, using a novel parametrized definition of rule quality which is analyzed in detail. The other important steps of the proposed subgroup discovery process are the detection of statistically significant properties of selected subgroups and subgroup visualization: statistically significant properties are used to enrich the descriptions of induced subgroups, while the visualization shows subgroup properties in the form of distributions of the numbers of examples in the subgroups. The approach is illustrated by the results obtained for a medical problem of early detection of patient risk groups. Hierarchical task decomposition is a method used in many agent systems to organize agent knowledge. This work shows how the combination of a hierarchy and persistent assertions of knowledge can lead to difficulty in maintaining logical consistency in asserted knowledge. We explore the problematic consequences of persistent assumptions in the reasoning process and introduce novel potential solutions. Having implemented one of the possible solutions, Dynamic Hierarchical Justification, its effectiveness is demonstrated with an empirical analysis. In common-interest stochastic games all players receive an identical payoff. Players participating in such games must learn to coordinate with each other in order to receive the highest-possible value. A number of reinforcement learning algorithms have been proposed for this problem, and some have been shown to converge to good solutions in the limit. In this paper we show that using very simple model-based algorithms, much better (i.e., polynomial) convergence rates can be attained. Moreover, our model-based algorithms are guaranteed to converge to the optimal value, unlike many of the existing algorithms. The automatic generation of decision trees based on off-line reasoning on models of a domain is a reasonable compromise between the advantages of using a model-based approach in technical domains and the constraints imposed by embedded applications. In this paper we extend the approach to deal with temporal information. We introduce a notion of temporal decision tree, which is designed to make use of relevant information as long as it is acquired, and we present an algorithm for compiling such trees from a model-based reasoning system. This paper presents a new classifier combination technique based on the Dempster-Shafer theory of evidence. The Dempster-Shafer theory of evidence is a powerful method for combining measures of evidence from different classifiers. However, since each of the available methods that estimates the evidence of classifiers has its own limitations, we propose here a new implementation which adapts to training data so that the overall mean square error is minimized. The proposed technique is shown to outperform most available classifier combination methods when tested on three different classification problems. A novel algorithm for actively trading stocks is presented. While traditional expert advice and "universal" algorithms (as well as standard technical trading heuristics) attempt to predict winners or trends, our approach relies on predictable statistical relations between all pairs of stocks in the market. Our empirical results on historical markets provide strong evidence that this type of technical trading can "beat the market" and moreover, can beat the best stock in the market. In doing so we utilize a new idea for smoothing critical parameters in the context of expert learning. Value iteration is a popular algorithm for finding near optimal policies for POMDPs. It is inefficient due to the need to account for the entire belief space, which necessitates the solution of large numbers of linear programs. In this paper, we study value iteration restricted to belief subsets. We show that, together with properly chosen belief subsets, restricted value iteration yields near-optimal policies and we give a condition for determining whether a given belief subset would bring about savings in space and time. We also apply restricted value iteration to two interesting classes of POMDPs, namely informative POMDPs and near-discernible POMDPs. This study proposes a framework of Uncertainty-based Group Decision Support System (UGDSS). It provides a platform for multiple criteria decision analysis in six aspects including (1) decision environment, (2) decision problem, (3) decision group, (4) decision conflict, (5) decision schemes and (6) group negotiation. Based on multiple artificial intelligent technologies, this framework provides reliable support for the comprehensive manipulation of applications and advanced decision approaches through the design of an integrated multi-agents architecture. We represent agents as sets of strings. Each string encodes a potential interaction with another agent or environment. We represent the total set of dynamics between two agents as the intersection of their respective strings, we prove complexity properties of player interactions using Algorithmic Information Theory. We show how the proposed construction is compatible with Universal Artificial Intelligence, in that the AIXI model can be seen as universal with respect to interaction. We present alpha-expansion beta-shrink moves, a simple generalization of the widely-used alpha-beta swap and alpha-expansion algorithms for approximate energy minimization. We show that in a certain sense, these moves dominate both alpha-beta-swap and alpha-expansion moves, but unlike previous generalizations the new moves require no additional assumptions and are still solvable in polynomial-time. We show promising experimental results with the new moves, which we believe could be used in any context where alpha-expansions are currently employed. Variable elimination is a general technique for constraint processing. It is often discarded because of its high space complexity. However, it can be extremely useful when combined with other techniques. In this paper we study the applicability of variable elimination to the challenging problem of finding still-lifes. We illustrate several alternatives: variable elimination as a stand-alone algorithm, interleaved with search, and as a source of good quality lower bounds. We show that these techniques are the best known option both theoretically and empirically. In our experiments we have been able to solve the n=20 instance, which is far beyond reach with alternative approaches. We present a uniform non-monotonic solution to the problems of reasoning about action on the basis of an argumentation-theoretic approach. Our theory is provably correct relative to a sensible minimisation policy introduced on top of a temporal propositional logic. Sophisticated problem domains can be formalised in our framework. As much attention of researchers in the field has been paid to the traditional and basic problems in reasoning about actions such as the frame, the qualification and the ramification problems, approaches to these problems within our formalisation lie at heart of the expositions presented in this paper. Logical hidden Markov models (LOHMMs) upgrade traditional hidden Markov models to deal with sequences of structured symbols in the form of logical atoms, rather than flat characters. This note formally introduces LOHMMs and presents solutions to the three central inference problems for LOHMMs: evaluation, most likely hidden state sequence and parameter estimation. The resulting representation and algorithms are experimentally evaluated on problems from the domain of bioinformatics. We describe the version of the GPT planner used in the probabilistic track of the 4th International Planning Competition (IPC-4). This version, called mGPT, solves Markov Decision Processes specified in the PPDDL language by extracting and using different classes of lower bounds along with various heuristic-search algorithms. The lower bounds are extracted from deterministic relaxations where the alternative probabilistic effects of an action are mapped into different, independent, deterministic actions. The heuristic-search algorithms use these lower bounds for focusing the updates and delivering a consistent value function over all states reachable from the initial state and the greedy policy. The Optiplan planning system is the first integer programming-based planner that successfully participated in the international planning competition. This engineering note describes the architecture of Optiplan and provides the integer programming formulation that enabled it to perform reasonably well in the competition. We also touch upon some recent developments that make integer programming encodings significantly more competitive. PDDL2.1 was designed to push the envelope of what planning algorithms can do, and it has succeeded. It adds two important features: durative actions,which take time (and may have continuous effects); and objective functions for measuring the quality of plans. The concept of durative actions is flawed; and the treatment of their semantics reveals too strong an attachment to the way many contemporary planners work. Future PDDL innovators should focus on producing a clean semantics for additions to the language, and let planner implementers worry about coupling their algorithms to problems expressed in the latest version of the language. The addition of durative actions to PDDL2.1 sparked some controversy. Fox and Long argued that actions should be considered as instantaneous, but can start and stop processes. Ultimately, a limited notion of durative actions was incorporated into the language. I argue that this notion is still impoverished, and that the underlying philosophical position of regarding durative actions as being a shorthand for a start action, process, and stop action ignores the realities of modelling and execution for complex systems. We present a novel framework for integrating prior knowledge into discriminative classifiers. Our framework allows discriminative classifiers such as Support Vector Machines (SVMs) to utilize prior knowledge specified in the generative setting. The dual objective of fitting the data and respecting prior knowledge is formulated as a bilevel program, which is solved (approximately) via iterative application of second-order cone programming. To test our approach, we consider the problem of using WordNet (a semantic database of English language) to improve low-sample classification accuracy of newsgroup categorization. WordNet is viewed as an approximate, but readily available source of background knowledge, and our framework is capable of utilizing it in a flexible way. We present a new algorithm for probabilistic planning with no observability. Our algorithm, called Probabilistic-FF, extends the heuristic forward-search machinery of Conformant-FF to problems with probabilistic uncertainty about both the initial state and action effects. Specifically, Probabilistic-FF combines Conformant-FFs techniques with a powerful machinery for weighted model counting in (weighted) CNFs, serving to elegantly define both the search space and the heuristic function. Our evaluation of Probabilistic-FF shows its fine scalability in a range of probabilistic domains, constituting a several orders of magnitude improvement over previous results in this area. We use a problematic case to point out the main open issue to be addressed by further research. We present three new complexity results for classes of planning problems with simple causal graphs. First, we describe a polynomial-time algorithm that uses macros to generate plans for the class 3S of planning problems with binary state variables and acyclic causal graphs. This implies that plan generation may be tractable even when a planning problem has an exponentially long minimal solution. We also prove that the problem of plan existence for planning problems with multi-valued variables and chain causal graphs is NP-hard. Finally, we show that plan existence for planning problems with binary state variables and polytree causal graphs is NP-complete. The task of keyhole (unobtrusive) plan recognition is central to adaptive game AI. "Tech trees" or "build trees" are the core of real-time strategy (RTS) game strategic (long term) planning. This paper presents a generic and simple Bayesian model for RTS build tree prediction from noisy observations, which parameters are learned from replays (game logs). This unsupervised machine learning approach involves minimal work for the game developers as it leverage players' data (com- mon in RTS). We applied it to StarCraft1 and showed that it yields high quality and robust predictions, that can feed an adaptive AI. We analyse the philosopher Davidson's semantics of actions, using a strongly typed logic with contexts given by sets of partial equations between the outcomes of actions. This provides a perspicuous and elegant treatment of reasoning about action, analogous to Reiter's work on artificial intelligence. We define a sequent calculus for this logic, prove cut elimination, and give a semantics based on fibrations over partial cartesian categories: we give a structure theory for such fibrations. The existence of lax comma objects is necessary for the proof of cut elimination, and we give conditions on the domain fibration of a partial cartesian category for such comma objects to exist. Performing sensitivity analysis for influence diagrams using the decision circuit framework is particularly convenient, since the partial derivatives with respect to every parameter are readily available [Bhattacharjya and Shachter, 2007; 2008]. In this paper we present three non-linear sensitivity analysis methods that utilize this partial derivative information and therefore do not require re-evaluating the decision situation multiple times. Specifically, we show how to efficiently compare strategies in decision situations, perform sensitivity to risk aversion and compute the value of perfect hedging [Seyller, 2008]. We propose a new point-based method for approximate planning in Dec-POMDP which outperforms the state-of-the-art approaches in terms of solution quality. It uses a heuristic estimation of the prior probability of beliefs to choose a bounded number of policy trees: this choice is formulated as a combinatorial optimisation problem minimising the error induced by pruning. A major limitation of exact inference algorithms for probabilistic graphical models is their extensive memory usage, which often puts real-world problems out of their reach. In this paper we show how we can extend inference algorithms, particularly Bucket Elimination, a special case of cluster (join) tree decomposition, to utilize disk memory. We provide the underlying ideas and show promising empirical results of exactly solving large problems not solvable before. We describe a framework and an algorithm for solving hybrid influence diagrams with discrete, continuous, and deterministic chance variables, and discrete and continuous decision variables. A continuous chance variable in an influence diagram is said to be deterministic if its conditional distributions have zero variances. The solution algorithm is an extension of Shenoy's fusion algorithm for discrete influence diagrams. We describe an extended Shenoy-Shafer architecture for propagation of discrete, continuous, and utility potentials in hybrid influence diagrams that include deterministic chance variables. The algorithm and framework are illustrated by solving two small examples. The paper introduces k-bounded MAP inference, a parameterization of MAP inference in Markov logic networks. k-Bounded MAP states are MAP states with at most k active ground atoms of hidden (non-evidence) predicates. We present a novel delayed column generation algorithm and provide empirical evidence that the algorithm efficiently computes k-bounded MAP states for meaningful real-world graph matching problems. The underlying idea is that, instead of solving one large optimization problem, it is often more efficient to tackle several small ones. Rollating walkers are popular mobility aids used by older adults to improve balance control. There is a need to automatically recognize the activities performed by walker users to better understand activity patterns, mobility issues and the context in which falls are more likely to happen. We design and compare several techniques to recognize walker related activities. A comprehensive evaluation with control subjects and walker users from a retirement community is presented. The paper provides a simple test for deciding, from a given causal diagram, whether two sets of variables have the same bias-reducing potential under adjustment. The test requires that one of the following two conditions holds: either (1) both sets are admissible (i.e., satisfy the back-door criterion) or (2) the Markov boundaries surrounding the manipulated variable(s) are identical in both sets. Applications to covariate selection and model testing are discussed. The standard coherence criterion for lower previsions is expressed using an infinite number of linear constraints. For lower previsions that are essentially defined on some finite set of gambles on a finite possibility space, we present a reformulation of this criterion that only uses a finite number of constraints. Any such lower prevision is coherent if it lies within the convex polytope defined by these constraints. The vertices of this polytope are the extreme coherent lower previsions for the given set of gambles. Our reformulation makes it possible to compute them. We show how this is done and illustrate the procedure and its results. We present a probabilistic model of events in continuous time in which each event triggers a Poisson process of successor events. The ensemble of observed events is thereby modeled as a superposition of Poisson processes. Efficient inference is feasible under this model with an EM algorithm. Moreover, the EM algorithm can be implemented as a distributed algorithm, permitting the model to be applied to very large datasets. We apply these techniques to the modeling of Twitter messages and the revision history of Wikipedia. We study the problem of learning Bayesian network structures from data. We develop an algorithm for finding the k-best Bayesian network structures. We propose to compute the posterior probabilities of hypotheses of interest by Bayesian model averaging over the k-best Bayesian networks. We present empirical results on structural discovery over several real and synthetic data sets and show that the method outperforms the model selection method and the state of-the-art MCMC methods. For product rating environments, similar to that of Amazon Reviews, it has been shown that the truthful elicitation of feedback is possible through mechanisms which pay buyer reports contingent on the reports of other buyers. We study whether similar mechanisms can be designed for reputation mechanisms at online auction sites where the buyers' experiences are partially determined by a strategic seller. We show that this is impossible for the basic setting. However, introducing a small prior belief that the seller is a cooperative commitment player leads to a payment scheme with a truthful perfect Bayesian equilibrium. A branch-and-bound approach to solving influ- ence diagrams has been previously proposed in the literature, but appears to have never been implemented and evaluated - apparently due to the difficulties of computing effective bounds for the branch-and-bound search. In this paper, we describe how to efficiently compute effective bounds, and we develop a practical implementa- tion of depth-first branch-and-bound search for influence diagram evaluation that outperforms existing methods for solving influence diagrams with multiple stages. This work proposes to put up a tool for diagnosing multi faults based on model using techniques of detection and localization inspired from the community of artificial intelligence and that of automatic. The diagnostic procedure to be integrated into the supervisory system must therefore be provided with explanatory features. Techniques based on causal reasoning are a pertinent approach for this purpose. Bond graph modeling is used to describe the cause effect relationship between process variables. Experimental results are presented and discussed in order to compare performance of causal graph technique and classic methods inspired from artificial intelligence (DX) and control theory (FDI). The development of expert system for treatment of Diabetes disease by using natural methods is new information technology derived from Artificial Intelligent research using ESTA (Expert System Text Animation) System. The proposed expert system contains knowledge about various methods of natural treatment methods (Massage, Herbal/Proper Nutrition, Acupuncture, Gems) for Diabetes diseases of Human Beings. The system is developed in the ESTA (Expert System shell for Text Animation) which is Visual Prolog 7.3 Application. The knowledge for the said system will be acquired from domain experts, texts and other related sources. First-order probabilistic models combine representational power of first-order logic with graphical models. There is an ongoing effort to design lifted inference algorithms for first-order probabilistic models. We analyze lifted inference from the perspective of constraint processing and, through this viewpoint, we analyze and compare existing approaches and expose their advantages and limitations. Our theoretical results show that the wrong choice of constraint processing method can lead to exponential increase in computational complexity. Our empirical tests confirm the importance of constraint processing in lifted inference. This is the first theoretical and empirical study of constraint processing in lifted inference. We study a subclass of POMDPs, called Deterministic POMDPs, that is characterized by deterministic actions and observations. These models do not provide the same generality of POMDPs yet they capture a number of interesting and challenging problems, and permit more efficient algorithms. Indeed, some of the recent work in planning is built around such assumptions mainly by the quest of amenable models more expressive than the classical deterministic models. We provide results about the fundamental properties of Deterministic POMDPs, their relation with AND/OR search problems and algorithms, and their computational complexity. The paper introduces AND/OR importance sampling for probabilistic graphical models. In contrast to importance sampling, AND/OR importance sampling caches samples in the AND/OR space and then extracts a new sample mean from the stored samples. We prove that AND/OR importance sampling may have lower variance than importance sampling; thereby providing a theoretical justification for preferring it over importance sampling. Our empirical evaluation demonstrates that AND/OR importance sampling is far more accurate than importance sampling in many cases. We present an algorithm that identifies the reasoning patterns of agents in a game, by iteratively examining the graph structure of its Multi-Agent Influence Diagram (MAID) representation. If the decision of an agent participates in no reasoning patterns, then we can effectively ignore that decision for the purpose of calculating a Nash equilibrium for the game. In some cases, this can lead to exponential time savings in the process of equilibrium calculation. Moreover, our algorithm can be used to enumerate the reasoning patterns in a game, which can be useful for constructing more effective computerized agents interacting with humans. It is well-known that inference in graphical models is hard in the worst case, but tractable for models with bounded treewidth. We ask whether treewidth is the only structural criterion of the underlying graph that enables tractable inference. In other words, is there some class of structures with unbounded treewidth in which inference is tractable? Subject to a combinatorial hypothesis due to Robertson et al. (1994), we show that low treewidth is indeed the only structural restriction that can ensure tractability. Thus, even for the "best case" graph structure, there is no inference algorithm with complexity polynomial in the treewidth. We consider conditions that allow us to find an optimal strategy for sequential decisions from a given data situation. For the case where all interventions are unconditional (atomic), identifiability has been discussed by Pearl & Robins (1995). We argue here that an optimal strategy must be conditional, i.e. take the information available at each decision point into account. We show that the identification of an optimal sequential decision strategy is more restrictive, in the sense that conditional interventions might not always be identified when atomic interventions are. We further demonstrate that a simple graphical criterion for the identifiability of an optimal strategy can be given. Approximate inference in dynamic systems is the problem of estimating the state of the system given a sequence of actions and partial observations. High precision estimation is fundamental in many applications like diagnosis, natural language processing, tracking, planning, and robotics. In this paper we present an algorithm that samples possible deterministic executions of a probabilistic sequence. The algorithm takes advantage of a compact representation (using first order logic) for actions and world states to improve the precision of its estimation. Theoretical and empirical results show that the algorithm's expected error is smaller than propositional sampling and Sequential Monte Carlo (SMC) sampling techniques. An efficient policy search algorithm should estimate the local gradient of the objective function, with respect to the policy parameters, from as few trials as possible. Whereas most policy search methods estimate this gradient by observing the rewards obtained during policy trials, we show, both theoretically and empirically, that taking into account the sensor data as well gives better gradient estimates and hence faster learning. The reason is that rewards obtained during policy execution vary from trial to trial due to noise in the environment; sensor data, which correlates with the noise, can be used to partially correct for this variation, resulting in an estimatorwith lower variance. Bayesian networks can be used to extract explanations about the observed state of a subset of variables. In this paper, we explicate the desiderata of an explanation and confront them with the concept of explanation proposed by existing methods. The necessity of taking into account causal approaches when a causal graph is available is discussed. We then introduce causal explanation trees, based on the construction of explanation trees using the measure of causal information ow (Ay and Polani, 2006). This approach is compared to several other methods on known networks. Model-based Bayesian reinforcement learning has generated significant interest in the AI community as it provides an elegant solution to the optimal exploration-exploitation tradeoff in classical reinforcement learning. Unfortunately, the applicability of this type of approach has been limited to small domains due to the high complexity of reasoning about the joint posterior over model parameters. In this paper, we consider the use of factored representations combined with online planning techniques, to improve scalability of these methods. The main contribution of this paper is a Bayesian framework for learning the structure and parameters of a dynamical system, while also simultaneously planning a (near-)optimal sequence of actions. This paper develops a measure for bounding the performance of AND/OR search algorithms for solving a variety of queries over graphical models. We show how drawing a connection to the recent notion of hypertree decompositions allows to exploit determinism in the problem specification and produce tighter bounds. We demonstrate on a variety of practical problem instances that we are often able to improve upon existing bounds by several orders of magnitude. We present and evaluate new techniques for designing algorithm portfolios. In our view, the problem has both a scheduling aspect and a machine learning aspect. Prior work has largely addressed one of the two aspects in isolation. Building on recent work on the scheduling aspect of the problem, we present a technique that addresses both aspects simultaneously and has attractive theoretical guarantees. Experimentally, we show that this technique can be used to improve the performance of state-of-the-art algorithms for Boolean satisfiability, zero-one integer programming, and A.I. planning. In this paper we introduce Refractor Importance Sampling (RIS), an improvement to reduce error variance in Bayesian network importance sampling propagation under evidential reasoning. We prove the existence of a collection of importance functions that are close to the optimal importance function under evidential reasoning. Based on this theoretic result we derive the RIS algorithm. RIS approaches the optimal importance function by applying localized arc changes to minimize the divergence between the evidence-adjusted importance function and the optimal importance function. The validity and performance of RIS is empirically tested with a large setof synthetic Bayesian networks and two real-world networks. The paper introduces a generalization for known probabilistic models such as log-linear and graphical models, called here multiplicative models. These models, that express probabilities via product of parameters are shown to capture multiple forms of contextual independence between variables, including decision graphs and noisy-OR functions. An inference algorithm for multiplicative models is provided and its correctness is proved. The complexity analysis of the inference algorithm uses a more refined parameter than the tree-width of the underlying graph, and shows the computational cost does not exceed that of the variable elimination algorithm in graphical models. The paper ends with examples where using the new models and algorithm is computationally beneficial. We propose a new Monte Carlo algorithm for complex discrete distributions. The algorithm is motivated by the N-Fold Way, which is an ingenious event-driven MCMC sampler that avoids rejection moves at any specific state. The N-Fold Way can however get "trapped" in cycles. We surmount this problem by modifying the sampling process. This correction does introduce bias, but the bias is subsequently corrected with a carefully engineered importance sampler. We propose a definition of causality for time series in terms of the effect of an intervention in one component of a multivariate time series on another component at some later point in time. Conditions for identifiability, comparable to the back-door and front-door criteria, are presented and can also be verified graphically. Computation of the causal effect is derived and illustrated for the linear case. We describe the semantic foundations for elicitation of generalized additively independent (GAI) utilities using the minimax regret criterion, and propose several new query types and strategies for this purpose. Computational feasibility is obtained by exploiting the local GAI structure in the model. Our results provide a practical approach for implementing preference-based constrained configuration optimization as well as effective search in multiattribute product databases. We use the implicitization procedure to generate polynomial equality constraints on the set of distributions induced by local interventions on variables governed by a causal Bayesian network with hidden variables. We show how we may reduce the complexity of the implicitization problem and make the problem tractable in certain causal Bayesian networks. We also show some preliminary results on the algebraic structure of polynomial constraints. The results have applications in distinguishing between causal models and in testing causal models with combined observational and experimental data. The belief propagation (BP) algorithm is widely applied to perform approximate inference on arbitrary graphical models, in part due to its excellent empirical properties and performance. However, little is known theoretically about when this algorithm will perform well. Using recent analysis of convergence and stability properties in BP and new results on approximations in binary systems, we derive a bound on the error in BP's estimates for pairwise Markov random fields over discrete valued random variables. Our bound is relatively simple to compute, and compares favorably with a previous method of bounding the accuracy of BP. Counterfactual statements, e.g., "my headache would be gone had I taken an aspirin" are central to scientific discourse, and are formally interpreted as statements derived from "alternative worlds". However, since they invoke hypothetical states of affairs, often incompatible with what is actually known or observed, testing counterfactuals is fraught with conceptual and practical difficulties. In this paper, we provide a complete characterization of "testable counterfactuals," namely, counterfactual statements whose probabilities can be inferred from physical experiments. We provide complete procedures for discerning whether a given counterfactual is testable and, if so, expressing its probability in terms of experimental data. Memory-Bounded Dynamic Programming (MBDP) has proved extremely effective in solving decentralized POMDPs with large horizons. We generalize the algorithm and improve its scalability by reducing the complexity with respect to the number of observations from exponential to polynomial. We derive error bounds on solution quality with respect to this new approximation and analyze the convergence behavior. To evaluate the effectiveness of the improvements, we introduce a new, larger benchmark problem. Experimental results show that despite the high complexity of decentralized POMDPs, scalable solution techniques such as MBDP perform surprisingly well. In Bayesian networks, a Most Probable Explanation (MPE) is a complete variable instantiation with a highest probability given the current evidence. In this paper, we discuss the problem of finding robustness conditions of the MPE under single parameter changes. Specifically, we ask the question: How much change in a single network parameter can we afford to apply while keeping the MPE unchanged? We will describe a procedure, which is the first of its kind, that computes this answer for each parameter in the Bayesian network variable in time O(n exp(w)), where n is the number of network variables and w is its treewidth. We present a class of inequality constraints on the set of distributions induced by local interventions on variables governed by a causal Bayesian network, in which some of the variables remain unmeasured. We derive bounds on causal effects that are not directly measured in randomized experiments. We derive instrumental inequality type of constraints on nonexperimental distributions. The results have applications in testing causal models with observational or experimental data. This paper studies a new and more general axiomatization than one presented previously for preference on likelihood gambles. Likelihood gambles describe actions in a situation where a decision maker knows multiple probabilistic models and a random sample generated from one of those models but does not know prior probability of models. This new axiom system is inspired by Jensen's axiomatization of probabilistic gambles. Our approach provides a new perspective to the role of data in decision making under ambiguity. It avoids one of the most controversial issue of Bayesian methodology namely the assumption of prior probability. There exist several architectures to solve influence diagrams using local computations, such as the Shenoy-Shafer, the HUGIN, or the Lazy Propagation architectures. They all extend usual variable elimination algorithms thanks to the use of so-called 'potentials'. In this paper, we introduce a new architecture, called the Multi-operator Cluster DAG architecture, which can produce decompositions with an improved constrained induced-width, and therefore induce potentially exponential gains. Its principle is to benefit from the composite nature of influence diagrams, instead of using uniform potentials, in order to better analyze the problem structure. One approach to monitoring a dynamic system relies on decomposition of the system into weakly interacting subsystems. An earlier paper introduced a notion of weak interaction called separability, and showed that it leads to exact propagation of marginals for prediction. This paper addresses two questions left open by the earlier paper: can we define a notion of approximate separability that occurs naturally in practice, and do separability and approximate separability lead to accurate monitoring? The answer to both questions is afirmative. The paper also analyzes the structure of approximately separable decompositions, and provides some explanation as to why these models perform well. We propose a method to identify all the nodes that are relevant to compute all the conditional probability distributions for a given set of nodes. Our method is simple, effcient, consistent, and does not require learning a Bayesian network first. Therefore, our method can be applied to high-dimensional databases, e.g. gene expression databases. In recent years Bayesian networks (BNs) with a mixture of continuous and discrete variables have received an increasing level of attention. We present an architecture for exact belief update in Conditional Linear Gaussian BNs (CLG BNs). The architecture is an extension of lazy propagation using operations of Lauritzen & Jensen [6] and Cowell [2]. By decomposing clique and separator potentials into sets of factors, the proposed architecture takes advantage of independence and irrelevance properties induced by the structure of the graph and the evidence. The resulting benefits are illustrated by examples. Results of a preliminary empirical performance evaluation indicate a significant potential of the proposed architecture. We set up a model for reasoning about metric spaces with belief theoretic measures. The uncertainty in these spaces stems from both probability and metric. To represent both aspect of uncertainty, we choose an expected distance function as a measure of uncertainty. A formal logical system is constructed for the reasoning about expected distance. Soundness and completeness are shown for this logic. For reasoning on product metric space with uncertainty, a new metric is defined and shown to have good properties. This paper proposes new formulas for the probabilities of causation difined by Pearl (2000). Tian and Pearl (2000a, 2000b) showed how to bound the quantities of the probabilities of causation from experimental and observational data, under the minimal assumptions about the data-generating process. We derive narrower bounds than Tian-Pearl bounds by making use of the covariate information measured in experimental and observational studies. In addition, we provide identifiable case under no-prevention assumption and discuss the covariate selection problem from the viewpoint of estimation accuracy. These results are helpful in providing more evidence for public policy assessment and dicision making problems. We introduce priors and algorithms to perform Bayesian inference in Gaussian models defined by acyclic directed mixed graphs. Such a class of graphs, composed of directed and bi-directed edges, is a representation of conditional independencies that is closed under marginalization and arises naturally from causal models which allow for unmeasured confounding. Monte Carlo methods and a variational approximation for such models are presented. Our algorithms for Bayesian inference allow the evaluation of posterior distributions for several quantities of interest, including causal effects that are not identifiable from data alone but could otherwise be inferred where informative prior knowledge about confounding is available. The subject of this paper is the elucidation of effects of actions from causal assumptions represented as a directed graph, and statistical knowledge given as a probability distribution. In particular, we are interested in predicting conditional distributions resulting from performing an action on a set of variables and, subsequently, taking measurements of another set. We provide a necessary and sufficient graphical condition for the cases where such distributions can be uniquely computed from the available information, as well as an algorithm which performs this computation whenever the condition holds. Furthermore, we use our results to prove completeness of do-calculus [Pearl, 1995] for the same identification problem. The use of Artificial Intelligence is finding prominence not only in core computer areas, but also in cross disciplinary areas including medical diagnosis. In this paper, we present a rule based Expert System used in diagnosis of Cerebral Palsy. The expert system takes user input and depending on the symptoms of the patient, diagnoses if the patient is suffering from Cerebral Palsy. The Expert System also classifies the Cerebral Palsy as mild, moderate or severe based on the presented symptoms. While POMDPs provide a general platform for non-deterministic conditional planning under a variety of quality metrics they have limited scalability. On the other hand, non-deterministic conditional planners scale very well, but many lack the ability to optimize plan quality metrics. We present a novel generalization of planning graph based heuristics that helps conditional planners both scale and generate high quality plans when using actions with nonuniform costs. We make empirical comparisons with two state of the art planners to show the benefit of our techniques. We present research on developing models that forecast traffic flow and congestion in the Greater Seattle area. The research has led to the deployment of a service named JamBayes, that is being actively used by over 2,500 users via smartphones and desktop versions of the system. We review the modeling effort and describe experiments probing the predictive accuracy of the models. Finally, we present research on building models that can identify current and future surprises, via efforts on modeling and forecasting unexpected situations. This paper explores semi-qualitative probabilistic networks (SQPNs) that combine numeric and qualitative information. We first show that exact inferences with SQPNs are NPPP-Complete. We then show that existing qualitative relations in SQPNs (plus probabilistic logic and imprecise assessments) can be dealt effectively through multilinear programming. We then discuss learning: we consider a maximum likelihood method that generates point estimates given a SQPN and empirical data, and we describe a Bayesian-minded method that employs the Imprecise Dirichlet Model to generate set-valued estimates. Voting is a very general method of preference aggregation. A voting rule takes as input every voter's vote (typically, a ranking of the alternatives), and produces as output either just the winning alternative or a ranking of the alternatives. One potential view of voting is the following. There exists a 'correct' outcome (winner/ranking), and each voter's vote corresponds to a noisy perception of this correct outcome. If we are given the noise model, then for any vector of votes, we can We consider the problem of deleting edges from a Bayesian network for the purpose of simplifying models in probabilistic inference. In particular, we propose a new method for deleting network edges, which is based on the evidence at hand. We provide some interesting bounds on the KL-divergence between original and approximate networks, which highlight the impact of given evidence on the quality of approximation and shed some light on good and bad candidates for edge deletion. We finally demonstrate empirically the promise of the proposed edge deletion technique as a basis for approximate inference. We define the notion of compiling a Bayesian network with evidence and provide a specific approach for evidence-based compilation, which makes use of logical processing. The approach is practical and advantageous in a number of application areas-including maximum likelihood estimation, sensitivity analysis, and MAP computations-and we provide specific empirical results in the domain of genetic linkage analysis. We also show that the approach is applicable for networks that do not contain determinism, and show that it empirically subsumes the performance of the quickscore algorithm when applied to noisy-or networks. The local Markov condition for a DAG to be an independence map of a probability distribution is well known. For DAGs with latent variables, represented as bi-directed edges in the graph, the local Markov property may invoke exponential number of conditional independencies. This paper shows that the number of conditional independence relations required may be reduced if the probability distributions satisfy the composition axiom. In certain types of graphs, only linear number of conditional independencies are required. The result has applications in testing linear structural equation models with correlated errors. In this paper, we consider Hybrid Mixed Networks (HMN) which are Hybrid Bayesian Networks that allow discrete deterministic information to be modeled explicitly in the form of constraints. We present two approximate inference algorithms for HMNs that integrate and adjust well known algorithmic principles such as Generalized Belief Propagation, Rao-Blackwellised Importance Sampling and Constraint Propagation to address the complexity of modeling and reasoning in HMNs. We demonstrate the performance of our approximate inference algorithms on randomly generated HMNs. We present metrics for measuring state similarity in Markov decision processes (MDPs) with infinitely many states, including MDPs with continuous state spaces. Such metrics provide a stable quantitative analogue of the notion of bisimulation for MDPs, and are suitable for use in MDP approximation. We show that the optimal value function associated with a discounted infinite horizon planning task varies continuously with respect to our metric distances. Tackling the problem of ordinal preference revelation and reasoning, we propose a novel methodology for generating an ordinal utility function from a set of qualitative preference statements. To the best of our knowledge, our proposal constitutes the first nonparametric solution for this problem that is both efficient and semantically sound. Our initial experiments provide strong evidence for practical effectiveness of our approach. Consider the case where cause-effect relationships between variables can be described as a directed acyclic graph and the corresponding linear structural equation model. This paper provides graphical identifiability criteria for total effects by using surrogate variables in the case where it is difficult to observe a treatment/response variable. The results enable us to judge from graph structure whether a total effect can be identified through the observation of surrogate variables. In this paper we compare search and inference in graphical models through the new framework of AND/OR search. Specifically, we compare Variable Elimination (VE) and memoryintensive AND/OR Search (AO) and place algorithms such as graph-based backjumping and no-good and good learning, as well as Recursive Conditioning [7] and Value Elimination [2] within the AND/OR search framework. Existing complexity bounds for point-based POMDP value iteration algorithms focus either on the curse of dimensionality or the curse of history. We derive a new bound that relies on both and uses the concept of discounted reachability; our conclusions may help guide future algorithm design. We also discuss recent improvements to our (point-based) heuristic search value iteration algorithm. Our new implementation calculates tighter initial bounds, avoids solving linear programs, and makes more effective use of sparsity. The aim of this paper is to propose a generalization of previous approaches in qualitative decision making. Our work is based on the binary possibilistic utility (PU), which is a possibilistic counterpart of Expected Utility (EU).We first provide a new axiomatization of PU and study its relation with the lexicographic aggregation of pessimistic and optimistic utilities. Then we explain the reasons of the coarseness of qualitative decision criteria. Finally, thanks to a redefinition of possibilistic lotteries and mixtures, we present the refined binary possibilistic utility, which is more discriminating than previously proposed criteria. Maximal ancestral graphs (MAGs) are used to encode conditional independence relations in DAG models with hidden variables. Different MAGs may represent the same set of conditional independences and are called Markov equivalent. This paper considers MAGs without undirected edges and shows conditions under which an arrow in a MAG can be reversed or interchanged with a bi-directed edge so as to yield a Markov equivalent MAG. We propose a variant of Alternating-time Temporal Logic (ATL) grounded in the agents' operational know-how, as defined by their libraries of abstract plans. Inspired by ATLES, a variant itself of ATL, it is possible in our logic to explicitly refer to "rational" strategies for agents developed under the Belief-Desire-Intention agent programming paradigm. This allows us to express and verify properties of BDI systems using ATL-type logical frameworks. We present metrics for measuring the similarity of states in a finite Markov decision process (MDP). The formulation of our metrics is based on the notion of bisimulation for MDPs, with an aim towards solving discounted infinite horizon reinforcement learning tasks. Such metrics can be used to aggregate states, as well as to better structure other value function approximators (e.g., memory-based or nearest-neighbor approximators). We provide bounds that relate our metric distances to the optimal values of states in the given MDP. We describe an approach for exploiting structure in Markov Decision Processes with continuous state variables. At each step of the dynamic programming, the state space is dynamically partitioned into regions where the value function is the same throughout the region. We first describe the algorithm for piecewise constant representations. We then extend it to piecewise linear representations, using techniques from POMDPs to represent and reason about linear surfaces efficiently. We show that for complex, structured problems, our approach exploits the natural structure so that optimal solutions can be computed efficiently. We present a major improvement to the incremental pruning algorithm for solving partially observable Markov decision processes. Our technique targets the cross-sum step of the dynamic programming (DP) update, a key source of complexity in POMDP algorithms. Instead of reasoning about the whole belief space when pruning the cross-sums, our algorithm divides the belief space into smaller regions and performs independent pruning in each region. We evaluate the benefits of the new technique both analytically and experimentally, and show that it produces very significant performance gains. The results contribute to the scalability of POMDP algorithms to domains that cannot be handled by the best existing techniques. The aim of this work is to provide a unified framework for ordinal representations of uncertainty lying at the crosswords between possibility and probability theories. Such confidence relations between events are commonly found in monotonic reasoning, inconsistency management, or qualitative decision theory. They start either from probability theory, making it more qualitative, or from possibility theory, making it more expressive. We show these two trends converge to a class of genuine probability theories. We provide characterization results for these useful tools that preserve the qualitative nature of possibility rankings, while enjoying the power of expressivity of additive representations. The paper introduces mixed networks, a new framework for expressing and reasoning with probabilistic and deterministic information. The framework combines belief networks with constraint networks, defining the semantics and graphical representation. We also introduce the AND/OR search space for graphical models, and develop a new linear space search algorithm. This provides the basis for understanding the benefits of processing the constraint information separately, resulting in the pruning of the search space. When the constraint part is tractable or has a small number of solutions, using the mixed representation can be exponentially more effective than using pure belief networks which odel constraints as conditional probability tables. We consider the challenge of preference elicitation in systems that help users discover the most desirable item(s) within a given database. Past work on preference elicitation focused on structured models that provide a factored representation of users' preferences. Such models require less information to construct and support efficient reasoning algorithms. This paper makes two substantial contributions to this area: (1) Strong representation theorems for factored value functions. (2) A methodology that utilizes our representation results to address the problem of optimal item selection. Pearl has provided the back door criterion, the front door criterion and the conditional instrumental variable (IV) method as identifiability criteria for total effects. In some situations, these three criteria can be applied to identifying total effects simultaneously. For the purpose of increasing estimating accuracy, this paper compares the three ways of identifying total effects in terms of the asymptotic variance, and concludes that in some situations the superior of them can be recognized directly from the graph structure. This paper concerns the assessment of the effects of actions from a combination of nonexperimental data and causal assumptions encoded in the form of a directed acyclic graph in which some variables are presumed to be unobserved. We provide a procedure that systematically identifies cause effects between two sets of variables conditioned on some other variables, in time polynomial in the number of variables in the graph. The identifiable conditional causal effects are expressed in terms of the observed joint distribution. We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI).HSVI is an anytime algorithm that returns a policy and a provable bound on its regret with respect to the optimal policy. HSVI gets its power by combining two well-known techniques: attention-focusing search heuristics and piecewise linear convex representations of the value function. HSVI's soundness and convergence have been proven. On some benchmark problems from the literature, HSVI displays speedups of greater than 100 with respect to other state-of-the-art POMDP value iteration algorithms. We also apply HSVI to a new rover exploration problem 10 times larger than most POMDP problems in the literature. A causal claim is any assertion that invokes causal relationships between variables, for example that a drug has a certain effect on preventing a disease. Causal claims are established through a combination of data and a set of causal assumptions called a causal model. A claim is robust when it is insensitive to violations of some of the causal assumptions embodied in the model. This paper gives a formal definition of this notion of robustness and establishes a graphical condition for quantifying the degree of robustness of a given causal claim. Algorithms for computing the degree of robustness are also presented. Filter selection techniques are known for their simplicity and efficiency. However this kind of methods doesn't take into consideration the features inter-redundancy. Consequently the un-removed redundant features remain in the final classification model, giving lower generalization performance. In this paper we propose to use a mathematical optimization method that reduces inter-features redundancy and maximize relevance between each feature and the target variable. We develop several algorithms taking advantage of two common approaches for bounding MPE queries in graphical models: minibucket elimination and message-passing updates for linear programming relaxations. Both methods are quite similar, and offer useful perspectives for the other; our hybrid approaches attempt to balance the advantages of each. We demonstrate the power of our hybrid algorithms through extensive empirical evaluation. Most notably, a Branch and Bound search guided by the heuristic function calculated by one of our new algorithms has recently won first place in the PASCAL2 inference challenge. We consider the problem of selecting a subset of alternatives given noisy evaluations of the relative strength of different alternatives. We wish to select a k-subset (for a given k) that provides a maximum likelihood estimate for one of several objectives, e.g., containing the strongest alternative. Although this problem is NP-hard, we show that when the noise level is sufficiently high, intuitive methods provide the optimal solution. We thus generalize classical results about singling out one alternative and identifying the hidden ranking of alternatives by strength. Extensive experiments show that our methods perform well in practical settings. We study the problem of complexity estimation in the context of parallelizing an advanced Branch and Bound-type algorithm over graphical models. The algorithm's pruning power makes load balancing, one crucial element of every distributed system, very challenging. We propose using a statistical regression model to identify and tackle disproportionally complex parallel subproblems, the cause of load imbalance, ahead of time. The proposed model is evaluated and analyzed on various levels and shown to yield robust predictions. We then demonstrate its effectiveness for load balancing in practice. Influence diagrams allow for intuitive and yet precise description of complex situations involving decision making under uncertainty. Unfortunately, most of the problems described by influence diagrams are hard to solve. In this paper we discuss the complexity of approximately solving influence diagrams. We do not assume no-forgetting or regularity, which makes the class of problems we address very broad. Remarkably, we show that when both the tree-width and the cardinality of the variables are bounded the problem admits a fully polynomial-time approximation scheme. Variational inference algorithms such as belief propagation have had tremendous impact on our ability to learn and use graphical models, and give many insights for developing or understanding exact and approximate inference. However, variational approaches have not been widely adoped for decision making in graphical models, often formulated through influence diagrams and including both centralized and decentralized (or multi-agent) decisions. In this work, we present a general variational framework for solving structured cooperative decision-making problems, use it to propose several belief propagation-like algorithms, and analyze them both theoretically and empirically. Stochastic domains often involve risk-averse decision makers. While recent work has focused on how to model risk in Markov decision processes using risk measures, it has not addressed the problem of solving large risk-averse formulations. In this paper, we propose and analyze a new method for solving large risk-averse MDPs with hybrid continuous-discrete state spaces and continuous action spaces. The proposed method iteratively improves a bound on the value function using a linearity structure of the MDP. We demonstrate the utility and properties of the method on a portfolio optimization problem. In this paper, starting from a generalized coherent (i.e. avoiding uniform loss) intervalvalued probability assessment on a finite family of conditional events, we construct conditional probabilities with quasi additive classes of conditioning events which are consistent with the given initial assessment. Quasi additivity assures coherence for the obtained conditional probabilities. In order to reach our goal we define a finite sequence of conditional probabilities by exploiting some theoretical results on g-coherence. In particular, we use solutions of a finite sequence of linear systems. We describe multi-objective influence diagrams, based on a set of p objectives, where utility values are vectors in Rp, and are typically only partially ordered. These can still be solved by a variable elimination algorithm, leading to a set of maximal values of expected utility. If the Pareto ordering is used this set can often be prohibitively large. We consider approximate representations of the Pareto set based on e-coverings, allowing much larger problems to be solved. In addition, we define a method for incorporating user tradeoffs, which also greatly improves the efficiency. Controlled natural languages (mostly English-based) recently have emerged as seemingly informal supplementary means for OWL ontology authoring, if compared to the formal notations that are used by professional knowledge engineers. In this paper we present by examples controlled Latvian language that has been designed to be compliant with the state of the art Attempto Controlled English. We also discuss relation with controlled Lithuanian language that is being designed in parallel. In cognitive sciences it is not uncommon to use various games effectively. For example, in artificial intelligence, the RoboCup initiative was to set up to catalyse research on the field of autonomous agent technology. In this paper, we introduce a similar soccer simulation initiative to try to investigate a model of human consciousness and a notion of reality in the form of a cognitive problem. In addition, for example, the home pitch advantage and the objective role of the supporters could be naturally described and discussed in terms of this new soccer simulation model. The RoboCup 2D Simulation League incorporates several challenging features, setting a benchmark for Artificial Intelligence (AI). In this paper we describe some of the ideas and tools around the development of our team, Gliders2012. In our description, we focus on the evaluation function as one of our central mechanisms for action selection. We also point to a new framework for watching log files in a web browser that we release for use and further development by the RoboCup community. Finally, we also summarize results of the group and final matches we played during RoboCup 2012, with Gliders2012 finishing 4th out of 19 teams. This paper advocates the exploration of the full state of recorded real-time strategy (RTS) games, by human or robotic players, to discover how to reason about tactics and strategy. We present a dataset of StarCraft games encompassing the most of the games' state (not only player's orders). We explain one of the possible usages of this dataset by clustering armies on their compositions. This reduction of armies compositions to mixtures of Gaussian allow for strategic reasoning at the level of the components. We evaluated this clustering method by predicting the outcomes of battles based on armies compositions' mixtures components In a standard possibilistic logic, prioritized information are encoded by means of weighted knowledge base. This paper proposes an extension of possibilistic logic for dealing with partially ordered information. We Show that all basic notions of standard possibilitic logic (sumbsumption, syntactic and semantic inference, etc.) have natural couterparts when dealing with partially ordered information. We also propose an algorithm which computes possibilistic conclusions of a partial knowledge base of a partially ordered knowlege base. This paper is directed towards combining Pearl's structural-model approach to causal reasoning with high-level formalisms for reasoning about actions. More precisely, we present a combination of Pearl's structural-model approach with Poole's independent choice logic. We show how probabilistic theories in the independent choice logic can be mapped to probabilistic causal models. This mapping provides the independent choice logic with appealing concepts of causality and explanation from the structural-model approach. We illustrate this along Halpern and Pearl's sophisticated notions of actual cause, explanation, and partial explanation. This mapping also adds first-order modeling capabilities and explicit actions to the structural-model approach. We present the language {m P}{cal C}+ for probabilistic reasoning about actions, which is a generalization of the action language {cal C}+ that allows to deal with probabilistic as well as nondeterministic effects of actions. We define a formal semantics of {m P}{cal C}+ in terms of probabilistic transitions between sets of states. Using a concept of a history and its belief state, we then show how several important problems in reasoning about actions can be concisely formulated in our formalism. By elaborating on the notion of linear belief functions (Dempster 1990; Liu 1996), we propose an elementary approach to knowledge representation for expert systems using linear belief functions. We show how to use basic matrices to represent market information and financial knowledge, including complete ignorance, statistical observations, subjective speculations, distributional assumptions, linear relations, and empirical asset pricing models. We then appeal to Dempster's rule of combination to integrate the knowledge for assessing an overall belief of portfolio performance, and updating the belief by incorporating additional information. We use an example of three gold stocks to illustrate the approach. This paper presents a scalable control algorithm that enables a deployed mobile robot system to make high-level decisions under full consideration of its probabilistic belief. Our approach is based on insights from the rich literature of hierarchical controllers and hierarchical MDPs. The resulting controller has been successfully deployed in a nursing facility near Pittsburgh, PA. To the best of our knowledge, this work is a unique instance of applying POMDPs to high-level robotic control problems. This paper is devoted to the search of robust solutions in state space graphs when costs depend on scenarios. We first present axiomatic requirements for preference compatibility with the intuitive idea of robustness.This leads us to propose the Lorenz dominance rule as a basis for robustness analysis. Then, after presenting complexity results about the determination of robust solutions, we propose a new sophistication of A* specially designed to determine the set of robust paths in a state space graph. The behavior of the algorithm is illustrated on a small example. Finally, an axiomatic justification of the refinement of robustness by an OWA criterion is provided. Robust optimization is one of the fundamental approaches to deal with uncertainty in combinatorial optimization. This paper considers the robust spanning tree problem with interval data, which arises in a variety of telecommunication applications. It proposes a constraint satisfaction approach using a combinatorial lower bound, a pruning component that removes infeasible and suboptimal edges, as well as a search strategy exploring the most uncertain edges first. The resulting algorithm is shown to produce very dramatic improvements over the mathematical programming approach of Yaman et al. and to enlarge considerably the class of problems amenable to effective solutions We develop a qualitative theory of Markov Decision Processes (MDPs) and Partially Observable MDPs that can be used to model sequential decision making tasks when only qualitative information is available. Our approach is based upon an order-of-magnitude approximation of both probabilities and utilities, similar to epsilon-semantics. The result is a qualitative theory that has close ties with the standard maximum-expected-utility theory and is amenable to general planning techniques. This paper concerns the assessment of direct causal effects from a combination of: (i) non-experimental data, and (ii) qualitative domain knowledge. Domain knowledge is encoded in the form of a directed acyclic graph (DAG), in which all interactions are assumed linear, and some variables are presumed to be unobserved. We provide a generalization of the well-known method of Instrumental Variables, which allows its application to models with few conditional independeces. In this paper, we continue our research on the algorithmic aspects of Halpern and Pearl's causes and explanations in the structural-model approach. To this end, we present new characterizations of weak causes for certain classes of causal models, which show that under suitable restrictions deciding causes and explanations is tractable. To our knowledge, these are the first explicit tractability results for the structural-model approach. This paper presents a decision-theoretic approach to statistical inference that satisfies the likelihood principle (LP) without using prior information. Unlike the Bayesian approach, which also satisfies LP, we do not assume knowledge of the prior distribution of the unknown parameter. With respect to information that can be obtained from an experiment, our solution is more efficient than Wald's minimax solution.However, with respect to information assumed to be known before the experiment, our solution demands less input than the Bayesian solution. We show that maximum entropy (maxent) models can be modeled with certain kinds of HMMs, allowing us to construct maxent models with hidden variables, hidden state sequences, or other characteristics. The models can be trained using the forward-backward algorithm. While the results are primarily of theoretical interest, unifying apparently unrelated concepts, we also give experimental results for a maxent model with a hidden variable on a word disambiguation task; the model outperforms standard techniques. We describe expectation propagation for approximate inference in dynamic Bayesian networks as a natural extension of Pearl s exact belief propagation.Expectation propagation IS a greedy algorithm, converges IN many practical cases, but NOT always.We derive a DOUBLE - loop algorithm, guaranteed TO converge TO a local minimum OF a Bethe free energy.Furthermore, we show that stable fixed points OF (damped) expectation propagation correspond TO local minima OF this free energy, but that the converse need NOT be the CASE .We illustrate the algorithms BY applying them TO switching linear dynamical systems AND discuss implications FOR approximate inference IN general Bayesian networks. We present methods employed in Coordinate, a prototype service that supports collaboration and communication by learning predictive models that provide forecasts of users s AND availability.We describe how data IS collected about USER activity AND proximity FROM multiple devices, IN addition TO analysis OF the content OF users, the time of day, and day of week. We review applications of presence forecasting embedded in the Priorities application and then present details of the Coordinate service that was informed by the earlier efforts. We introduce a general representation of large-population games in which each player s influence ON the others IS centralized AND limited, but may otherwise be arbitrary.This representation significantly generalizes the class known AS congestion games IN a natural way.Our main results are provably correct AND efficient algorithms FOR computing AND learning approximate Nash equilibria IN this general framework. We propose a formal treatment of scenarios in the context of a dialectical argumentation formalism for qualitative reasoning about uncertain propositions. Our formalism extends prior work in which arguments for and against uncertain propositions were presented and compared in interaction spaces called Agoras. We now define the notion of a scenario in this framework and use it to define a set of qualitative uncertainty labels for propositions across a collection of scenarios. This work is intended to lead to a formal theory of scenarios and scenario analysis. Exact monitoring in dynamic Bayesian networks is intractable, so approximate algorithms are necessary. This paper presents a new family of approximate monitoring algorithms that combine the best qualities of the particle filtering and Boyen-Koller methods. Our algorithms maintain an approximate representation the belief state in the form of sets of factored particles, that correspond to samples of clusters of state variables. Empirical results show that our algorithms outperform both ordinary particle filtering and the Boyen-Koller algorithm on large systems. We present new algorithms for inference in credal networks --- directed acyclic graphs associated with sets of probabilities. Credal networks are here interpreted as encoding strong independence relations among variables. We first present a theory of credal networks based on separately specified sets of probabilities. We also show that inference with polytrees is NP-hard in this setting. We then introduce new techniques that reduce the computational effort demanded by inference, particularly in polytrees, by exploring separability of credal sets. We develop a closed form asymptotic formula to compute the marginal likelihood of data given a naive Bayesian network model with two hidden states and binary features. This formula deviates from the standard BIC score. Our work provides a concrete example that the BIC score is generally not valid for statistical models that belong to a stratified exponential family. This stands in contrast to linear and curved exponential families, where the BIC score has been proven to provide a correct approximation for the marginal likelihood. We address the question of convergence in the loopy belief propagation (LBP) algorithm. Specifically, we relate convergence of LBP to the existence of a weak limit for a sequence of Gibbs measures defined on the LBP s associated computation tree.Using tools FROM the theory OF Gibbs measures we develop easily testable sufficient conditions FOR convergence.The failure OF convergence OF LBP implies the existence OF multiple phases FOR the associated Gibbs specification.These results give new insight INTO the mechanics OF the algorithm. This presentation will introduce the audience to a new, emerging body of research on sequential Monte Carlo techniques in robotics. In recent years, particle filters have solved several hard perceptual robotic problems. Early successes were limited to low-dimensional problems, such as the problem of robot localization in environments with known maps. More recently, researchers have begun exploiting structural properties of robotic domains that have led to successful particle filter applications in spaces with as many as 100,000 dimensions. The presentation will discuss specific tricks necessary to make these techniques work in real - world domains,and also discuss open challenges for researchers IN the UAI community. The validity OF a causal model can be tested ONLY IF the model imposes constraints ON the probability distribution that governs the generated data. IN the presence OF unmeasured variables, causal models may impose two types OF constraints : conditional independencies, AS READ through the d - separation criterion, AND functional constraints, FOR which no general criterion IS available.This paper offers a systematic way OF identifying functional constraints AND, thus, facilitates the task OF testing causal models AS well AS inferring such models FROM data. We present a general framework for defining priors on model structure and sampling from the posterior using the Metropolis-Hastings algorithm. The key idea is that structure priors are defined via a probability tree and that the proposal mechanism for the Metropolis-Hastings algorithm operates by traversing this tree, thereby defining a cheaply computable acceptance probability. We have applied this approach to Bayesian net structure learning using a number of priors and tree traversal strategies. Our results show that these must be chosen appropriately for this approach to be successful. This paper presents a sound and completecalculus for causal relevance, based onPearl's functional models semantics.The calculus consists of axioms and rulesof inference for reasoning about causalrelevance relationships.We extend the set of known axioms for causalrelevance with three new axioms, andintroduce two new rules of inference forreasoning about specific subclasses ofmodels.These subclasses give a more refinedcharacterization of causal models than the one given in Halpern's axiomatizationof counterfactual reasoning.Finally, we show how the calculus for causalrelevance can be used in the task ofidentifying causal structure from non-observational data. An instrument is a random variable thatallows the identification of parameters inlinear models when the error terms arenot uncorrelated.It is a popular method used in economicsand the social sciences that reduces theproblem of identification to the problemof finding the appropriate instruments.Few years ago, Pearl introduced a necessarytest for instruments that allows the researcher to discard those candidatesthat fail the test.In this paper, we make a detailed study of Pearl's test and the general model forinstruments. The results of this studyinclude a novel interpretation of Pearl'stest, a general theory of instrumentaltests, and an affirmative answer to aprevious conjecture. We also presentnew instrumentality tests for the casesof discrete and continuous variables. It is often stated in papers tackling the task of inferring Bayesian network structures from data that there are these two distinct approaches: (i) Apply conditional independence tests when testing for the presence or otherwise of edges; (ii) Search the model space using a scoring metric. Here I argue that for complete data and a given node ordering this division is a myth, by showing that cross entropy methods for checking conditional independence are mathematically identical to methods based upon discriminating between models by their overall goodness-of-fit logarithmic scores. We propose a new definition of actual causes, using structural equations to model counterfactuals.We show that the definitions yield a plausible and elegant account ofcausation that handles well examples which have caused problems forother definitions and resolves major difficulties in the traditionalaccount. In a companion paper, we show how the definition of causality can beused to give an elegant definition of (causal) explanation. This article deals with plausible reasoning from incomplete knowledge about large-scale spatial properties. The availableinformation, consisting of a set of pointwise observations,is extrapolated to neighbour points. We make use of belief functions to represent the influence of the knowledge at a given point to another point; the quantitative strength of this influence decreases when the distance between both points increases. These influences arethen aggregated using a variant of Dempster's rule of combination which takes into account the relative dependence between observations. We present probabilistic logic programming under inheritance with overriding. This approach is based on new notions of entailment for reasoning with conditional constraints, which are obtained from the classical notion of logical entailment by adding the principle of inheritance with overriding. This is done by using recent approaches to probabilistic default reasoning with conditional constraints. We analyze the semantic properties of the new entailment relations. We also present algorithms for probabilistic logic programming under inheritance with overriding, and program transformations for an increased efficiency. In this paper we compare three different architectures for the evaluation of influence diagrams: HUGIN, Shafer-Shenoy, and Lazy Evaluation architecture. The computational complexity of the architectures are compared on the LImited Memory Influence Diagram (LIMID): a diagram where only the requiste information for the computation of the optimal policies are depicted. Because the requsite information is explicitly represented in the LIMID the evaluation can take advantage of it, and significant savings in computational can be obtained. In this paper we show how the obtained savings is considerably increased when the computations performed on the LIMID is according to the Lazy Evaluation scheme. The direct effect of one eventon another can be defined and measured byholding constant all intermediate variables between the two.Indirect effects present conceptual andpractical difficulties (in nonlinear models), because they cannot be isolated by holding certain variablesconstant. This paper shows a way of defining any path-specific effectthat does not invoke blocking the remainingpaths.This permits the assessment of a more naturaltype of direct and indirect effects, one thatis applicable in both linear and nonlinear models. The paper establishesconditions under which such assessments can be estimated consistentlyfrom experimental and nonexperimental data,and thus extends path-analytic techniques tononlinear and nonparametric models. We propose a new approach to value-directed belief state approximation for POMDPs. The value-directed model allows one to choose approximation methods for belief state monitoring that have a small impact on decision quality. Using a vector space analysis of the problem, we devise two new search procedures for selecting an approximation scheme that have much better computational properties than existing methods. Though these provide looser error bounds, we show empirically that they have a similar impact on decision quality in practice, and run up to two orders of magnitude more quickly. A method is presented for the rhythmic parsing problem: Given a sequence of observed musical note onset times, we estimate the corresponding notated rhythm and tempo process. A graphical model is developed that represents the simultaneous evolution of tempo and rhythm and relates these hidden quantities to observations. The rhythm variables are discrete and the tempo and observation variables are continuous. We show how to compute the globally most likely configuration of the tempo and rhythm variables given an observation of note onset times. Preliminary experiments are presented on a small data set. A generalization to arbitrary conditional Gaussian distributions is outlined. We consider a partially observable Markov decision problem (POMDP) that models a class of sequencing problems. Although POMDPs are typically intractable, our formulation admits tractable solution. Instead of maintaining a value function over a high-dimensional set of belief states, we reduce the state space to one of smaller dimension, in which grid-based dynamic programming techniques are effective. We develop an error bound for the resulting approximation, and discuss an application of the model to a problem in targeted advertising. We propose a new method of discovering causal structures, based on the detection of local, spontaneous changes in the underlying data-generating model. We analyze the classes of structures that are equivalent relative to a stream of distributions produced by local changes, and devise algorithms that output graphical representations of these equivalence classes. We present experimental results, using simulated data, and examine the errors associated with detection of changes and recovery of structures. We treat collaborative filtering as a univariate time series estimation problem: given a user's previous votes, predict the next vote. We describe two families of methods for transforming data to encode time order in ways amenable to off-the-shelf classification and density estimation tools, and examine the results of using these approaches on several real-world data sets. The improvements in predictive accuracy we realize recommend the use of other predictive algorithms that exploit the temporal order of data. We show that if a strictly positive joint probability distribution for a set of binary random variables factors according to a tree, then vertex separation represents all and only the independence relations enclosed in the distribution. The same result is shown to hold also for multivariate strictly positive normal distributions. Our proof uses a new property of conditional independence that holds for these two classes of probability distributions. Possibilistic logic offers a qualitative framework for representing pieces of information associated with levels of uncertainty of priority. The fusion of multiple sources information is discussed in this setting. Different classes of merging operators are considered including conjunctive, disjunctive, reinforcement, adaptive and averaging operators. Then we propose to analyse these classes in terms of postulates. This is done by first extending the postulate for merging classical bases to the case where priorites are avaialbe. Monitoring plan preconditions can allow for replanning when a precondition fails, generally far in advance of the point in the plan where the precondition is relevant. However, monitoring is generally costly, and some precondition failures have a very small impact on plan quality. We formulate a model for optimal precondition monitoring, using partially-observable Markov decisions processes, and describe methods for solving this model efficitively, though approximately. Specifically, we show that the single-precondition monitoring problem is generally tractable, and the multiple-precondition monitoring policies can be efficitively approximated using single-precondition soultions. Algorithms for exact and approximate inference in stochastic logic programs (SLPs) are presented, based respectively, on variable elimination and importance sampling. We then show how SLPs can be used to represent prior distributions for machine learning, using (i) logic programs and (ii) Bayes net structures as examples. Drawing on existing work in statistics, we apply the Metropolis-Hasting algorithm to construct a Markov chain which samples from the posterior distribution. A Prolog implementation for this is described. We also discuss the possibility of constructing explicit representations of the posterior. In this paper, we formulate a qualitative "linear" utility theory for lotteries in which uncertainty is expressed qualitatively using a Spohnian disbelief function. We argue that a rational decision maker facing an uncertain decision problem in which the uncertainty is expressed qualitatively should behave so as to maximize "qualitative expected utility." Our axiomatization of the qualitative utility is similar to the axiomatization developed by von Neumann and Morgenstern for probabilistic lotteries. We compare our results with other recent results in qualitative decision making. We propose a framework for building graphical causal model that is based on the concept of causal mechanisms. Causal models are intuitive for human users and, more importantly, support the prediction of the effect of manipulation. We describe an implementation of the proposed framework as an interactive model construction module, ImaGeNIe, in SMILE (Structural Modeling, Inference, and Learning Engine) and in GeNIe (SMILE's Windows user interface). We present a new approach to the solution of decision problems formulated as influence diagrams. The approach converts the influence diagram into a simpler structure, the LImited Memory Influence Diagram (LIMID), where only the requisite information for the computation of optimal policies is depicted. Because the requisite information is explicitly represented in the diagram, the evaluation procedure can take advantage of it. In this paper we show how to convert an influence diagram to a LIMID and describe the procedure for finding an optimal strategy. Our approach can yield significant savings of memory and computational time when compared to traditional methods. Conversations abound with uncetainties of various kinds. Treating conversation as inference and decision making under uncertainty, we propose a task independent, multimodal architecture for supporting robust continuous spoken dialog called Quartet. We introduce four interdependent levels of analysis, and describe representations, inference procedures, and decision strategies for managing uncertainties within and between the levels. We highlight the approach by reviewing interactions between a user and two spoken dialog systems developed using the Quartet architecture: Prsenter, a prototype system for navigating Microsoft PowerPoint presentations, and the Bayesian Receptionist, a prototype system for dealing with tasks typically handled by front desk receptionists at the Microsoft corporate campus. Qualitative probabilistic networks have been designed for probabilistic reasoning in a qualitative way. Due to their coarse level of representation detail, qualitative probabilistic networks do not provide for resolving trade-offs and typically yield ambiguous results upon inference. We present an algorithm for computing more insightful results for unresolved trade-offs. The algorithm builds upon the idea of using pivots to zoom in on the trade-offs and identifying the information that would serve to resolve them. This paper extends the work in [Suzuki, 1996] and presents an efficient depth-first branch-and-bound algorithm for learning Bayesian network structures, based on the minimum description length (MDL) principle, for a given (consistent) variable ordering. The algorithm exhaustively searches through all network structures and guarantees to find the network with the best MDL score. Preliminary experiments show that the algorithm is efficient, and that the time complexity grows slowly with the sample size. The algorithm is useful for empirically studying both the performance of suboptimal heuristic search algorithms and the adequacy of the MDL principle in learning Bayesian networks. This paper deals with the problem of estimating the probability that one event was a cause of another in a given scenario. Using structural-semantical definitions of the probabilities of necessary or sufficient causation (or both), we show how to optimally bound these quantities from data obtained in experimental and observational studies, making minimal assumptions concerning the data-generating process. In particular, we strengthen the results of Pearl (1999) by weakening the data-generation assumptions and deriving theoretically sharp bounds on the probabilities of causation. These results delineate precisely how empirical data can be used both in settling questions of attribution and in solving attribution-related problems of decision making. This paper examines the use of Bayesian Networks to tackle one of the tougher problems in requirements engineering, translating user requirements into system requirements. The approach taken is to model domain knowledge as Bayesian Network fragments that are glued together to form a complete view of the domain specific system requirements. User requirements are introduced as evidence and the propagation of belief is used to determine what are the appropriate system requirements as indicated by user requirements. This concept has been demonstrated in the development of a system specification and the results are presented here. Recent work on loglinear models in probabilistic constraint logic programming is applied to first-order probabilistic reasoning. Probabilities are defined directly on the proofs of atomic formulae, and by marginalisation on the atomic formulae themselves. We use Stochastic Logic Programs (SLPs) composed of labelled and unlabelled definite clauses to define the proof probabilities. We have a conservative extension of first-order reasoning, so that, for example, there is a one-one mapping between logical and random variables. We show how, in this framework, Inductive Logic Programming (ILP) can be used to induce the features of a loglinear model from data. We also compare the presented framework with other approaches to first-order probabilistic reasoning. We consider the task of learning the maximum-likelihood polytree from data. Our first result is a performance guarantee establishing that the optimal branching (or Chow-Liu tree), which can be computed very easily, constitutes a good approximation to the best polytree. We then show that it is not possible to do very much better, since the learning problem is NP-hard even to approximately solve within some constant factor. Recent improvement on Tarski's procedure for quantifier elimination in the first order theory of real numbers makes it feasible to solve small instances of the following problems completely automatically: 1. listing all equality and inequality constraints implied by a graphical model with hidden variables. 2. Comparing graphyical models with hidden variables (i.e., model equivalence, inclusion, and overlap). 3. Answering questions about the identification of a model or portion of a model, and about bounds on quantities derived from a model. 4. Determing whether a given set of independence assertions. We discuss the foundation of quantifier elimination and demonstrate its application to these problems. A conceptual foundation for approximation of belief functions is proposed and investigated. It is based on the requirements of consistency and closeness. An optimal approximation is studied. Unfortunately, the computation of the optimal approximation turns out to be intractable. Hence, various heuristic methods are proposed and experimantally evaluated both in terms of their accuracy and in terms of the speed of computation. These methods are compared to the earlier proposed approximations of belief functions. We outline a method to estimate the value of computation for a flexible algorithm using empirical data. To determine a reasonable trade-off between cost and value, we build an empirical model of the value obtained through computation, and apply this model to estimate the value of computation for quite different problems. In particular, we investigate this trade-off for the problem of constructing policies for decision problems represented as influence diagrams. We show how two features of our anytime algorithm provide reasonable estimates of the value of computation in this domain. This paper extends previous work with network fragments and situation-specific network construction. We formally define the asymmetry network, an alternative representation for a conditional probability table. We also present an object-oriented representation for partially specified asymmetry networks. We show that the representation is parsimonious. We define an algebra for the elements of the representation that allows us to 'factor' any CPT and to soundly combine the partially specified asymmetry networks. Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy-space, the size of which is exponential in the number of states. We are interested in bounds on the complexity of PI that do not depend on the value of the discount factor. In this paper we prove the first such non-trivial, worst-case, upper bounds on the number of iterations required by PI to converge to the optimal policy. Our analysis also sheds new light on the manner in which PI progresses through the space of policies. We show how to use a variational approximation to the logistic function to perform approximate inference in Bayesian networks containing discrete nodes with continuous parents. Essentially, we convert the logistic function to a Gaussian, which facilitates exact inference, and then iteratively adjust the variational parameters to improve the quality of the approximation. We demonstrate experimentally that this approximation is faster and potentially more accurate than sampling. We also introduce a simple new technique for handling evidence, which allows us to handle arbitrary distributions on observed nodes, as well as achieving a significant speedup in networks with discrete variables of large cardinality. This paper describes stochastic search approaches, including a new stochastic algorithm and an adaptive mutation operator, for learning Bayesian networks from incomplete data. This problem is characterized by a huge solution space with a highly multimodal landscape. State-of-the-art approaches all involve using deterministic approaches such as the expectation-maximization algorithm. These approaches are guaranteed to find local maxima, but do not explore the landscape for other modes. Our approach evolves structure and the missing data. We compare our stochastic algorithms and show they all produce accurate results. Qualitative probabilistic networks have been introduced as qualitative abstractions of Bayesian belief networks. One of the major drawbacks of these qualitative networks is their coarse level of detail, which may lead to unresolved trade-offs during inference. We present an enhanced formalism for qualitative networks with a finer level of detail. An enhanced qualitative probabilistic network differs from a regular qualitative network in that it distinguishes between strong and weak influences. Enhanced qualitative probabilistic networks are purely qualitative in nature, as regular qualitative networks are, yet allow for efficiently resolving trade-offs during inference. In this article we propose a qualitative (ordinal) counterpart for the Partially Observable Markov Decision Processes model (POMDP) in which the uncertainty, as well as the preferences of the agent, are modeled by possibility distributions. This qualitative counterpart of the POMDP model relies on a possibilistic theory of decision under uncertainty, recently developed. One advantage of such a qualitative framework is its ability to escape from the classical obstacle of stochastic POMDPs, in which even with a finite state space, the obtained belief state space of the POMDP is infinite. Instead, in the possibilistic framework even if exponentially larger than the state space, the belief state space remains finite. One of the most useful sensitivity analysis techniques of decision analysis is the computation of value of information (or clairvoyance), the difference in value obtained by changing the decisions by which some of the uncertainties are observed. In this paper, some simple but powerful extensions to previous algorithms are introduced which allow an efficient value of information calculation on the rooted cluster tree (or strong junction tree) used to solve the original decision problem. The noisy-or and its generalization noisy-max have been utilized to reduce the complexity of knowledge acquisition. In this paper, we present a new representation of noisy-max that allows for efficient inference in general Bayesian networks. Empirical studies show that our method is capable of computing queries in well-known large medical networks, QMR-DT and CPCS, for which no previous exact inference method has been shown to perform well. We present a technique for speeding up the convergence of value iteration for partially observable Markov decisions processes (POMDPs). The underlying idea is similar to that behind modified policy iteration for fully observable Markov decision processes (MDPs). The technique can be easily incorporated into any existing POMDP value iteration algorithms. Experiments have been conducted on several test problems with one POMDP value iteration algorithm called incremental pruning. We find that the technique can make incremental pruning run several orders of magnitude faster. Argumentation is a promising model for reasoning with uncertain knowledge. The key concept of acceptability enables to differentiate arguments and counterarguments: The certainty of a proposition can then be evaluated through the most acceptable arguments for that proposition. In this paper, we investigate different complementary points of view: - an acceptability based on the existence of direct counterarguments, - an acceptability based on the existence of defenders. Pursuing previous work on preference-based argumentation principles, we enforce both points of view by taking into account preference orderings for comparing arguments. Our approach is illustrated in the context of reasoning with stratified knowldge bases. This paper addresses the problem of merging uncertain information in the framework of possibilistic logic. It presents several syntactic combination rules to merge possibilistic knowledge bases, provided by different sources, into a new possibilistic knowledge base. These combination rules are first described at the meta-level outside the language of possibilistic logic. Next, an extension of possibilistic logic, where the combination rules are inside the language, is proposed. A proof system in a sequent form, which is sound and complete with respect to the possibilistic logic semantics, is given. Given an undirected graph G or hypergraph X model for a given set of variables V, we introduce two marginalization operators for obtaining the undirected graph GA or hypergraph HA associated with a given subset A c V such that the marginal distribution of A factorizes according to GA or HA, respectively. Finally, we illustrate the method by its application to some practical examples. With them we show that hypergraph models allow defining a finer factorization or performing a more precise conditional independence analysis than undirected graph models. This paper analyzes irrelevance and independence relations in graphical models associated with convex sets of probability distributions (called Quasi-Bayesian networks). The basic question in Quasi-Bayesian networks is, How can irrelevance/independence relations in Quasi-Bayesian networks be detected, enforced and exploited? This paper addresses these questions through Walley's definitions of irrelevance and independence. Novel algorithms and results are presented for inferences with the so-called natural extensions using fractional linear programming, and the properties of the so-called type-1 extensions are clarified through a new generalization of d-separation. The variability of structure in a finite Markov equivalence class of causally sufficient models represented by directed acyclic graphs has been fully characterized. Without causal sufficiency, an infinite semi-Markov equivalence class of models has only been characterized by the fact that each model in the equivalence class entails the same marginal statistical dependencies. In this paper, we study the variability of structure of causal models within a semi-Markov equivalence class and propose a systematic approach to construct models entailing any specific marginal statistical dependencies. This paper relates comparative belief structures and a general view of belief management in the setting of deductively closed logical representations of accepted beliefs. We show that the range of compatibility between the classical deductive closure and uncertain reasoning covers precisely the nonmonotonic 'preferential' inference system of Kraus, Lehmann and Magidor and nothing else. In terms of uncertain reasoning any possibility or necessity measure gives birth to a structure of accepted beliefs. The classes of probability functions and of Shafer's belief functions which yield belief sets prove to be very special ones. This paper presents an axiomatic framework for qualitative decision under uncertainty in a finite setting. The corresponding utility is expressed by a sup-min expression, called Sugeno (or fuzzy) integral. Technically speaking, Sugeno integral is a median, which is indeed a qualitative counterpart to the averaging operation underlying expected utility. The axiomatic justification of Sugeno integral-based utility is expressed in terms of preference between acts as in Savage decision theory. Pessimistic and optimistic qualitative utilities, based on necessity and possibility measures, previously introduced by two of the authors, can be retrieved in this setting by adding appropriate axioms. Dynamic probabilistic networks are a compact representation of complex stochastic processes. In this paper we examine how to learn the structure of a DPN from data. We extend structure scoring rules for standard probabilistic networks to the dynamic case, and show how to search for structure when some of the variables are hidden. Finally, we examine two applications where such a technology might be useful: predicting and classifying dynamic behaviors, and learning causal orderings in biological processes. We provide empirical results that demonstrate the applicability of our methods in both domains. Most algorithms for solving POMDPs iteratively improve a value function that implicitly represents a policy and are said to search in value function space. This paper presents an approach to solving POMDPs that represents a policy explicitly as a finite-state controller and iteratively improves the controller by search in policy space. Two related algorithms illustrate this approach. The first is a policy iteration algorithm that can outperform value iteration in solving infinitehorizon POMDPs. It provides the foundation for a new heuristic search algorithm that promises further speedup by focusing computational effort on regions of the problem space that are reachable, or likely to be reached, from a start state. We take another look at the general problem of selecting a preferred probability measure among those that comply with some given constraints. The dominant role that entropy maximization has obtained in this context is questioned by arguing that the minimum information principle on which it is based could be supplanted by an at least as plausible "likelihood of evidence" principle. We then review a method for turning given selection functions into representation independent variants, and discuss the tradeoffs involved in this transformation. In the literature on graphical models, there has been increased attention paid to the problems of learning hidden structure (see Heckerman [H96] for survey) and causal mechanisms from sample data [H96, P88, S93, P95, F98]. In most settings we should expect the former to be difficult, and the latter potentially impossible without experimental intervention. In this work, we examine some restricted settings in which perfectly reconstruct the hidden structure solely on the basis of observed sample data. We exploit qualitative probabilistic relationships among variables for computing bounds of conditional probability distributions of interest in Bayesian networks. Using the signs of qualitative relationships, we can implement abstraction operations that are guaranteed to bound the distributions of interest in the desired direction. By evaluating incrementally improved approximate networks, our algorithm obtains monotonically tightening bounds that converge to exact distributions. For supermodular utility functions, the tightening bounds monotonically reduce the set of admissible decision alternatives as well. This paper describes a process for constructing situation-specific belief networks from a knowledge base of network fragments. A situation-specific network is a minimal query complete network constructed from a knowledge base in response to a query for the probability distribution on a set of target variables given evidence and context variables. We present definitions of query completeness and situation-specific networks. We describe conditions on the knowledge base that guarantee query completeness. The relationship of our work to earlier work on KBMC is also discussed. I present a parallel algorithm for exact probabilistic inference in Bayesian networks. For polytree networks with n variables, the worst-case time complexity is O(log n) on a CREW PRAM (concurrent-read, exclusive-write parallel random-access machine) with n processors, for any constant number of evidence variables. For arbitrary networks, the time complexity is O(r^{3w}*log n) for n processors, or O(w*log n) for r^{3w}*n processors, where r is the maximum range of any variable, and w is the induced width (the maximum clique size), after moralizing and triangulating the network. This paper is about reducing influence diagram (ID) evaluation into Bayesian network (BN) inference problems. Such reduction is interesting because it enables one to readily use one's favorite BN inference algorithm to efficiently evaluate IDs. Two such reduction methods have been proposed previously (Cooper 1988, Shachter and Peot 1992). This paper proposes a new method. The BN inference problems induced by the mew method are much easier to solve than those induced by the two previous methods. Much recent research in decision theoretic planning has adopted Markov decision processes (MDPs) as the model of choice, and has attempted to make their solution more tractable by exploiting problem structure. One particular algorithm, structured policy construction achieves this by means of a decision theoretic analog of goal regression using action descriptions based on Bayesian networks with tree-structured conditional probability tables. The algorithm as presented is not able to deal with actions with correlated effects. We describe a new decision theoretic regression operator that corrects this weakness. While conceptually straightforward, this extension requires a somewhat more complicated technical approach. Decomposable dependency models and their graphical counterparts, i.e., chordal graphs, possess a number of interesting and useful properties. On the basis of two characterizations of decomposable models in terms of independence relationships, we develop an exact algorithm for recovering the chordal graphical representation of any given decomposable model. We also propose an algorithm for learning chordal approximations of dependency models isomorphic to general undirected graphs. Most exact algorithms for general partially observable Markov decision processes (POMDPs) use a form of dynamic programming in which a piecewise-linear and convex representation of one value function is transformed into another. We examine variations of the "incremental pruning" method for solving this problem and compare them to earlier algorithms from theoretical and empirical perspectives. We find that incremental pruning is presently the most efficient exact method for solving POMDPs. As probabilistic systems gain popularity and are coming into wider use, the need for a mechanism that explains the system's findings and recommendations becomes more critical. The system will also need a mechanism for ordering competing explanations. We examine two representative approaches to explanation in the literature - one due to G\"ardenfors and one due to Pearl - and show that both suffer from significant problems. We propose an approach to defining a notion of "better explanation" that combines some of the features of both together with more recent work by Pearl and others on causality. This paper introduces a new algorithm for the induction if complex finite state automata from samples of behavior. The algorithm is based on information theoretic principles. The algorithm reduces the search space by many orders of magnitude over what was previously thought possible. We compare the algorithm with some existing induction techniques for finite state automata and show that the algorithm is much superior in both run time and quality of inductions. This paper proposes a novel, algorithm-independent approach to optimizing belief network inference. rather than designing optimizations on an algorithm by algorithm basis, we argue that one should use an unoptimized algorithm to generate a Q-DAG, a compiled graphical representation of the belief network, and then optimize the Q-DAG and its evaluator instead. We present a set of Q-DAG optimizations that supplant optimizations designed for traditional inference algorithms, including zero compression, network pruning and caching. We show that our Q-DAG optimizations require time linear in the Q-DAG size, and significantly simplify the process of designing algorithms for optimizing belief network inference. Stochastic algorithms are among the best for solving computationally hard search and reasoning problems. The runtime of such procedures is characterized by a random variable. Different algorithms give rise to different probability distributions. One can take advantage of such differences by combining several algorithms into a portfolio, and running them in parallel or interleaving them on a single processor. We provide a detailed evaluation of the portfolio approach on distributions of hard combinatorial search problems. We show under what conditions the protfolio approach can have a dramatic computational advantage over the best traditional methods. Valuation based systems verifying an idempotent property are studied. A partial order is defined between the valuations giving them a lattice structure. Then, two different strategies are introduced to represent valuations: as infimum of the most informative valuations or as supremum of the least informative ones. It is studied how to carry out computations with both representations in an efficient way. The particular cases of finite sets and convex polytopes are considered. We review the problem of time-critical action and discuss a reformulation that shifts knowledge acquisition from the assessment of complex temporal probabilistic dependencies to the direct assessment of time-dependent utilities over key outcomes of interest. We dwell on a class of decision problems characterized by the centrality of diagnosing and reacting in a timely manner to pathological processes. We motivate key ideas in the context of trauma-care triage and transportation decisions. A new method is developed to represent probabilistic relations on multiple random events. Where previously knowledge bases containing probabilistic rules were used for this purpose, here a probability distribution over the relations is directly represented by a Bayesian network. By using a powerful way of specifying conditional probability distributions in these networks, the resulting formalism is more expressive than the previous ones. Particularly, it provides for constraints on equalities of events, and it allows to define complex, nested combination functions. The idea of fully accepting statements when the evidence has rendered them probable enough faces a number of difficulties. We leave the interpretation of probability largely open, but attempt to suggest a contextual approach to full belief. We show that the difficulties of probabilistic acceptance are not as severe as they are sometimes painted, and that though there are oddities associated with probabilistic acceptance they are in some instances less awkward than the difficulties associated with other nonmonotonic formalisms. We show that the structure at which we arrive provides a natural home for statistical inference. In this paper we present some results obtained with a troupe of low-cost robots designed to cooperatively explore and adquire the map of unknown structured orthogonal environments. In order to improve the covering of the explored zone, the robots show different behaviours and cooperate by transferring each other the perceived environment when they meet. The returning robots deliver to a host computer their partial maps and the host incrementally generates the map of the environment by means of apossibility/ necessity grid. This paper discusses causal independence models and a generalization of these models called causal interaction models. Causal interaction models are models that have independent mechanisms where a mechanism can have several causes. In addition to introducing several particular types of causal interaction models, we show how we can apply the Bayesian approach to learning causal interaction models obtaining approximate posterior distributions for the models and obtain MAP and ML estimates for the parameters. We illustrate the approach with a simulation study of learning model posteriors. Bayesian approaches to learn the graphical structure of Bayesian Belief Networks (BBNs) from databases share the assumption that the database is complete, that is, no entry is reported as unknown. Attempts to relax this assumption involve the use of expensive iterative methods to discriminate among different structures. This paper introduces a deterministic method to learn the graphical structure of a BBN from a possibly incomplete database. Experimental evaluations show a significant robustness of this method and a remarkable independence of its execution time from the number of missing data. This paper explores the role of independence of causal influence (ICI) in Bayesian network inference. ICI allows one to factorize a conditional probability table into smaller pieces. We describe a method for exploiting the factorization in clique tree propagation (CTP) - the state-of-the-art exact inference algorithm for Bayesian networks. We also present empirical results showing that the resulting algorithm is significantly more efficient than the combination of CTP and previous techniques for exploiting ICI. Planning problems where effects of actions are non-deterministic can be modeled as Markov decision processes. Planning problems are usually goal-directed. This paper proposes several techniques for exploiting the goal-directedness to accelerate value iteration, a standard algorithm for solving Markov decision processes. Empirical studies have shown that the techniques can bring about significant speedups. The computational complexity of reasoning within the Dempster-Shafer theory of evidence is one of the main points of criticism this formalism has to face. To overcome this difficulty various approximation algorithms have been suggested that aim at reducing the number of focal elements in the belief functions involved. Besides introducing a new algorithm using this method, this paper describes an empirical study that examines the appropriateness of these approximation procedures in decision making situations. It presents the empirical findings and discusses the various tradeoffs that have to be taken into account when actually applying one of these methods. We develop a qualitative model of decision making with two aims: to describe how people make simple decisions and to enable computer programs to do the same. Current approaches based on Planning or Decisions Theory either ignore uncertainty and tradeoffs, or provide languages and algorithms that are too complex for this task. The proposed model provides a language based on rules, a semantics based on high probabilities and lexicographical preferences, and a transparent decision procedure where reasons for and against decisions interact. The model is no substitude for Decision Theory, yet for decisions that people find easy to explain it may provide an appealing alternative. Real-time Decision algorithms are a class of incremental resource-bounded [Horvitz, 89] or anytime [Dean, 93] algorithms for evaluating influence diagrams. We present a test domain for real-time decision algorithms, and the results of experiments with several Real-time Decision Algorithms in this domain. The results demonstrate high performance for two algorithms, a decision-evaluation variant of Incremental Probabilisitic Inference [D'Ambrosio 93] and a variant of an algorithm suggested by Goldszmidt, [Goldszmidt, 95], PK-reduced. We discuss the implications of these experimental results and explore the broader applicability of these algorithms. Probabilistic inference algorithms for finding the most probable explanation, the maximum aposteriori hypothesis, and the maximum expected utility and for updating belief are reformulated as an elimination--type algorithm called bucket elimination. This emphasizes the principle common to many of the algorithms appearing in that literature and clarifies their relationship to nonserial dynamic programming algorithms. We also present a general way of combining conditioning and elimination within this framework. Bounds on complexity are given for all the algorithms as a function of the problem's structure. We report on work towards flexible algorithms for solving decision problems represented as influence diagrams. An algorithm is given to construct a tree structure for each decision node in an influence diagram. Each tree represents a decision function and is constructed incrementally. The improvements to the tree converge to the optimal decision function (neglecting computational costs) and the asymptotic behaviour is only a constant factor worse than dynamic programming techniques, counting the number of Bayesian network queries. Empirical results show how expected utility increases with the size of the tree and the number of Bayesian net calculations. Inference algorithms for arbitrary belief networks are impractical for large, complex belief networks. Inference algorithms for specialized classes of belief networks have been shown to be more efficient. In this paper, we present a search-based algorithm for approximate inference on arbitrary, noisy-OR belief networks, generalizing earlier work on search-based inference for two-level, noisy-OR belief networks. Initial experimental results appear promising. We present a prototype of a decision support system for management of the fungal disease mildew in winter wheat. The prototype is based on an influence diagram which is used to determine the optimal time and dose of mildew treatments. This involves multiple decision opportunities over time, stochasticity, inaccurate information and incomplete knowledge. The paper describes the practical and theoretical problems encountered during the construction of the influence diagram, and also the experience with the prototype. We present a methodology for representing probabilistic relationships in a general-equilibrium economic model. Specifically, we define a precise mapping from a Bayesian network with binary nodes to a market price system where consumers and producers trade in uncertain propositions. We demonstrate the correspondence between the equilibrium prices of goods in this economy and the probabilities represented by the Bayesian network. A computational market model such as this may provide a useful framework for investigations of belief aggregation, distributed probabilistic inference, resource allocation under uncertainty, and other problems of decentralized uncertainty. We derive qualitative relationships about the informational relevance of variables in graphical decision models based on a consideration of the topology of the models. Specifically, we identify dominance relations for the expected value of information on chance variables in terms of their position and relationships in influence diagrams. The qualitative relationships can be harnessed to generate nonnumerical procedures for ordering uncertain variables in a decision model by their informational relevance. We present two Monte Carlo sampling algorithms for probabilistic inference that guarantee polynomial-time convergence for a larger class of network than current sampling algorithms provide. These new methods are variants of the known likelihood weighting algorithm. We use of recent advances in the theory of optimal stopping rules for Monte Carlo simulation to obtain an inference approximation with relative error epsilon and a small failure probability delta. We present an empirical evaluation of the algorithms which demonstrates their improved performance. SPIRIT is an expert system shell for probabilistic knowledge bases. Knowledge acquisition is performed by processing facts and rules on discrete variables in a rich syntax. The shell generates a probability distribution which respects all acquired facts and rules and which maximizes entropy. The user-friendly devices of SPIRIT to define variables, formulate rules and create the knowledge base are revealed in detail. Inductive learning is possible. Medium sized applications show the power of the system. We propose a decision-analytical approach to comparing the flexibility of decision situations from the perspective of a decision-maker who exhibits constant risk-aversion over a monetary value model. Our approach is simple yet seems to be consistent with a variety of flexibility concepts, including robust and adaptive alternatives. We try to compensate within the model for uncertainty that was not anticipated or not modeled. This approach not only allows one to compare the flexibility of plans, but also guides the search for new, more flexible alternatives. Axiomatization has been widely used for testing logical implications. This paper suggests a non-axiomatic method, the chase, to test if a new dependency follows from a given set of probabilistic dependencies. Although the chase computation may require exponential time in some cases, this technique is a powerful tool for establishing nontrivial theoretical results. More importantly, this approach provides valuable insight into the intriguing connection between relational databases and probabilistic reasoning systems. The article looks at mass market artificial intelligence tools in the context of their ever-growing sophistication, availability and market penetration. The subject is especially relevant today for these exact reasons - if a few years ago AI was the subject of high tech research and science fiction novels, today, we increasingly rely on cloud robotics to cater to our daily needs - to trade stock, predict weather, manage diaries, find friends and buy presents online. Evaluation of counterfactual queries (e.g., "If A were true, would C have been true?") is important to fault diagnosis, planning, determination of liability, and policy analysis. We present a method of revaluating counterfactuals when the underlying causal model is represented by structural models - a nonlinear generalization of the simultaneous equations models commonly used in econometrics and social sciences. This new method provides a coherent means for evaluating policies involving the control of variables which, prior to enacting the policy were influenced by other variables in the system. We present a new approach to dealing with default information based on the theory of belief functions. Our semantic structures, inspired by Adams' epsilon-semantics, are epsilon-belief assignments, where values committed to focal elements are either close to 0 or close to 1. We define two systems based on these structures, and relate them to other non-monotonic systems presented in the literature. We show that our second system correctly addresses the well-known problems of specificity, irrelevance, blocking of inheritance, ambiguity, and redundancy. Chain graphs combine directed and undirected graphs and their underlying mathematics combines properties of the two. This paper gives a simplified definition of chain graphs based on a hierarchical combination of Bayesian (directed) and Markov (undirected) networks. Examples of a chain graph are multivariate feed-forward networks, clustering with conditional interaction between variables, and forms of Bayes classifiers. Chain graphs are then extended using the notation of plates so that samples and data analysis problems can be represented in a graphical model as well. Implications for learning are discussed in the conclusion. We present a simple characterization of equivalent Bayesian network structures based on local transformations. The significance of the characterization is twofold. First, we are able to easily prove several new invariant properties of theoretical interest for equivalent structures. Second, we use the characterization to derive an efficient algorithm that identifies all of the compelled edges in a structure. Compelled edge identification is of particular importance for learning Bayesian network structures from data because these edges indicate causal relationships when certain assumptions hold. We present two algorithms for exact and approximate inference in causal networks. The first algorithm, dynamic conditioning, is a refinement of cutset conditioning that has linear complexity on some networks for which cutset conditioning is exponential. The second algorithm, B-conditioning, is an algorithm for approximate inference that allows one to trade-off the quality of approximations with the computation time. We also present some experimental results illustrating the properties of the proposed algorithms. Bayesian networks provide a method of representing conditional independence between random variables and computing the probability distributions associated with these random variables. In this paper, we extend Bayesian network structures to compute probability density functions for continuous random variables. We make this extension by approximating prior and conditional densities using sums of weighted Gaussian distributions and then finding the propagation rules for updating the densities in terms of these weights. We present a simple example that illustrates the Bayesian network for continuous variables; this example shows the effect of the network structure and approximation errors on the computation of densities for variables in the network. This paper concerns the probabilistic evaluation of the effects of actions in the presence of unmeasured variables. We show that the identification of causal effect between a singleton variable X and a set of variables Y can be accomplished systematically, in time polynomial in the number of variables in the graph. When the causal effect is identifiable, a closed-form expression can be obtained for the probability that the action will achieve a specified goal, or a set of goals. This paper discusses techniques for performing efficient decision-theoretic planning. We give an overview of the DRIPS decision-theoretic refinement planning system, which uses abstraction to efficiently identify optimal plans. We present techniques for automatically generating search control information, which can significantly improve the planner's performance. We evaluate the efficiency of DRIPS both with and without the search control rules on a complex medical planning problem and compare its performance to that of a branch-and-bound decision tree algorithm. We examine Bayesian methods for learning Bayesian networks from a combination of prior knowledge and statistical data. In particular, we unify the approaches we presented at last year's conference for discrete and Gaussian domains. We derive a general Bayesian scoring metric, appropriate for both domains. We then use this metric in combination with well-known statistical facts about the Dirichlet and normal--Wishart distributions to derive our metrics for discrete and Gaussian domains. In earlier work, we introduced flexible inference and decision-theoretic metareasoning to address the intractability of normative inference. Here, rather than pursuing the task of computing beliefs and actions with decision models composed of distinctions about uncertain events, we examine methods for inferring beliefs about mathematical truth before an automated theorem prover completes a proof. We employ a Bayesian analysis to update belief in truth, given theorem-proving progress, and show how decision-theoretic methods can be used to determine the value of continuing to deliberate versus taking immediate action in time-critical situations. Bayesian networks offer great potential for use in automating large scale diagnostic reasoning tasks. Gibbs sampling is the main technique used to perform diagnostic reasoning in large richly interconnected Bayesian networks. Unfortunately Gibbs sampling can take an excessive time to generate a representative sample. In this paper we describe and test a number of heuristic strategies for improving sampling in noisy-or Bayesian networks. The strategies include Monte Carlo Markov chain sampling techniques other than Gibbs sampling. Emphasis is put on strategies that can be implemented in distributed systems. Consider the situation where some evidence e has been entered to a Bayesian network. When performing conflict analysis, sensitivity analysis, or when answering questions like "What if the finding on X had been y instead of x?" you need probabilities P (e'| h), where e' is a subset of e, and h is a configuration of a (possibly empty) set of variables. Cautious propagation is a modification of HUGIN propagation into a Shafer-Shenoy-like architecture. It is less efficient than HUGIN propagation; however, it provides easy access to P (e'| h) for a great deal of relevant subsets e'. Dawid, Kjaerulff and Lauritzen (1994) provided a preliminary description of a hybrid between Monte-Carlo sampling methods and exact local computations in junction trees. Utilizing the strengths of both methods, such hybrid inference methods has the potential of expanding the class of problems which can be solved under bounded resources as well as solving problems which otherwise resist exact solutions. The paper provides a detailed description of a particular instance of such a hybrid scheme; namely, combination of exact inference and Gibbs sampling in discrete Bayesian networks. We argue that this combination calls for an extension of the usual message passing scheme of ordinary junction trees. Classically, risk is characterized by a point value probability indicating the likelihood of occurrence of an adverse effect. However, there are domains where the attainability of objective numerical risk characterizations is increasingly being questioned. This paper reviews the arguments in favour of extending classical techniques of risk assessment to incorporate meaningful qualitative and weak quantitative risk characterizations. A technique in which linguistic uncertainty terms are defined in terms of patterns of argument is then proposed. The technique is demonstrated using a prototype computer-based system for predicting the carcinogenic risk due to novel chemical compounds. Markov decision problems (MDPs) provide the foundations for a number of problems of interest to AI researchers studying automated planning and reinforcement learning. In this paper, we summarize results regarding the complexity of solving MDPs and the running time of MDP solution algorithms. We argue that, although MDPs can be solved efficiently in theory, more study is needed to reveal practical algorithms for solving large problems quickly. To encourage future research, we sketch some alternative methods of analysis that rely on the structure of MDPs. We define a context-sensitive temporal probability logic for representing classes of discrete-time temporal Bayesian networks. Context constraints allow inference to be focused on only the relevant portions of the probabilistic knowledge. We provide a declarative semantics for our language. We present a Bayesian network construction algorithm whose generated networks give sound and complete answers to queries. We use related concepts in logic programming to justify our approach. We have implemented a Bayesian network construction algorithm for a subset of the theory and demonstrate it's application to the problem of evaluating the effectiveness of treatments for acute cardiac conditions. In recent years there has been a spate of papers describing systems for probabilisitic reasoning which do not use numerical probabilities. In some cases the simple set of values used by these systems make it impossible to predict how a probability will change or which hypothesis is most likely given certain evidence. This paper concentrates on such situations, and suggests a number of ways in which they may be resolved by refining the representation. Certain causal models involving unmeasured variables induce no independence constraints among the observed variables but imply, nevertheless, inequality contraints on the observed distribution. This paper derives a general formula for such instrumental variables, that is, exogenous variables that directly affect some variables but not all. With the help of this formula, it is possible to test whether a model involving instrumental variables may account for the data, or, conversely, whether a given variables can be deemed instrumental. The paper concerns the probabilistic evaluation of plans in the presence of unmeasured variables, each plan consisting of several concurrent or sequential actions. We establish a graphical criterion for recognizing when the effects of a given plan can be predicted from passive observations on measured variables only. When the criterion is satisfied, a closed-form expression is provided for the probability that the plan will achieve a specified goal. We show that there is a general, informative and reliable procedure for discovering causal relations when, for all the investigator knows, both latent variables and selection bias may be at work. Given information about conditional independence and dependence relations between measured variables, even when latent variables and selection bias may be present, there are sufficient conditions for reliably concluding that there is a causal path from one variable to another, and sufficient conditions for reliably concluding when no such causal path exists. This paper develops a simple calculus for order of magnitude reasoning. A semantics is given with soundness and completeness results. Order of magnitude probability functions are easily defined and turn out to be equivalent to kappa functions, which are slight generalizations of Spohn's Natural Conditional Functions. The calculus also gives rise to an order of magnitude decision theory, which can be used to justify an amended version of Pearl's decision theory for kappa functions, although the latter is weaker and less expressive. This paper discusses a method for implementing a probabilistic inference system based on an extended relational data model. This model provides a unified approach for a variety of applications such as dynamic programming, solving sparse linear equations, and constraint propagation. In this framework, the probability model is represented as a generalized relational database. Subsequent probabilistic requests can be processed as standard relational queries. Conventional database management systems can be easily adopted for implementing such an approximate reasoning system. In this paper, we present two methods to provide explanations for reasoning with belief functions in the valuation-based systems. One approach, inspired by Strat's method, is based on sensitivity analysis, but its computation is simpler thus easier to implement than Strat's. The other one is to examine the impact of evidence on the conclusion based on the measure of the information content in the evidence. We show the property of additivity for the pieces of evidence that are conditional independent within the context of the valuation-based systems. We will give an example to show how these approaches are applied in an evidential network. This paper reports experiments with the causal independence inference algorithm proposed by Zhang and Poole (1994b) on the CPSC network created by Pradhan et al. (1994). It is found that the algorithm is able to answer 420 of the 422 possible zero-observation queries, 94 of 100 randomly generated five-observation queries, 87 of 100 randomly generated ten-observation queries, and 69 of 100 randomly generated twenty-observation queries. We discuss the problem of construction of inference procedures which can manipulate with uncertainties measured in ordinal scales and fulfill to the property of strict monotonicity of conclusion. The class of A-valuations of plausibility is considered where operations based only on information about linear ordering of plausibility values are used. In this class the modus ponens generating function fulfiling to the property of strict monotonicity of conclusions is introduced. We show how to find a small loop curser in a Bayesian network. Finding such a loop cutset is the first step in the method of conditioning for inference. Our algorithm for finding a loop cutset, called MGA, finds a loop cutset which is guaranteed in the worst case to contain less than twice the number of variables contained in a minimum loop cutset. We test MGA on randomly generated graphs and find that the average ratio between the number of instances associated with the algorithms' output and the number of instances associated with a minimum solution is 1.22. Some instances of creative thinking require an agent to build and test hypothetical theories. Such a reasoner needs to explore the space of not only those situations that have occurred in the past, but also those that are rationally conceivable. In this paper we present a formalism for exploring the space of conceivable situation-models for those domains in which the knowledge is primarily probabilistic in nature. The formalism seeks to construct consistent, minimal, and desirable situation-descriptions by selecting suitable domain-attributes and dependency relationships from the available domain knowledge. Simulation schemes for probabilistic inference in Bayesian belief networks offer many advantages over exact algorithms; for example, these schemes have a linear and thus predictable runtime while exact algorithms have exponential runtime. Experiments have shown that likelihood weighting is one of the most promising simulation schemes. In this paper, we present a new simulation scheme that generates samples more evenly spread in the sample space than the likelihood weighting scheme. We show both theoretically and experimentally that the stratified scheme outperforms likelihood weighting in average runtime and error in estimates of beliefs. A BN2O network is a two level belief net in which the parent interactions are modeled using the noisy-or interaction model. In this paper we discuss application of the SPI local expression language to efficient inference in large BN2O networks. In particular, we show that there is significant structure, which can be exploited to improve over the Quickscore result. We further describe how symbolic techniques can provide information which can significantly reduce the computation required for computing all cause posterior marginals. Finally, we present a novel approximation technique with preliminary experimental results. We investigate planning in time-critical domains represented as Markov Decision Processes, showing that search based techniques can be a very powerful method for finding close to optimal plans. To reduce the computational cost of planning in these domains, we execute actions as we construct the plan, and sacrifice optimality by searching to a fixed depth and using a heuristic function to estimate the value of states. Although this paper concentrates on the search algorithm, we also discuss ways of constructing heuristic functions suitable for this approach. Our results show that by interleaving search and execution, close to optimal policies can be found without the computational requirements of other approaches. Penalty logic, introduced by Pinkas, associates to each formula of a knowledge base the price to pay if this formula is violated. Penalties may be used as a criterion for selecting preferred consistent subsets in an inconsistent knowledge base, thus inducing a non-monotonic inference relation. A precise formalization and the main properties of penalty logic and of its associated non-monotonic inference relation are given in the first part. We also show that penalty logic and Dempster-Shafer theory are related, especially in the infinitesimal case. Possibilistic conditional independence is investigated: we propose a definition of this notion similar to the one used in probability theory. The links between independence and non-interactivity are investigated, and properties of these relations are given. The influence of the conjunction used to define a conditional measure of possibility is also highlighted: we examine three types of conjunctions: Lukasiewicz - like T-norms, product-like T-norms and the minimum operator. Backward simulation is an approximate inference technique for Bayesian belief networks. It differs from existing simulation methods in that it starts simulation from the known evidence and works backward (i.e., contrary to the direction of the arcs). The technique's focus on the evidence leads to improved convergence in situations where the posterior beliefs are dominated by the evidence rather than by the prior probabilities. Since this class of situations is large, the technique may make practical the application of approximate inference in Bayesian belief networks to many real-world problems. This paper discusses the problem of abstracting conditional probabilistic actions. We identify two distinct types of abstraction: intra-action abstraction and inter-action abstraction. We define what it means for the abstraction of an action to be correct and then derive two methods of intra-action abstraction and two methods of inter-action abstraction which are correct according to this criterion. We illustrate the developed techniques by applying them to actions described with the temporal action representation used in the DRIPS decision-theoretic planner and we describe how the planner uses abstraction to reduce the complexity of planning. Within the possibilistic approach to uncertainty modeling, the paper presents a modal logical system to reason about qualitative (comparative) statements of the possibility (and necessity) of fuzzy propositions. We relate this qualitative modal logic to the many--valued analogues MVS5 and MVKD45 of the well known modal logics of knowledge and belief S5 and KD45 respectively. Completeness results are obtained for such logics and therefore, they extend previous existing results for qualitative possibilistic logics in the classical non-fuzzy setting. A logic is defined that allows to express information about statistical probabilities and about degrees of belief in specific propositions. By interpreting the two types of probabilities in one common probability space, the semantics given are well suited to model the influence of statistical information on the formation of subjective beliefs. Cross entropy minimization is a key element in these semantics, the use of which is justified by showing that the resulting logic exhibits some very reasonable properties. The paper deals with optimality issues in connection with updating beliefs in networks. We address two processes: triangulation and construction of junction trees. In the first part, we give a simple algorithm for constructing an optimal junction tree from a triangulated network. In the second part, we argue that any exact method based on local calculations must either be less efficient than the junction tree method, or it has an optimality problem equivalent to that of triangulation. This paper examines the problem of constructing belief networks to evaluate plans produced by an knowledge-based planner. Techniques are presented for handling various types of complicating plan features. These include plans with context-dependent consequences, indirect consequences, actions with preconditions that must be true during the execution of an action, contingencies, multiple levels of abstraction multiple execution agents with partially-ordered and temporally overlapping actions, and plans which reference specific times and time durations. This paper examines methods of decision making that are able to accommodate limitations on both the form in which uncertainty pertaining to a decision problem can be realistically represented and the amount of computing time available before a decision must be made. The methods are anytime algorithms in the sense of Boddy and Dean 1991. Techniques are presented for use with Frisch and Haddawy's [1992] anytime deduction system, with an anytime adaptation of Nilsson's [1986] probabilistic logic, and with a probabilistic database model. While influence diagrams have many advantages as a representation framework for Bayesian decision problems, they have a serious drawback in handling asymmetric decision problems. To be represented in an influence diagram, an asymmetric decision problem must be symmetrized. A considerable amount of unnecessary computation may be involved when a symmetrized influence diagram is evaluated by conventional algorithms. In this paper we present an approach for avoiding such unnecessary computation in influence diagram evaluation. Model-based diagnosis reasons backwards from a functional schematic of a system to isolate faults given observations of anomalous behavior. We develop a fully probabilistic approach to model based diagnosis and extend it to support hierarchical models. Our scheme translates the functional schematic into a Bayesian network and diagnostic inference takes place in the Bayesian network. A Bayesian network diagnostic inference algorithm is modified to take advantage of the hierarchy to give computational gains. The semigraphoid closure of every couple of CI-statements (GI=conditional independence) is a stochastic CI-model. As a consequence of this result it is shown that every probabilistically sound inference rule for CI-model, having at most two antecedents, is derivable from the semigraphoid inference rules. This justifies the use of semigraphoids as approximations of stochastic CI-models in probabilistic reasoning. The list of all 19 potential dominant elements of the mentioned semigraphoid closure is given as a byproduct. System Z+ [Goldszmidt and Pearl, 1991, Goldszmidt, 1992] is a formalism for reasoning with normality defaults of the form "typically if phi then + (with strength cf)" where 6 is a positive integer. The system has a critical shortcoming in that it does not sanction inheritance across exceptional subclasses. In this paper we propose an extension to System Z+ that rectifies this shortcoming by extracting additional conditions between worlds from the defaults database. We show that the additional constraints do not change the notion of the consistency of a database. We also make comparisons with competing default reasoning systems. By analyzing the relationships among chance, weight of evidence and degree of beliefwe show that the assertion "probability functions are special cases of belief functions" and the assertion "Dempster's rule can be used to combine belief functions based on distinct bodies of evidence" together lead to an inconsistency in Dempster-Shafer theory. To solve this problem, we must reject some fundamental postulates of the theory. We introduce a new approach for uncertainty management that shares many intuitive ideas with D-S theory, while avoiding this problem. One important factor determining the computational complexity of evaluating a probabilistic network is the cardinality of the state spaces of the nodes. By varying the granularity of the state spaces, one can trade off accuracy in the result for computational efficiency. We present an anytime procedure for approximate evaluation of probabilistic networks based on this idea. On application to some simple networks, the procedure exhibits a smooth improvement in approximation quality as computation time increases. This suggests that state-space abstraction is one more useful control parameter for designing real-time probabilistic reasoners. We take a general approach to uncertainty on product spaces, and give sufficient conditions for the independence structures of uncertainty measures to satisfy graphoid properties. Since these conditions are arguably more intuitive than some of the graphoid properties, they can be viewed as explanations why probability and certain other formalisms generate graphoids. The conditions include a sufficient condition for the Intersection property which can still apply even if there is a strong logical relations hip between the variables. We indicate how these results can be used to produce theories of qualitative conditional probability which are semi-graphoids and graphoids. In the existing evidential networks with belief functions, the relations among the variables are always represented by joint belief functions on the product space of the involved variables. In this paper, we use conditional belief functions to represent such relations in the network and show some relations of these two kinds of representations. We also present a propagation algorithm for such networks. By analyzing the properties of some special evidential networks with conditional belief functions, we show that the reasoning process can be simplified in such kinds of networks. It is well known that conditional independence can be used to factorize a joint probability into a multiplication of conditional probabilities. This paper proposes a constructive definition of inter-causal independence, which can be used to further factorize a conditional probability. An inference algorithm is developed, which makes use of both conditional independence and inter-causal independence to reduce inference complexity in Bayesian networks. We address the problem of causal interpretation of the graphical structure of Bayesian belief networks (BBNs). We review the concept of causality explicated in the domain of structural equations models and show that it is applicable to BBNs. In this view, which we call mechanism-based, causality is defined within models and causal asymmetries arise when mechanisms are placed in the context of a system. We lay the link between structural equations models and BBNs models and formulate the conditions under which the latter can be given causal interpretation. The primary theme of this investigation is a decision theoretic account of conditional ought statements (e.g., "You ought to do A, if C") that rectifies glaring deficiencies in classical deontic logic. The resulting account forms a sound basis for qualitative decision theory, thus providing a framework for qualitative planning under uncertainty. In particular, we show that adding causal relationships (in the form of a single graph) as part of an epistemic state is sufficient to facilitate the analysis of action sequences, their consequences, their interaction with observations, their expected utilities and, hence, the synthesis of plans and strategies under uncertainty. Spiegelhalter and Lauritzen [15] studied sequential learning in Bayesian networks and proposed three models for the representation of conditional probabilities. A forth model, shown here, assumes that the parameter distribution is given by a product of Gaussian functions and updates them from the _ and _r messages of evidence propagation. We also generalize the noisy OR-gate for multivalued variables, develop the algorithm to compute probability in time proportional to the number of parents (even in networks with loops) and apply the learning model to this gate. I introduce a temporal belief-network representation of causal independence that a knowledge engineer can use to elicit probabilistic models. Like the current, atemporal belief-network representation of causal independence, the new representation makes knowledge acquisition tractable. Unlike the atemproal representation, however, the temporal representation can simplify inference, and does not require the use of unobservable variables. The representation is less general than is the atemporal representation, but appears to be useful for many practical applications. When eliciting probability models from experts, knowledge engineers may compare the results of the model with expert judgment on test scenarios, then adjust model parameters to bring the behavior of the model more in line with the expert's intuition. This paper presents a methodology for analytic computation of sensitivity values to measure the impact of small changes in a network parameter on a target probability value or distribution. These values can be used to guide knowledge elicitation. They can also be used in a gradient descent algorithm to estimate parameter values that maximize a measure of goodness-of-fit to both local and holistic probability assessments. Causal Models are like Dependency Graphs and Belief Nets in that they provide a structure and a set of assumptions from which a joint distribution can, in principle, be computed. Unlike Dependency Graphs, Causal Models are models of hierarchical and/or parallel processes, rather than models of distributions (partially) known to a model builder through some sort of gestalt. As such, Causal Models are more modular, easier to build, more intuitive, and easier to understand than Dependency Graph Models. Causal Models are formally defined and Dependency Graph Models are shown to be a special case of them. Algorithms supporting inference are presented. Parsimonious methods for eliciting dependent probabilities are presented. We investigate the value of extending the completeness of a decision model along different dimensions of refinement. Specifically, we analyze the expected value of quantitative, conceptual, and structural refinement of decision models. We illustrate the key dimensions of refinement with examples. The analyses of value of model refinement can be used to focus the attention of an analyst or an automated reasoning system on extensions of a decision model associated with the greatest expected value. Valuation networks have been proposed as graphical representations of valuation-based systems (VBSs). The VBS framework is able to capture many uncertainty calculi including probability theory, Dempster-Shafer's belief-function theory, Spohn's epistemic belief theory, and Zadeh's possibility theory. In this paper, we show how valuation networks encode conditional independence relations. For the probabilistic case, the class of probability models encoded by valuation networks includes undirected graph models, directed acyclic graph models, directed balloon graph models, and recursive causal graph models. The Noisy-Or model is convenient for describing a class of uncertain relationships in Bayesian networks [Pearl 1988]. Pearl describes the Noisy-Or model for Boolean variables. Here we generalize the model to nary input and output variables and to arbitrary functions other than the Boolean OR function. This generalization is a useful modeling aid for construction of Bayesian networks. We illustrate with some examples including digital circuit diagnosis and network reliability analysis. One of the most difficult aspects of modeling complex dilemmas in decision-analytic terms is composing a diagram of relevance relations from a set of domain concepts. Decision models in domains such as medicine, however, exhibit certain prototypical patterns that can guide the modeling process. Medical concepts can be classified according to semantic types that have characteristic positions and typical roles in an influence-diagram model. We have developed a graph-grammar production system that uses such inherent interrelationships among medical terms to facilitate the modeling of medical decisions. Previous algorithms for the construction of Bayesian belief network structures from data have been either highly dependent on conditional independence (CI) tests, or have required an ordering on the nodes to be supplied by the user. We present an algorithm that integrates these two approaches - CI tests are used to generate an ordering on the nodes from the database which is then used to recover the underlying Bayesian network structure using a non CI based method. Results of preliminary evaluation of the algorithm on two networks (ALARM and LED) are presented. We also discuss some algorithm performance issues and open problems. We describe the integration of logical and uncertain reasoning methods to identify the likely source and location of software problems. To date, software engineers have had few tools for identifying the sources of error in complex software packages. We describe a method for diagnosing software problems through combining logical and uncertain reasoning analyses. Our preliminary results suggest that such methods can be of value in directing the attention of software engineers to paths of an algorithm that have the highest likelihood of harboring a programming error. Propositional representation services such as truth maintenance systems offer powerful support for incremental, interleaved, problem-model construction and evaluation. Probabilistic inference systems, in contrast, have lagged behind in supporting this incrementality typically demanded by problem solvers. The problem, we argue, is that the basic task of probabilistic inference is typically formulated at too large a grain-size. We show how a system built around a smaller grain-size inference task can have the desired incrementality and serve as the basis for a low-level (propositional) probabilistic representation service. Intercausal reasoning is a common inference pattern involving probabilistic dependence of causes of an observed common effect. The sign of this dependence is captured by a qualitative property called product synergy. The current definition of product synergy is insufficient for intercausal reasoning where there are additional uninstantiated causes of the common effect. We propose a new definition of product synergy and prove its adequacy for intercausal reasoning with direct and indirect evidence for the common effect. The new definition is based on a new property matrix half positive semi-definiteness, a weakened form of matrix positive semi-definiteness. We examine two types of similarity networks each based on a distinct notion of relevance. For both types of similarity networks we present an efficient inference algorithm that works under the assumption that every event has a nonzero probability of occurrence. Another inference algorithm is developed for type 1 similarity networks that works under no restriction, albeit less efficiently. Tree structures have been shown to provide an efficient framework for propagating beliefs [Pearl,1986]. This paper studies the problem of finding an optimal approximating tree. The star decomposition scheme for sets of three binary variables [Lazarsfeld,1966; Pearl,1986] is shown to enhance the class of probability distributions that can support tree structures; such structures are called tree-decomposable structures. The logarithm scoring rule is found to be an appropriate optimality criterion to evaluate different tree-decomposable structures. Characteristics of such structures closest to the actual belief network are identified using the logarithm rule, and greedy and exact techniques are developed to find the optimal approximation. The potential influence diagram is a generalization of the standard "conditional" influence diagram, a directed network representation for probabilistic inference and decision analysis [Ndilikilikesha, 1991]. It allows efficient inference calculations corresponding exactly to those on undirected graphs. In this paper, we explore the relationship between potential and conditional influence diagrams and provide insight into the properties of the potential influence diagram. In particular, we show how to convert a potential influence diagram into a conditional influence diagram, and how to view the potential influence diagram operations in terms of the conditional influence diagram. To determine the value of perfect information in an influence diagram, one needs first to modify the diagram to reflect the change in information availability, and then to compute the optimal expected values of both the original diagram and the modified diagram. The value of perfect information is the difference between the two optimal expected values. This paper is about how to speed up the computation of the optimal expected value of the modified diagram by making use of the intermediate computation results obtained when computing the optimal expected value of the original diagram. A major reason behind the success of probability calculus is that it possesses a number of valuable tools, which are based on the notion of probabilistic independence. In this paper, I identify a notion of logical independence that makes some of these tools available to a class of propositional databases, called argument databases. Specifically, I suggest a graphical representation of argument databases, called argument networks, which resemble Bayesian networks. I also suggest an algorithm for reasoning with argument networks, which resembles a basic algorithm for reasoning with Bayesian networks. Finally, I show that argument networks have several applications: Nonmonotonic reasoning, truth maintenance, and diagnosis. In this paper some initial work towards a new approach to qualitative reasoning under uncertainty is presented. This method is not only applicable to qualitative probabilistic reasoning, as is the case with other methods, but also allows the qualitative propagation within networks of values based upon possibility theory and Dempster-Shafer evidence theory. The method is applied to two simple networks from which a large class of directed graphs may be constructed. The results of this analysis are used to compare the qualitative behaviour of the three major quantitative uncertainty handling formalisms, and to demonstrate that the qualitative integration of the formalisms is possible under certain assumptions. The classical propositional assumption-based model is extended to incorporate probabilities for the assumptions. Then it is placed into the framework of evidence theory. Several authors like Laskey, Lehner (1989) and Provan (1990) already proposed a similar point of view, but the first paper is not as much concerned with mathematical foundations, and Provan's paper develops into a different direction. Here we thoroughly develop and present the mathematical foundations of this theory, together with computational methods adapted from Reiter, De Kleer (1987) and Inoue (1992). Finally, recently proposed techniques for computing degrees of support are presented. This paper presents a procedure to determine a complete belief function from the known values of belief for some of the subsets of the frame of discerment. The method is based on the principle of minimum commitment and a new principle called the focusing principle. This additional principle is based on the idea that belief is specified for the most relevant sets: the focal elements. The resulting procedure is compared with existing methods of building complete belief functions: the minimum specificity principle and the least commitment principle. In a probability-based reasoning system, Bayes' theorem and its variations are often used to revise the system's beliefs. However, if the explicit conditions and the implicit conditions of probability assignments `me properly distinguished, it follows that Bayes' theorem is not a generally applicable revision rule. Upon properly distinguishing belief revision from belief updating, we see that Jeffrey's rule and its variations are not revision rules, either. Without these distinctions, the limitation of the Bayesian approach is often ignored or underestimated. Revision, in its general form, cannot be done in the Bayesian approach, because a probability distribution function alone does not contain the information needed by the operation. In this paper, we present a decision support system based on belief functions and the pignistic transformation. The system is an integration of an evidential system for belief function propagation and a valuation-based system for Bayesian decision analysis. The two subsystems are connected through the pignistic transformation. The system takes as inputs the user's "gut feelings" about a situation and suggests what, if any, are to be tested and in what order, and it does so with a user friendly interface. In this paper we describe a novel method for evidential reasoning [1]. It involves modelling the process of evidential reasoning in three steps, namely, evidence structure construction, evidence accumulation, and decision making. The proposed method, called RES, is novel in that evidence strength is associated with an evidential support relationship (an argument) between a pair of statements and such strength is carried by comparison between arguments. This is in contrast to the onventional approaches, where evidence strength is represented numerically and is associated with a statement. An algorithm for generating the structure of a directed acyclic graph from data using the notion of causal input lists is presented. The algorithm manipulates the ordering of the variables with operations which very much resemble arc reversal. Operations are only applied if the DAG after the operation represents at least the independencies represented by the DAG before the operation until no more arcs can be removed from the DAG. The resulting DAG is a minimal l-map. Possibilistic logic has been proposed as a numerical formalism for reasoning with uncertainty. There has been interest in developing qualitative accounts of possibility, as well as an explanation of the relationship between possibility and modal logics. We present two modal logics that can be used to represent and reason with qualitative statements of possibility and necessity. Within this modal framework, we are able to identify interesting relationships between possibilistic logic, beliefs and conditionals. In particular, the most natural conditional definable via possibilistic means for default reasoning is identical to Pearl's conditional for e-semantics. Experts do not always feel very, comfortable when they have to give precise numerical estimations of certainty degrees. In this paper we present a qualitative approach which allows for attaching partially ordered symbolic grades to logical formulas. Uncertain information is expressed by means of parameterized modal operators. We propose a semantics for this multimodal logic and give a sound and complete axiomatization. We study the links with related approaches and suggest how this framework might be used to manage both uncertain and incomplere knowledge. We have developed a probabilistic forecasting methodology through a synthesis of belief network models and classical time-series analysis. We present the dynamic network model (DNM) and describe methods for constructing, refining, and performing inference with this representation of temporal probabilistic knowledge. The DNM representation extends static belief-network models to more general dynamic forecasting models by integrating and iteratively refining contemporaneous and time-lagged dependencies. We discuss key concepts in terms of a model for forecasting U.S. car sales in Japan. We report on an experimental investigation into opportunities for parallelism in beliefnet inference. Specifically, we report on a study performed of the available parallelism, on hypercube style machines, of a set of randomly generated belief nets, using factoring (SPI) style inference algorithms. Our results indicate that substantial speedup is available, but that it is available only through parallelization of individual conformal product operations, and depends critically on finding an appropriate factoring. We find negligible opportunity for parallelism at the topological, or clustering tree, level. This article offers a modification of Chow and Liu's learning algorithm in the context of handwritten digit recognition. The modified algorithm directs the user to group digits into several classes consisting of digits that are hard to distinguish and then constructing an optimal conditional tree representation for each class of digits instead of for each single digit as done by Chow and Liu (1968). Advantages and extensions of the new method are discussed. Related works of Wong and Wang (1977) and Wong and Poon (1989) which offer a different entropy-based learning algorithm are shown to rest on inappropriate assumptions. In the probabilistic approach to uncertainty management the input knowledge is usually represented by means of some probability distributions. In this paper we assume that the input knowledge is given by two discrete conditional probability distributions, represented by two stochastic matrices P and Q. The consistency of the knowledge base is analyzed. Coherence conditions and explicit formulas for the extension to marginal distributions are obtained in some special cases. A computational scheme for reasoning about dynamic systems using (causal) probabilistic networks is presented. The scheme is based on the framework of Lauritzen and Spiegelhalter (1988), and may be viewed as a generalization of the inference methods of classical time-series analysis in the sense that it allows description of non-linear, multivariate dynamic systems with complex conditional independence structures. Further, the scheme provides a method for efficient backward smoothing and possibilities for efficient, approximate forecasting methods. The scheme has been implemented on top of the HUGIN shell. The fundamental updating process in the transferable belief model is related to the concept of specialization and can be described by a specialization matrix. The degree of belief in the truth of a proposition is a degree of justified support. The Principle of Minimal Commitment implies that one should never give more support to the truth of a proposition than justified. We show that Dempster's rule of conditioning corresponds essentially to the least committed specialization, and that Dempster's rule of combination results essentially from commutativity requirements. The concept of generalization, dual to thc concept of specialization, is described. We discuss problems for convex Bayesian decision making and uncertainty representation. These include the inability to accommodate various natural and useful constraints and the possibility of an analog of the classical Dutch Book being made against an agent behaving in accordance with convex Bayesian prescriptions. A more general set-based Bayesianism may be as tractable and would avoid the difficulties we raise. This paper presents a Bayesian framework for assessing the adequacy of a model without the necessity of explicitly enumerating a specific alternate model. A test statistic is developed for tracking the performance of the model across repeated problem instances. Asymptotic methods are used to derive an approximate distribution for the test statistic. When the model is rejected, the individual components of the test statistic can be used to guide search for an alternate model. The ideal Bayesian agent reasons from a global probability model, but real agents are restricted to simplified models which they know to be adequate only in restricted circumstances. Very little formal theory has been developed to help fallibly rational agents manage the process of constructing and revising small world models. The goal of this paper is to present a theoretical framework for analyzing model management approaches. For a probability forecasting problem, a search process over small world models is analyzed as an approximation to a larger-world model which the agent cannot explicitly enumerate or compute. Conditions are given under which the sequence of small-world models converges to the larger-world probabilities. Automated decision making is often complicated by the complexity of the knowledge involved. Much of this complexity arises from the context sensitive variations of the underlying phenomena. We propose a framework for representing descriptive, context-sensitive knowledge. Our approach attempts to integrate categorical and uncertain knowledge in a network formalism. This paper outlines the basic representation constructs, examines their expressiveness and efficiency, and discusses the potential applications of the framework. Bayesian networks are directed acyclic graphs representing independence relationships among a set of random variables. A random variable can be regarded as a set of exhaustive and mutually exclusive propositions. We argue that there are several drawbacks resulting from the propositional nature and acyclic structure of Bayesian networks. To remedy these shortcomings, we propose a probabilistic network where nodes represent unary predicates and which may contain directed cycles. The proposed representation allows us to represent domain knowledge in a single static network even though we cannot determine the instantiations of the predicates before hand. The ability to deal with cycles also enables us to handle cyclic causal tendencies and to recognize recursive plans. We address the problem of supporting empirical probabilities in monadic logic databases. Though the semantics of multivalued logic programs has been studied extensively, the treatment of probabilities as results of statistical findings has not been studied in logic programming/deductive databases. We develop a model-theoretic characterization of logic databases that facilitates such a treatment. We present an algorithm for checking consistency of such databases and prove its total correctness. We develop a sound and complete query processing procedure for handling queries to such databases. The paper describes aHUGIN, a tool for creating adaptive systems. aHUGIN is an extension of the HUGIN shell, and is based on the methods reported by Spiegelhalter and Lauritzen (1990a). The adaptive systems resulting from aHUGIN are able to adjust the C011ditional probabilities in the model. A short analysis of the adaptation task is given and the features of aHUGIN are described. Finally a session with experiments is reported and the results are discussed. Probabilistic reasoning systems combine different probabilistic rules and probabilistic facts to arrive at the desired probability values of consequences. In this paper we describe the MESA-algorithm (Maximum Entropy by Simulated Annealing) that derives a joint distribution of variables or propositions. It takes into account the reliability of probability values and can resolve conflicts between contradictory statements. The joint distribution is represented in terms of marginal distributions and therefore allows to process large inference networks and to determine desired probability values with high precision. The procedure derives a maximum entropy distribution subject to the given constraints. It can be applied to inference networks of arbitrary topology and may be extended into a number of directions. An expert classification system having statistical information about the prior probabilities of the different classes should be able to use this knowledge to reduce the amount of additional information that it must collect, e.g., through questions, in order to make a correct classification. This paper examines how best to use such prior information and additional information-collection opportunities to reduce uncertainty about the class to which a case belongs, thus minimizing the average cost or effort required to correctly classify new cases. The analysis of decision making under uncertainty is closely related to the analysis of probabilistic inference. Indeed, much of the research into efficient methods for probabilistic inference in expert systems has been motivated by the fundamental normative arguments of decision theory. In this paper we show how the developments underlying those efficient methods can be applied immediately to decision problems. In addition to general approaches which need know nothing about the actual probabilistic inference method, we suggest some simple modifications to the clustering family of algorithms in order to efficiently incorporate decision making capabilities. This paper introduces the notions of independence and conditional independence in valuation-based systems (VBS). VBS is an axiomatic framework capable of representing many different uncertainty calculi. We define independence and conditional independence in terms of factorization of the joint valuation. The definitions of independence and conditional independence in VBS generalize the corresponding definitions in probability theory. Our definitions apply not only to probability theory, but also to Dempster-Shafer's belief-function theory, Spohn's epistemic-belief theory, and Zadeh's possibility theory. In fact, they apply to any uncertainty calculi that fit in the framework of valuation-based systems. This paper discusses a target tracking problem in which no dynamic mathematical model is explicitly assumed. A nonlinear filter based on the fuzzy If-then rules is developed. A comparison with a Kalman filter is made, and empirical results show that the performance of the fuzzy filter is better. Intensive simulations suggest that theoretical justification of the empirical results is possible. The DUCK-calculus presented here is a recent approach to cope with probabilistic uncertainty in a sound and efficient way. Uncertain rules with bounds for probabilities and explicit conditional independences can be maintained incrementally. The basic inference mechanism relies on local bounds propagation, implementable by deductive databases with a bottom-up fixpoint evaluation. In situations, where no precise bounds are deducible, it can be combined with simple operations research techniques on a local scope. In particular, we provide new precise analytical bounds for probabilistic entailment. Jeffrey's rule has been generalized by Wagner to the case in which new evidence bounds the possible revisions of a prior probability below by a Dempsterian lower probability. Classical probability kinematics arises within this generalization as the special case in which the evidentiary focal elements of the bounding lower probability are pairwise disjoint. We discuss a twofold extension of this generalization, first allowing the lower bound to be any two-monotone capacity and then allowing the prior to be a lower envelope. In this paper, a unified framework for representing uncertain information based on the notion of an interval structure is proposed. It is shown that the lower and upper approximations of the rough-set model, the lower and upper bounds of incidence calculus, and the belief and plausibility functions all obey the axioms of an interval structure. An interval structure can be used to synthesize the decision rules provided by the experts. An efficient algorithm to find the desirable set of rules is developed from a set of sound and complete inference axioms. This paper examines the interdependence generated between two parent nodes with a common instantiated child node, such as two hypotheses sharing common evidence. The relation so generated has been termed "intercausal." It is shown by construction that inter-causal independence is possible for binary distributions at one state of evidence. For such "CICI" distributions, the two measures of inter-causal effect, "multiplicative synergy" and "additive synergy" are equal. The well known "noisy-or" model is an example of such a distribution. This introduces novel semantics for the noisy-or, as a model of the degree of conflict among competing hypotheses of a common observation. In this paper, we consider several types of information and methods of combination associated with incomplete probabilistic systems. We discriminate between 'a priori' and evidential information. The former one is a description of the whole population, the latest is a restriction based on observations for a particular case. Then, we propose different combination methods for each one of them. We also consider conditioning as the heterogeneous combination of 'a priori' and evidential information. The evidential information is represented as a convex set of likelihood functions. These will have an associated possibility distribution with behavior according to classical Possibility Theory. This paper presents a Bayesian method for constructing Bayesian belief networks from a database of cases. Potential applications include computer-assisted hypothesis testing, automated scientific discovery, and automated construction of probabilistic expert systems. Results are presented of a preliminary evaluation of an algorithm for constructing a belief network from a database of cases. We relate the methods in this paper to previous work, and we discuss open problems. This paper discuses multiple Bayesian networks representation paradigms for encoding asymmetric independence assertions. We offer three contributions: (1) an inference mechanism that makes explicit use of asymmetric independence to speed up computations, (2) a simplified definition of similarity networks and extensions of their theory, and (3) a generalized representation scheme that encodes more types of asymmetric independence assertions than do similarity networks. We discuss representing and reasoning with knowledge about the time-dependent utility of an agent's actions. Time-dependent utility plays a crucial role in the interaction between computation and action under bounded resources. We present a semantics for time-dependent utility and describe the use of time-dependent information in decision contexts. We illustrate our discussion with examples of time-pressured reasoning in Protos, a system constructed to explore the ideal control of inference by reasoners with limit abilities. Traditional approaches to non-monotonic reasoning fail to satisfy a number of plausible axioms for belief revision and suffer from conceptual difficulties as well. Recent work on ranked preferential models (RPMs) promises to overcome some of these difficulties. Here we show that RPMs are not adequate to handle iterated belief change. Specifically, we show that RPMs do not always allow for the reversibility of belief change. This result indicates the need for numerical strengths of belief. The concept of movable evidence masses that flow from supersets to subsets as specified by experts represents a suitable framework for reasoning under uncertainty. The mass flow is controlled by specialization matrices. New evidence is integrated into the frame of discernment by conditioning or revision (Dempster's rule of conditioning), for which special specialization matrices exist. Even some aspects of non-monotonic reasoning can be represented by certain specialization matrices. Any probabilistic model of a problem is based on assumptions which, if violated, invalidate the model. Users of probability based decision aids need to be alerted when cases arise that are not covered by the aid's model. Diagnosis of model failure is also necessary to control dynamic model construction and revision. This paper presents a set of decision theoretically motivated heuristics for diagnosing situations in which a model is likely to provide an inadequate representation of the process being modeled. A series of monte carlo studies were performed to compare the behavior of some alternative procedures for reasoning under uncertainty. The behavior of several Bayesian, linear model and default reasoning procedures were examined in the context of increasing levels of calibration error. The most interesting result is that Bayesian procedures tended to output more extreme posterior belief values (posterior beliefs near 0.0 or 1.0) than other techniques, but the linear models were relatively less likely to output strong support for an erroneous conclusion. Also, accounting for the probabilistic dependencies between evidence items was important for both Bayesian and linear updating procedures. Selecting the right reference class and the right interval when faced with conflicting candidates and no possibility of establishing subset style dominance has been a problem for Kyburg's Evidential Probability system. Various methods have been proposed by Loui and Kyburg to solve this problem in a way that is both intuitively appealing and justifiable within Kyburg's framework. The scheme proposed in this paper leads to stronger statistical assertions without sacrificing too much of the intuitive appeal of Kyburg's latest proposal. In this paper we study the uses and the semantics of non-monotonic negation in probabilistic deductive data bases. Based on the stable semantics for classical logic programming, we introduce the notion of stable formula, functions. We show that stable formula, functions are minimal fixpoints of operators associated with probabilistic deductive databases with negation. Furthermore, since a. probabilistic deductive database may not necessarily have a stable formula function, we provide a stable class semantics for such databases. Finally, we demonstrate that the proposed semantics can handle default reasoning naturally in the context of probabilistic deduction. The EM-algorithm is a general procedure to get maximum likelihood estimates if part of the observations on the variables of a network are missing. In this paper a stochastic version of the algorithm is adapted to probabilistic neural networks describing the associative dependency of variables. These networks have a probability distribution, which is a special case of the distribution generated by probabilistic inference networks. Hence both types of networks can be combined allowing to integrate probabilistic rules as well as unspecified associations in a sound way. The resulting network may have a number of interesting features including cycles of probabilistic rules, hidden 'unobservable' variables, and uncertain and contradictory evidence. This paper presents a simple framework for Horn clause abduction, with probabilities associated with hypotheses. It is shown how this representation can represent any probabilistic knowledge representable in a Bayesian belief network. The main contributions are in finding a relationship between logical and probabilistic notions of evidential reasoning. This can be used as a basis for a new way to implement Bayesian Networks that allows for approximations to the value of the posterior probabilities, and also points to a way that Bayesian networks can be extended beyond a propositional language. We present PULCinella and its use in comparing uncertainty theories. PULCinella is a general tool for Propagating Uncertainty based on the Local Computation technique of Shafer and Shenoy. It may be specialized to different uncertainty theories: at the moment, Pulcinella can propagate probabilities, belief functions, Boolean values, and possibilities. Moreover, Pulcinella allows the user to easily define his own specializations. To illustrate Pulcinella, we analyze two examples by using each of the four theories above. In the first one, we mainly focus on intrinsic differences between theories. In the second one, we take a knowledge engineer viewpoint, and check the adequacy of each theory to a given problem. In general, the best explanation for a given observation makes no promises on how good it is with respect to other alternative explanations. A major deficiency of message-passing schemes for belief revision in Bayesian networks is their inability to generate alternatives beyond the second best. In this paper, we present a general approach based on linear constraint systems that naturally generates alternative explanations in an orderly and highly efficient manner. This approach is then applied to cost-based abduction problems as well as belief revision in Bayesian net works. Survey of several forms of updating, with a practical illustrative example. We study several updating (conditioning) schemes that emerge naturally from a common scenarion to provide some insights into their meaning. Updating is a subtle operation and there is no single method, no single 'good' rule. The choice of the appropriate rule must always be given due consideration. Planchet (1989) presents a mathematical survey of many rules. We focus on the practical meaning of these rules. After summarizing the several rules for conditioning, we present an illustrative example in which the various forms of conditioning can be explained. In probabilistic logic entailments, even moderate size problems can yield linear constraint systems with so many variables that exact methods are impractical. This difficulty can be remedied in many cases of interest by introducing a three valued logic (true, false, and "don't care"). The three-valued approach allows the construction of "compressed" constraint systems which have the same solution sets as their two-valued counterparts, but which may involve dramatically fewer variables. Techniques to calculate point estimates for the posterior probabilities of entailed sentences are discussed. The presence of latent variables can greatly complicate inferences about causal relations between measured variables from statistical data. In many cases, the presence of latent variables makes it impossible to determine for two measured variables A and B, whether A causes B, B causes A, or there is some common cause. In this paper I present several theorems that state conditions under which it is possible to reliably infer the causal relation between two measured variables, regardless of whether latent variables are acting or not. The local computation technique (Shafer et al. 1987, Shafer and Shenoy 1988, Shenoy and Shafer 1986) is used for propagating belief functions in so called a Markov Tree. In this paper, we describe an efficient implementation of belief function propagation on the basis of the local computation technique. The presented method avoids all the redundant computations in the propagation process, and so makes the computational complexity decrease with respect to other existing implementations (Hsia and Shenoy 1989, Zarley et al. 1988). We also give a combined algorithm for both propagation and re-propagation which makes the re-propagation process more efficient when one or more of the prior belief functions is changed. Surely we want solid foundations. What kind of castle can we build on sand? What is the point of devoting effort to balconies and minarets, if the foundation may be so weak as to allow the structure to collapse of its own weight? We want our foundations set on bedrock, designed to last for generations. Who would want an architect who cannot certify the soundness of the foundations of his buildings? In this work we are interested in the problem of energy management in Mobile Ad-hoc Network (MANET). The solving and optimization of MANET allow assisting the users to efficiently use their devices in order to minimize the batteries power consumption. In this framework, we propose a modelling of the MANET in form of a Constraint Optimization Problem called COMANET. Then, in the objective to minimize the consumption of batteries power, we present an approach based on an adaptation of the A star algorithm to the MANET problem called MANED. Finally, we expose some experimental results showing utility of this approach. In order to represent the preferences of a group of individuals, we introduce Probabilistic CP-nets (PCP-nets). PCP-nets provide a compact language for representing probability distributions over preference orderings. We argue that they are useful for aggregating preferences or modelling noisy preferences. Then we give efficient algorithms for the main reasoning problems, namely for computing the probability that a given outcome is preferred to another one, and the probability that a given outcome is optimal. As a by-product, we obtain an unexpected linear-time algorithm for checking dominance in a standard, tree-structured CP-net. We consider the problem of learning Bayesian networks (BNs) from complete discrete data. This problem of discrete optimisation is formulated as an integer program (IP). We describe the various steps we have taken to allow efficient solving of this IP. These are (i) efficient search for cutting planes, (ii) a fast greedy algorithm to find high-scoring (perhaps not optimal) BNs and (iii) tightening the linear relaxation of the IP. After relating this BN learning problem to set covering and the multidimensional 0-1 knapsack problem, we present our empirical results. These show improvements, sometimes dramatic, over earlier results. The PC algorithm learns maximally oriented causal Bayesian networks. However, there is no equivalent complete algorithm for learning the structure of relational models, a more expressive generalization of Bayesian networks. Recent developments in the theory and representation of relational models support lifted reasoning about conditional independence. This enables a powerful constraint for orienting bivariate dependencies and forms the basis of a new algorithm for learning structure. We present the relational causal discovery (RCD) algorithm that learns causal relational models. We prove that RCD is sound and complete, and we present empirical results that demonstrate effectiveness. We propose a kernel method to identify finite mixtures of nonparametric product distributions. It is based on a Hilbert space embedding of the joint distribution. The rank of the constructed tensor is equal to the number of mixture components. We present an algorithm to recover the components by partitioning the data points into clusters such that the variables are jointly conditionally independent given the cluster. This method can be used to identify finite confounders. This paper proposes an approach for the adaptation of spatial or temporal cases in a case-based reasoning system. Qualitative algebras are used as spatial and temporal knowledge representation languages. The intuition behind this adaptation approach is to apply a substitution and then repair potential inconsistencies, thanks to belief revision on qualitative algebras. A temporal example from the cooking domain is given. (The paper on which this extended abstract is based was the recipient of the best paper award of the 2012 International Conference on Case-Based Reasoning.) In Artificial Intelligence, planning refers to an area of research that proposes to develop systems that can automatically generate a result set, in the form of an integrated decision-making system through a formal procedure, known as plan. Instead of resorting to the scheduling algorithms to generate plans, it is proposed to operate the automatic learning by decision tree to optimize time. In this paper, we propose to build a classification model by induction graph from a learning sample containing plans that have an associated set of descriptors whose values change depending on each plan. This model will then operate for classifying new cases by assigning the appropriate plan. Artificial Intelligence - what is this? That is the question! In earlier papers we already gave a formal definition for AI, but if one desires to build an actual AI implementation, the following issues require attention and are treated here: the data format to be used, the idea of Undef and Nothing symbols, various ways for defining the "meaning of life", and finally, a new notion of "incorrect move". These questions are of minor importance in the theoretical discussion, but we already know the answer of the question "Does AI exist?" Now we want to make the next step and to create this program. We develop a technique for deriving data-dependent error bounds for transductive learning algorithms based on transductive Rademacher complexity. Our technique is based on a novel general error bound for transduction in terms of transductive Rademacher complexity, together with a novel bounding technique for Rademacher averages for particular algorithms, in terms of their "unlabeled-labeled" representation. This technique is relevant to many advanced graph-based transductive algorithms and we demonstrate its effectiveness by deriving error bounds to three well known algorithms. Finally, we present a new PAC-Bayesian bound for mixtures of transductive algorithms based on our Rademacher bounds. This paper presents several new tractability results for planning based on macros. We describe an algorithm that optimally solves planning problems in a class that we call inverted tree reducible, and is provably tractable for several subclasses of this class. By using macros to store partial plans that recur frequently in the solution, the algorithm is polynomial in time and space even for exponentially long plans. We generalize the inverted tree reducible class in several ways and describe modifications of the algorithm to deal with these new classes. Theoretical results are validated in experiments. Binary Decision Diagram (BDD) based set bounds propagation is a powerful approach to solving set-constraint satisfaction problems. However, prior BDD based techniques in- cur the significant overhead of constructing and manipulating graphs during search. We present a set-constraint solver which combines BDD-based set-bounds propagators with the learning abilities of a modern SAT solver. Together with a number of improvements beyond the basic algorithm, this solver is highly competitive with existing propagation based set constraint solvers. Translations between different nonmonotonic formalisms always have been an important topic in the field, in particular to understand the knowledge-representation capabilities those formalisms offer. We provide such an investigation in terms of different semantics proposed for abstract argumentation frameworks, a nonmonotonic yet simple formalism which received increasing interest within the last decade. Although the properties of these different semantics are nowadays well understood, there are no explicit results about intertranslatability. We provide such translations wrt. different properties and also give a few novel complexity results which underlie some negative results. We describe Dr.Fill, a program that solves American-style crossword puzzles. From a technical perspective, Dr.Fill works by converting crosswords to weighted CSPs, and then using a variety of novel techniques to find a solution. These techniques include generally applicable heuristics for variable and value selection, a variant of limited discrepancy search, and postprocessing and partitioning ideas. Branch and bound is not used, as it was incompatible with postprocessing and was determined experimentally to be of little practical value. Dr.Fillls performance on crosswords from the American Crossword Puzzle Tournament suggests that it ranks among the top fifty or so crossword solvers in the world. We investigate a class of first-order temporal-epistemic logics for reasoning about multi-agent systems. We encode typical properties of systems including perfect recall, synchronicity, no learning, and having a unique initial state in terms of variants of quantified interpreted systems, a first-order extension of interpreted systems. We identify several monodic fragments of first-order temporal-epistemic logic and show their completeness with respect to their corresponding classes of quantified interpreted systems. A model of story generation recently proposed by Riedl and Young casts it as planning, with the additional condition that story characters behave intentionally. This means that characters have perceivable motivation for the actions they take. I show that this condition can be compiled away (in more ways than one) to produce a classical planning problem that can be solved by an off-the-shelf classical planner, more efficiently than by Riedl and Youngs specialised planner. This paper introduces the concept of rational countefactuals which is an idea of identifying a counterfactual from the factual (whether perceived or real) that maximizes the attainment of the desired consequent. In counterfactual thinking if we have a factual statement like: Saddam Hussein invaded Kuwait and consequently George Bush declared war on Iraq then its counterfactuals is: If Saddam Hussein did not invade Kuwait then George Bush would not have declared war on Iraq. The theory of rational counterfactuals is applied to identify the antecedent that gives the desired consequent necessary for rational decision making. The rational countefactual theory is applied to identify the values of variables Allies, Contingency, Distance, Major Power, Capability, Democracy, as well as Economic Interdependency that gives the desired consequent Peace. I am Joachim Jansen and this is my research summary, part of my application to the Doctoral Consortium at ICLP'14. I am a PhD student in the Knowledge Representation and Reasoning (KRR) research group, a subgroup of the Declarative Languages and Artificial Intelligence (DTAI) group at the department of Computer Science at KU Leuven. I started my PhD in September 2012. My promotor is prof. dr. ir. Gerda Janssens and my co-promotor is prof. dr. Marc Denecker. I can be contacted at joachim.jansen@cs.kuleuven.be or at: Room 01.167 Celestijnenlaan 200A 3001 Heverlee Belgium An extended abstract / full version of a paper accepted to be presented at the Doctoral Consortium of the 30th International Conference on Logic Programming (ICLP 2014), July 19-22, Vienna, Austria The psychological state of flow has been linked to optimizing human performance. A key condition of flow emergence is a match between the human abilities and complexity of the task. We propose a simple computational model of flow for Artificial Intelligence (AI) agents. The model factors the standard agent-environment state into a self-reflective set of the agent's abilities and a socially learned set of the environmental complexity. Maximizing the flow serves as a meta control for the agent. We show how to apply the meta-control policy to a broad class of AI control policies and illustrate our approach with a specific implementation. Results in a synthetic testbed are promising and open interesting directions for future work. We introduce a logic for reasoning about evidence, that essentially views evidence as a function from prior beliefs (before making an observation) to posterior beliefs (after making the observation). We provide a sound and complete axiomatization for the logic, and consider the complexity of the decision problem. Although the reasoning in the logic is mainly propositional, we allow variables representing numbers and quantification over them. This expressive power seems necessary to capture important properties of evidence We present a propositional logic to reason about the uncertainty of events, where the uncertainty is modeled by a set of probability measures assigning an interval of probability to each event. We give a sound and complete axiomatization for the logic, and show that the satisfiability problem is NP-complete, no harder than satisfiability for propositional logic. We present a heuristic search algorithm for solving first-order MDPs (FOMDPs). Our approach combines first-order state abstraction that avoids evaluating states individually, and heuristic search that avoids evaluating all states. Firstly, we apply state abstraction directly on the FOMDP avoiding propositionalization. Such kind of abstraction is referred to as firstorder state abstraction. Secondly, guided by an admissible heuristic, the search is restricted only to those states that are reachable from the initial state. We demonstrate the usefullness of the above techniques for solving FOMDPs on a system, referred to as FCPlanner, that entered the probabilistic track of the International Planning Competition (IPC'2004). We present a novel approach to detecting and utilizing symmetries in probabilistic graphical models with two main contributions. First, we present a scalable approach to computing generating sets of permutation groups representing the symmetries of graphical models. Second, we introduce orbital Markov chains, a novel family of Markov chains leveraging model symmetries to reduce mixing times. We establish an insightful connection between model symmetries and rapid mixing of orbital Markov chains. Thus, we present the first lifted MCMC algorithm for probabilistic graphical models. Both analytical and empirical results demonstrate the effectiveness and efficiency of the approach. Two interpretations about syllogistic statements are described in this paper. One is the so-called set-based interpretation, which assumes that quantified statements and syllogisms talk about quantity-relationships between sets. The other one, the so-called conditional interpretation, assumes that quantified propositions talk about conditional propositions and how strong are the links between the antecedent and the consequent. Both interpretations are compared attending to three different questions (existential import, singular statements and non-proportional quantifiers) from the point of view of their impact on the further development of this type of reasoning. Qsmodels is a novel application of Answer Set Programming to interactive gaming environment. We describe a software architecture by which the behavior of a bot acting inside the Quake 3 Arena can be controlled by a planner. The planner is written as an Answer Set Program and is interpreted by the Smodels solver. How could we solve the machine learning and the artificial intelligence problem if we had infinite computation? Solomonoff induction and the reinforcement learning agent AIXI are proposed answers to this question. Both are known to be incomputable. In this paper, we quantify this using the arithmetical hierarchy, and prove upper and corresponding lower bounds for incomputability. We show that AIXI is not limit computable, thus it cannot be approximated using finite computation. Our main result is a limit-computable {\epsilon}-optimal version of AIXI with infinite horizon that maximizes expected rewards. Sequential allocation is a simple and attractive mechanism for the allocation of indivisible goods. Agents take turns, according to a policy, to pick items. Sequential allocation is guaranteed to return an allocation which is efficient but may not have an optimal social welfare. We consider therefore the relation between welfare and efficiency. We study the (computational) questions of what welfare is possible or necessary depending on the choice of policy. We also consider a novel control problem in which the chair chooses a policy to improve social welfare. Within the framework of probably approximately correct Markov decision processes (PAC-MDP), much theoretical work has focused on methods to attain near optimality after a relatively long period of learning and exploration. However, practical concerns require the attainment of satisfactory behavior within a short period of time. In this paper, we relax the PAC-MDP conditions to reconcile theoretically driven exploration methods and practical needs. We propose simple algorithms for discrete and continuous state spaces, and illustrate the benefits of our proposed relaxation via theoretical analyses and numerical examples. Our algorithms also maintain anytime error bounds and average loss bounds. Our approach accommodates both Bayesian and non-Bayesian methods. Social norms are powerful formalism in coordinating autonomous agents' behaviour to achieve certain objectives. In this paper, we propose a dynamic normative system to enable the reasoning of the changes of norms under different circumstances, which cannot be done in the existing static normative systems. We study two important problems (norm synthesis and norm recognition) related to the autonomy of the entire system and the agents, and characterise the computational complexities of solving these problems. Personalized recommendations for new users, also known as the cold-start problem, can be formulated as a contextual bandit problem. Existing contextual bandit algorithms generally rely on features alone to capture user variability. Such methods are inefficient in learning new users' interests. In this paper we propose Latent Contextual Bandits. We consider both the benefit of leveraging a set of learned latent user classes for new users, and how we can learn such latent classes from prior users. We show that our approach achieves a better regret bound than existing algorithms. We also demonstrate the benefit of our approach using a large real world dataset and a preliminary user study. This paper presents a procedural generation method that creates visually attractive levels for the Angry Birds game. Besides being an immensely popular mobile game, Angry Birds has recently become a test bed for various artificial intelligence technologies. We propose a new approach for procedurally generating Angry Birds levels using Chinese style and Japanese style building structures. A conducted experiment confirms the effectiveness of our approach with statistical significance. Logic-based abduction finds important applications in artificial intelligence and related areas. One application example is in finding explanations for observed phenomena. Propositional abduction is a restriction of abduction to the propositional domain, and complexity-wise is in the second level of the polynomial hierarchy. Recent work has shown that exploiting implicit hitting sets and propositional satisfiability (SAT) solvers provides an efficient approach for propositional abduction. This paper investigates this earlier work and proposes a number of algorithmic improvements. These improvements are shown to yield exponential reductions in the number of SAT solver calls. More importantly, the experimental results show significant performance improvements compared to the the best approaches for propositional abduction. "Natural Language," whether spoken and attended to by humans, or processed and generated by computers, requires networked structures that reflect creative processes in semantic, syntactic, phonetic, linguistic, social, emotional, and cultural modules. Being able to produce novel and useful behavior following repeated practice gets to the root of both artificial intelligence and human language. This paper investigates the modalities involved in language-like applications that computers -- and programmers -- engage with, and aims to fine tune the questions we ask to better account for context, self-awareness, and embodiment. We investigate learning heuristics for domain-specific planning. Prior work framed learning a heuristic as an ordinary regression problem. However, in a greedy best-first search, the ordering of states induced by a heuristic is more indicative of the resulting planner's performance than mean squared error. Thus, we instead frame learning a heuristic as a learning to rank problem which we solve using a RankSVM formulation. Additionally, we introduce new methods for computing features that capture temporal interactions in an approximate plan. Our experiments on recent International Planning Competition problems show that the RankSVM learned heuristics outperform both the original heuristics and heuristics learned through ordinary regression. We argue that there already exists de facto artificial intelligence policy - a patchwork of policies impacting the field of AI's development in myriad ways. The key question related to AI policy, then, is not whether AI should be governed at all, but how it is currently being governed, and how that governance might become more informed, integrated, effective, and anticipatory. We describe the main components of de facto AI policy and make some recommendations for how AI policy can be improved, drawing on lessons from other scientific and technological domains. This article presents a measure of semantic similarity in an IS-A taxonomy based on the notion of shared information content. Experimental evaluation against a benchmark set of human similarity judgments demonstrates that the measure performs better than the traditional edge-counting approach. The article presents algorithms that take advantage of taxonomic similarity in resolving syntactic and semantic ambiguity, along with experimental results demonstrating their effectiveness. Intractable distributions present a common difficulty in inference within the probabilistic knowledge representation framework and variational methods have recently been popular in providing an approximate solution. In this article, we describe a perturbational approach in the form of a cumulant expansion which, to lowest order, recovers the standard Kullback-Leibler variational bound. Higher-order terms describe corrections on the variational approach without incurring much further computational cost. The relationship to other perturbational approaches such as TAP is also elucidated. We demonstrate the method on a particular class of undirected graphical models, Boltzmann machines, for which our simulation results confirm improved accuracy and enhanced stability during learning. A previously developed quantum search algorithm for solving 1-SAT problems in a single step is generalized to apply to a range of highly constrained k-SAT problems. We identify a bound on the number of clauses in satisfiability problems for which the generalized algorithm can find a solution in a constant number of steps as the number of variables increases. This performance contrasts with the linear growth in the number of steps required by the best classical algorithms, and the exponential number required by classical and quantum methods that ignore the problem structure. In some cases, the algorithm can also guarantee that insoluble problems in fact have no solutions, unlike previously proposed quantum search algorithms. We study properties of programs with monotone and convex constraints. We extend to these formalisms concepts and results from normal logic programming. They include the notions of strong and uniform equivalence with their characterizations, tight programs and Fages Lemma, program completion and loop formulas. Our results provide an abstract account of properties of some recent extensions of logic programming with aggregates, especially the formalism of lparse programs. They imply a method to compute stable models of lparse programs by means of off-the-shelf solvers of pseudo-boolean constraints, which is often much faster than the smodels system. Efficient representations and solutions for large decision problems with continuous and discrete variables are among the most important challenges faced by the designers of automated decision support systems. In this paper, we describe a novel hybrid factored Markov decision process (MDP) model that allows for a compact representation of these problems, and a new hybrid approximate linear programming (HALP) framework that permits their efficient solutions. The central idea of HALP is to approximate the optimal value function by a linear combination of basis functions and optimize its weights by linear programming. We analyze both theoretical and computational aspects of this approach, and demonstrate its scale-up potential on several hybrid optimization problems. In this article, we work towards the goal of developing agents that can learn to act in complex worlds. We develop a probabilistic, relational planning rule representation that compactly models noisy, nondeterministic action effects, and show how such rules can be effectively learned. Through experiments in simple planning domains and a 3D simulated blocks world with realistic physics, we demonstrate that this learning algorithm allows agents to effectively model world dynamics. In this paper, we construct and investigate a hierarchy of spatio-temporal formalisms that result from various combinations of propositional spatial and temporal logics such as the propositional temporal logic PTL, the spatial logics RCC-8, BRCC-8, S4u and their fragments. The obtained results give a clear picture of the trade-off between expressiveness and computational realisability within the hierarchy. We demonstrate how different combining principles as well as spatial and temporal primitives can produce NP-, PSPACE-, EXPSPACE-, 2EXPSPACE-complete, and even undecidable spatio-temporal logics out of components that are at most NP- or PSPACE-complete. In this commentary I argue that although PDDL is a very useful standard for the planning competition, its design does not properly consider the issue of domain modeling. Hence, I would not advocate its use in specifying planning domains outside of the context of the planning competition. Rather, the field needs to explore different approaches and grapple more directly with the problem of effectively modeling and utilizing all of the diverse pieces of knowledge we typically have about planning domains. We study auctions with severe bounds on the communication allowed: each bidder may only transmit t bits of information to the auctioneer. We consider both welfare- and profit-maximizing auctions under this communication restriction. For both measures, we determine the optimal auction and show that the loss incurred relative to unconstrained auctions is mild. We prove non-surprising properties of these kinds of auctions, e.g., that in optimal mechanisms bidders simply report the interval in which their valuation lies in, as well as some surprising properties, e.g., that asymmetric auctions are better than symmetric ones and that multi-round auctions reduce the communication complexity only by a linear factor. This paper describes Marvin, a planner that competed in the Fourth International Planning Competition (IPC 4). Marvin uses action-sequence-memoisation techniques to generate macro-actions, which are then used during search for a solution plan. We provide an overview of its architecture and search behaviour, detailing the algorithms used. We also empirically demonstrate the effectiveness of its features in various planning domains; in particular, the effects on performance due to the use of macro-actions, the novel features of its search behaviour, and the native support of ADL and Derived Predicates. In this paper we apply computer-aided theorem discovery technique to discover theorems about strongly equivalent logic programs under the answer set semantics. Our discovered theorems capture new classes of strongly equivalent logic programs that can lead to new program simplification rules that preserve strong equivalence. Specifically, with the help of computers, we discovered exact conditions that capture the strong equivalence between a rule and the empty set, between two rules, between two rules and one of the two rules, between two rules and another rule, and between three rules and two of the three rules. The QXORSAT problem is the quantified version of the satisfiability problem XORSAT in which the connective exclusive-or is used instead of the usual or. We study the phase transition associated with random QXORSAT instances. We give a description of this phase transition in the case of one alternation of quantifiers, thus performing an advanced practical and theoretical study on the phase transition of a quantified roblem. Functional dependencies restrict the potential interactions among variables connected in a probabilistic network. This restriction can be exploited in qualitative probabilistic reasoning by introducing deterministic variables and modifying the inference rules to produce stronger conclusions in the presence of functional relations. I describe how to accomplish these modifications in qualitative probabilistic networks by exhibiting the update procedures for graphical transformations involving probabilistic and deterministic variables and combinations. A simple example demonstrates that the augmented scheme can reduce qualitative ambiguity that would arise without the special treatment of functional dependency. Analysis of qualitative synergy reveals that new higher-order relations are required to reason effectively about synergistic interactions among deterministic variables. An experiment replicated and extended recent findings on psychologically realistic ways of modeling propagation of uncertainty in rule based reasoning. Within a single production rule, the antecedent evidence can be summarized by taking the maximum of disjunctively connected antecedents and the minimum of conjunctively connected antecedents. The maximum certainty factor attached to each of the rule's conclusions can be sealed down by multiplication with this summarized antecedent certainty. Heckerman's modified certainty factor technique can be used to combine certainties for common conclusions across production rules. In this paper, we extend the QMRDT probabilistic model for the domain of internal medicine to include decisions about treatments. In addition, we describe how we can use the comprehensive decision model to construct a simpler decision model for a specific patient. In so doing, we transform the task of problem formulation to that of narrowing of a larger problem. We describe a method for incrementally constructing belief networks. We have developed a network-construction language similar to a forward-chaining language using data dependencies, but with additional features for specifying distributions. Using this language, we can define parameterized classes of probabilistic models. These parameterized models make it possible to apply probabilistic reasoning to problems for which it is impractical to have a single large static model. We describe an environment that considerably simplifies the process of generating Bayesian belief networks. The system has been implemented on readily available, inexpensive hardware, and provides clarity and high performance. We present an introduction to Bayesian belief networks, discuss algorithms for inference with these networks, and delineate the classes of problems that can be solved with this paradigm. We then describe the hardware and software that constitute the system, and illustrate Ergo's use with several example In this paper we present a framework for dynamically constructing Bayesian networks. We introduce the notion of a background knowledge base of schemata, which is a collection of parameterized conditional probability statements. These schemata explicitly separate the general knowledge of properties an individual may have from the specific knowledge of particular individuals that may have these properties. Knowledge of individuals can be combined with this background knowledge to create Bayesian networks, which can then be used in any propagation scheme. We discuss the theory and assumptions necessary for the implementation of dynamic Bayesian networks, and indicate where our approach may be useful. A series of monte carlo studies were performed to assess the extent to which different inference procedures robustly output reasonable belief values in the context of increasing levels of judgmental imprecision. It was found that, when compared to an equal-weights linear model, the Bayesian procedures are more likely to deduce strong support for a hypothesis. But, the Bayesian procedures are also more likely to strongly support the wrong hypothesis. Bayesian techniques are more powerful, but are also more error prone. This paper describes a generalization of previous methods for constructing tree-structured belief network with hidden variables. The major new feature of the described method is the ability to produce a tree decomposition even when there are errors in the correlation data among the input variables. This is an important extension of existing methods since the correlational coefficients usually cannot be measured with precision. The technique involves using a greedy search algorithm that locally minimizes an error function. IDEAL (Influence Diagram Evaluation and Analysis in Lisp) is a software environment for creation and evaluation of belief networks and influence diagrams. IDEAL is primarily a research tool and provides an implementation of many of the latest developments in belief network and influence diagram evaluation in a unified framework. This paper describes IDEAL and some lessons learned during its development. In this paper, optimum decomposition of belief networks is discussed. Some methods of decomposition are examined and a new method - the method of Minimum Total Number of States (MTNS) - is proposed. The problem of optimum belief network decomposition under our framework, as under all the other frameworks, is shown to be NP-hard. According to the computational complexity analysis, an algorithm of belief network decomposition is proposed in (Wee, 1990a) based on simulated annealing. We introduce a new heuristic algorithm for the problem of finding minimum size loop cutsets in multiply connected belief networks. We compare this algorithm to that proposed in [Suemmondt and Cooper, 1988]. We provide lower bounds on the performance of these algorithms with respect to one another and with respect to optimal. We demonstrate that no heuristic algorithm for this problem cam be guaranteed to produce loop cutsets within a constant difference from optimal. We discuss experimental results based on randomly generated networks, and discuss future work and open questions. Cutset conditioning and clique-tree propagation are two popular methods for performing exact probabilistic inference in Bayesian belief networks. Cutset conditioning is based on decomposition of a subset of network nodes, whereas clique-tree propagation depends on aggregation of nodes. We describe a means to combine cutset conditioning and clique- tree propagation in an approach called aggregation after decomposition (AD). We discuss the application of the AD method in the Pathfinder system, a medical expert system that offers assistance with diagnosis in hematopathology. In this paper, we suggest marrying Dempster-Shafer (DS) theory with Knowledge Representation (KR). Born out of this marriage is the definition of "Dempster-Shafer Belief Bases", abstract data types representing uncertain knowledge that use DS theory for representing strength of belief about our knowledge, and the linguistic structures of an arbitrary KR system for representing the knowledge itself. A formal result guarantees that both the properties of the given KR system and of DS theory are preserved. The general model is exemplified by defining DS Belief Bases where First Order Logic and (an extension of) KRYPTON are used as KR systems. The implementation problem is also touched upon. We point out the need to use probability amplitudes rather than probabilities to model evidence accumulation in decision processes involving real physical sensors. Optical information processing systems are given as typical examples of systems that naturally gather evidence in this manner. We derive a new, amplitude-based generalization of the Hough transform technique used for object recognition in machine vision. We argue that one should use complex Hough accumulators and square their magnitudes to get a proper probabilistic interpretation of the likelihood that an object is present. Finally, we suggest that probability amplitudes may have natural applications in connectionist models, as well as in formulating knowledge-based reasoning problems. A framework is presented for a computational theory of probabilistic argument. The Probabilistic Reasoning Environment encodes knowledge at three levels. At the deepest level are a set of schemata encoding the system's domain knowledge. This knowledge is used to build a set of second-level arguments, which are structured for efficient recapture of the knowledge used to construct them. Finally, at the top level is a Bayesian network constructed from the arguments. The system is designed to facilitate not just propagation of beliefs and assimilation of evidence, but also the dynamic process of constructing a belief network, evaluating its adequacy, and revising it when necessary. Probability estimation by maximum entropy reconstruction of an initial relative frequency estimate from its projection onto a hypergraph model of the approximate conditional independence relations exhibited by it is investigated. The results of this study suggest that use of this estimation technique may improve the quality of decisions that must be made on the basis of limited observations over a decomposable finite product space. This paper describes a natural framework for rules, based on belief functions, which includes a repre- sentation of numerical rules, default rules and rules allowing and rules not allowing contraposition. In particular it justifies the use of the Dempster-Shafer Theory for representing a particular class of rules, Belief calculated being a lower probability given certain independence assumptions on an underlying space. It shows how a belief function framework can be generalised to other logics, including a general Monte-Carlo algorithm for calculating belief, and how a version of Reiter's Default Logic can be seen as a limiting case of a belief function formalism. Many AI researchers argue that probability theory is only capable of dealing with uncertainty in situations where a full specification of a joint probability distribution is available, and conclude that it is not suitable for application in knowledge-based systems. Probability intervals, however, constitute a means for expressing incompleteness of information. We present a method for computing such probability intervals for probabilities of interest from a partial specification of a joint probability distribution. Our method improves on earlier approaches by allowing for independency relationships between statistical variables to be exploited. We analyzed the convergence properties of likelihood- weighting algorithms on a two-level, multiply connected, belief-network representation of the QMR knowledge base of internal medicine. Specifically, on two difficult diagnostic cases, we examined the effects of Markov blanket scoring, importance sampling, demonstrating that the Markov blanket scoring and self-importance sampling significantly improve the convergence of the simulation on our model. A scientific reasoning system makes decisions using objective evidence in the form of independent experimental trials, propositional axioms, and constraints on the probabilities of events. As a first step towards this goal, we propose a system that derives probability intervals from objective evidence in those forms. Our reasoning system can manage uncertainty about data and rules in a rule based expert system. We expect that our system will be particularly applicable to diagnosis and analysis in domains with a wealth of experimental evidence such as medicine. We discuss limitations of this solution and propose future directions for this research. This work can be considered a generalization of Nilsson's "probabilistic logic" [Nil86] to intervals and experimental observations. Plan recognition does not work the same way in stories and in "real life" (people tend to jump to conclusions more in stories). We present a theory of this, for the particular case of how objects in stories (or in life) influence plan recognition decisions. We provide a Bayesian network formalization of a simple first-order theory of plans, and show how a particular network parameter seems to govern the difference between "life-like" and "story-like" response. We then show why this parameter would be influenced (in the desired way) by a model of speaker (or author) topic selection which assumes that facts in stories are typically "relevant". Unaided human decision making appears to systematically violate consistency constraints imposed by normative theories; these biases in turn appear to justify the application of formal decision-analytic models. It is argued that both claims are wrong. In particular, we will argue that the "confirmation bias" is premised on an overly narrow view of how conflicting evidence is and ought to be handled. Effective decision aiding should focus on supporting the contral processes by means of which knowledge is extended into novel situations and in which assumptions are adopted, utilized, and revised. The Non- Monotonic Probabilist represents initial work toward such an aid. We propose a norm of consistency for a mixed set of defeasible and strict sentences, based on a probabilistic semantics. This norm establishes a clear distinction between knowledge bases depicting exceptions and those containing outright contradictions. We then define a notion of entailment based also on probabilistic considerations and provide a characterization of the relation between consistency and entailment. We derive necessary and sufficient conditions for consistency, and provide a simple decision procedure for testing consistency and deciding whether a sentence is entailed by a database. Finally, it is shown that if al1 sentences are Horn clauses, consistency and entailment can be tested in polynomial time. BPS, the Bayesian Problem Solver, applies probabilistic inference and decision-theoretic control to flexible, resource-constrained problem-solving. This paper focuses on the Bayesian inference mechanism in BPS, and contrasts it with those of traditional heuristic search techniques. By performing sound inference, BPS can outperform traditional techniques with significantly less computational effort. Empirical tests on the Eight Puzzle show that after only a few hundred node expansions, BPS makes better decisions than does the best existing algorithm after several million node expansions It is suggested that an AI inference system should reflect an inference policy that is tailored to the domain of problems to which it is applied -- and furthermore that an inference policy need not conform to any general theory of rational inference or induction. We note, for instance, that Bayesian reasoning about the probabilistic characteristics of an inference domain may result in the specification of an nonBayesian procedure for reasoning within the inference domain. In this paper, the idea of an inference policy is explored in some detail. To support this exploration, the characteristics of some standard and nonstandard inference policies are examined. Bayesian Belief Networks have been largely overlooked by Expert Systems practitioners on the grounds that they do not correspond to the human inference mechanism. In this paper, we introduce an explanation mechanism designed to generate intuitive yet probabilistically sound explanations of inferences drawn by a Bayesian Belief Network. In particular, our mechanism accounts for the results obtained due to changes in the causal and the evidential support of a node. The arc reversal/node reduction approach to probabilistic inference is extended to include the case of instantiated evidence by an operation called "evidence reversal." This not only provides a technique for computing posterior joint distributions on general belief networks, but also provides insight into the methods of Pearl [1986b] and Lauritzen and Spiegelhalter [1988]. Although it is well understood that the latter two algorithms are closely related, in fact all three algorithms are identical whenever the belief network is a forest. This paper discusses a new measure that is adaptable to certain intervalic probability frameworks, possibility theory, and belief theory. As such, it has the potential for wide use in knowledge engineering, expert systems, and related problems in the human sciences. This measure (denoted here by F) has been introduced in Smithson (1988) and is more formally discussed in Smithson (1989a)o Here, I propose to outline the conceptual basis for F and compare its properties with other measures of second-order uncertainty. I will argue that F is an indicator of nonspecificity or alternatively, of freedom, as distinguished from either ambiguity or vagueness. We present a new, deterministic, distributed MAP estimation algorithm for Markov Random Fields called Local Highest Confidence First (Local HCF). The algorithm has been applied to segmentation problems in computer vision and its performance compared with stochastic algorithms. The experiments show that Local HCF finds better estimates than stochastic algorithms with much less computation. In this paper, the feasibility of using finite totally ordered probability models under Alelinnas's Theory of Probabilistic Logic [Aleliunas, 1988] is investigated. The general form of the probability algebra of these models is derived and the number of possible algebras with given size is deduced. Based on this analysis, we discuss problems of denominator-indifference and ambiguity-generation that arise in reasoning by cases and abductive reasoning. An example is given that illustrates how these problems arise. The investigation shows that a finite probability model may be of very limited usage. By probabilistic logic I mean a normative theory of belief that explains how a body of evidence affects one's degree of belief in a possible hypothesis. A new axiomatization of such a theory is presented which avoids a finite additivity axiom, yet which retains many useful inference rules. Many of the examples of this theory--its models do not use numerical probabilities. Put another way, this article gives sharper answers to the two questions: 1.What kinds of sets can used as the range of a probability function? 2.Under what conditions is the range set of a probability function isomorphic to the set of real numbers in the interval 10,1/ with the usual arithmetical operations? Computational mechanisms for uncertainty management must support interactive and incremental problem formulation, inference, hypothesis testing, and decision making. However, most current uncertainty inference systems concentrate primarily on inference, and provide no support for the larger issues. We present a computational approach to uncertainty management which provides direct support for the dynamic, incremental aspect of this task, while at the same time permitting direct representation of the structure of evidential relationships. At the same time, we show that this approach responds to the modularity concerns of Heckerman and Horvitz [Heck87]. This paper emphasizes examples of the capabilities of this approach. Another paper [D'Am89] details the representations and algorithms involved. There is uncertainty associated with the occurrence of many events in real life. In this paper we develop a temporal logic to deal with such uncertain events and outline a possible implementation in an extension of PROLOG. Events are represented as fuzzy sets with the membership function giving the possibility of occurrence of the event in a given interval of time. The developed temporal logic is simple but powerful. It can determine effectively the various temporal relations between uncertain events or their combinations. PROLOG provides a uniform substrate on which to effectively implement such a temporal logic for uncertain events This paper argues for a modal view of probability. The syntax and semantics of one particularly strong probability logic are discussed and some examples of the use of the logic are provided. We show that it is both natural and useful to think of probability as a modal operator. Contrary to popular belief in AI, a probability ranging between 0 and 1 represents a continuum between impossibility and necessity, not between simple falsity and truth. The present work provides a clear semantics for quantification into the scope of the probability operator and for higher-order probabilities. Probability logic is a language for expressing both probabilistic and logical concepts. This paper explores the role of Directed Acyclic Graphs (DAGs) as a representation of conditional independence relationships. We show that DAGs offer polynomially sound and complete inference mechanisms for inferring conditional independence relationships from a given causal set of such relationships. As a consequence, d-separation, a graphical criterion for identifying independencies in a DAG, is shown to uncover more valid independencies then any other criterion. In addition, we employ the Armstrong property of conditional independence to show that the dependence relationships displayed by a DAG are inherently consistent, i.e. for every DAG D there exists some probability distribution P that embodies all the conditional independencies displayed in D and none other. In this paper, an empirical evaluation of three inference methods for uncertain reasoning is presented in the context of Pathfinder, a large expert system for the diagnosis of lymph-node pathology. The inference procedures evaluated are (1) Bayes' theorem, assuming evidence is conditionally independent given each hypothesis; (2) odds-likelihood updating, assuming evidence is conditionally independent given each hypothesis and given the negation of each hypothesis; and (3) a inference method related to the Dempster-Shafer theory of belief. Both expert-rating and decision-theoretic metrics are used to compare the diagnostic accuracy of the inference methods. This paper describes a formal system of belief revision developed by Wolfgang Spohn and shows that this system has a parallel implementation that can be derived from an influence diagram in a manner similar to that in which Bayesian networks are derived. The proof rests upon completeness results for an axiomatization of the notion of conditional independence, with the Spohn system being used as a semantics for the relation of conditional independence. Nonmonotonic reasoning is a pattern of reasoning that allows an agent to make and retract (tentative) conclusions from inconclusive evidence. This paper gives a possible-worlds interpretation of the nonmonotonic reasoning problem based on standard decision theory and the emerging probability logic. The system's central principle is that a tentative conclusion is a decision to make a bet, not an assertion of fact. The system is rational, and as sound as the proof theory of its underlying probability log. For many years, at least since McCarthy and Hayes (1969), writers have lamented, and attempted to compensate for, the alleged fact that we often do not have adequate statistical knowledge for governing the uncertainty of belief, for making uncertain inferences, and the like. It is hardly ever spelled out what "adequate statistical knowledge" would be, if we had it, and how adequate statistical knowledge could be used to control and regulate epistemic uncertainty. When knowledge is obtained from a database, it is only possible to deduce confidence intervals for probability values. With confidence intervals replacing point values, the results in the set covering model include interval constraints for the probabilities of mutually exclusive and exhaustive explanations. The Principle of Interval Constraints ranks these explanations by determining the expected values of the probabilities based on distributions determined from the interval, constraints. This principle was developed using the Classical Approach to probability. This paper justifies the Principle of Interval Constraints with a more rigorous statement of the Classical Approach and by defending the concept of probabilities of probabilities. In this paper, we describe an abstract framework and axioms under which exact local computation of marginals is possible. The primitive objects of the framework are variables and valuations. The primitive operators of the framework are combination and marginalization. These operate on valuations. We state three axioms for these operators and we derive the possibility of local computation from the axioms. Next, we describe a propagation scheme for computing marginals of a valuation when we have a factorization of the valuation on a hypertree. Finally we show how the problem of computing marginals of joint probability distributions and joint belief functions fits the general framework. This paper focuses on probability updates in multiply-connected belief networks. Pearl has designed the method of conditioning, which enables us to apply his algorithm for belief updates in singly-connected networks to multiply-connected belief networks by selecting a loop-cutset for the network and instantiating these loop-cutset nodes. We discuss conditions that need to be satisfied by the selected nodes. We present a heuristic algorithm for finding a loop-cutset that satisfies these conditions. Dependency knowledge of the form "x is independent of y once z is known" invariably obeys the four graphoid axioms, examples include probabilistic and database dependencies. Often, such knowledge can be represented efficiently with graphical structures such as undirected graphs and directed acyclic graphs (DAGs). In this paper we show that the graphical criterion called d-separation is a sound rule for reading independencies from any DAG based on a causal input list drawn from a graphoid. The rule may be extended to cover DAGs that represent functional dependencies as well as conditional dependencies. A probabilistic method of reasoning under uncertainty is proposed based on the principle of Minimum Cross Entropy (MCE) and concept of Recursive Causal Model (RCM). The dependency and correlations among the variables are described in a special language BNDL (Belief Networks Description Language). Beliefs are propagated among the clauses of the BNDL programs representing the underlying probabilistic distributions. BNDL interpreters in both Prolog and C has been developed and the performance of the method is compared with those of the others. We introduce the operation of possibility qualification and show how. this modal-like operator can be used to represent "typical" or default knowledge in a theory of nonmonotonic reasoning. We investigate the representational power of this approach by looking at a number of prototypical problems from the nonmonotonic reasoning literature. In particular we look at the so called Yale shooting problem and its relation to priority in default reasoning. This paper examines the relationship between Shafer's belief functions and convex sets of probability distributions. Kyburg's (1986) result showed that belief function models form a subset of the class of closed convex probability distributions. This paper emphasizes the importance of Kyburg's result by looking at simple examples involving Bernoulli trials. Furthermore, it is shown that many convex sets of probability distributions generate the same belief function in the sense that they support the same lower and upper values. This has implications for a decision theoretic extension. Dempster's rule of combination is also compared with Bayes' rule of conditioning. Modifiable combining functions are a synthesis of two common approaches to combining evidence. They offer many of the advantages of these approaches and avoid some disadvantages. Because they facilitate the acquisition, representation, explanation, and modification of knowledge about combinations of evidence, they are proposed as a tool for knowledge engineers who build systems that reason under uncertainty, not as a normative theory of evidence. The combination of evidence in Dempster-Shafer theory is compared with the combination of evidence in probabilistic logic. Sufficient conditions are stated for these two methods to agree. It is then shown that these conditions are minimal in the sense that disagreement can occur when any one of them is removed. An example is given in which the traditional assumption of conditional independence of evidence on hypotheses holds and a uniform prior is assumed, but probabilistic logic and Dempster's rule give radically different results for the combination of two evidence events. In the canonical examples underlying Shafer-Dempster theory, beliefs over the hypotheses of interest are derived from a probability model for a set of auxiliary hypotheses. Beliefs are derived via a compatibility relation connecting the auxiliary hypotheses to subsets of the primary hypotheses. A belief function differs from a Bayesian probability model in that one does not condition on those parts of the evidence for which no probabilities are specified. The significance of this difference in conditioning assumptions is illustrated with two examples giving rise to identical belief functions but different Bayesian probability distributions. Dempster's rule of combination has been the most controversial part of the Dempster-Shafer (D-S) theory. In particular, Zadeh has reached a conjecture on the noncombinability of evidence from a relational model of the D-S theory. In this paper, we will describe another relational model where D-S masses are represented as conditional granular distributions. By comparing it with Zadeh's relational model, we will show how Zadeh's conjecture on combinability does not affect the applicability of Dempster's rule in our model. We present a program that manages a database of temporally scoped beliefs. The basic functionality of the system includes maintaining a network of constraints among time points, supporting a variety of fetches, mediating the application of causal rules, monitoring intervals of time for the addition of new facts, and managing data dependencies that keep the database consistent. At this level the system operates independent of any measure of belief or belief calculus. We provide an example of how an application program mi9ght use this functionality to implement a belief calculus. We present a representation of partial confidence in belief and preference that is consistent with the tenets of decision-theory. The fundamental insight underlying the representation is that if a person is not completely confident in a probability or utility assessment, additional modeling of the assessment may improve decisions to which it is relevant. We show how a traditional decision-analytic approach can be used to balance the benefits of additional modeling with associated costs. The approach can be used during knowledge acquisition to focus the attention of a knowledge engineer or expert on parts of a decision model that deserve additional refinement. Bayes belief networks and influence diagrams are tools for constructing coherent probabilistic representations of uncertain knowledge. The process of constructing such a network to represent an expert's knowledge is used to illustrate a variety of techniques which can facilitate the process of structuring and quantifying uncertain relationships. These include some generalizations of the "noisy OR gate" concept. Sensitivity analysis of generic elements of Bayes' networks provides insight into when rough probability assessments are sufficient and when greater precision may be important. A distinction is sometimes made between "statistical" and "subjective" probabilities. This is based on a distinction between "unique" events and "repeatable" events. We argue that this distinction is untenable, since all events are "unique" and all events belong to "kinds", and offer a conception of probability for A1 in which (1) all probabilities are based on -- possibly vague -- statistical knowledge, and (2) every statement in the language has a probability. This conception of probability can be applied to very rich languages. Decision tree induction systems are being used for knowledge acquisition in noisy domains. This paper develops a subjective Bayesian interpretation of the task tackled by these systems and the heuristic methods they use. It is argued that decision tree systems implicitly incorporate a prior belief that the simpler (in terms of decision tree complexity) of two hypotheses be preferred, all else being equal, and that they perform a greedy search of the space of decision rules to find one in which there is strong posterior belief. A number of improvements to these systems are then suggested. The use of numerical uncertainty representations allows better modeling of some aspects of human evidential reasoning. It also makes knowledge acquisition and system development, test, and modification more difficult. We propose that where possible, the assignment and/or refinement of rule weights should be performed automatically. We present one approach to performing this training - numerical optimization - and report on the results of some preliminary tests in training rule bases. We also show that truth maintenance can be used to make training more efficient and ask some epistemological questions raised by training rule weights. An inductive logic can be formulated in which the elements are not propositions or probability distributions, but information systems. The logic is complete for information systems with binary hypotheses, i.e., it applies to all such systems. It is not complete for information systems with more than two hypotheses, but applies to a subset of such systems. The logic is inductive in that conclusions are more informative than premises. Inferences using the formalism have a strong justification in terms of the expected value of the derived information system. Poly-trees are singly connected causal networks in which variables may arise from multiple causes. This paper develops a method of recovering ply-trees from empirically measured probability distributions of pairs of variables. The method guarantees that, if the measured distributions are generated by a causal process structured as a ply-tree then the topological structure of such tree can be recovered precisely and, in addition, the causal directionality of the branches can be determined up to the maximum extent possible. The method also pinpoints the minimum (if any) external semantics required to determine the causal relationships among the variables considered. This paper describes a heuristic Bayesian method for computing probability distributions from experimental data, based upon the multivariate normal form of the influence diagram. An example illustrates its use in medical technology assessment. This approach facilitates the integration of results from different studies, and permits a medical expert to make proper assessments without considerable statistical training. Evidential reasoning is cast as the problem of simplifying the evidence-hypothesis relation and constructing combination formulas that possess certain testable properties. Important classes of evidence as identifiers, annihilators, and idempotents and their roles in determining binary operations on intervals of reals are discussed. The appropriate way of constructing formulas for combining evidence and their limitations, for instance, in robustness, are presented. This paper discusses the semantics and proof theory of Nilsson's probabilistic logic, outlining both the benefits of its well-defined model theory and the drawbacks of its proof theory. Within Nilsson's semantic framework, we derive a set of inference rules which are provably sound. The resulting proof system, in contrast to Nilsson's approach, has the important feature of convergence - that is, the inference process proceeds by computing increasingly narrow probability intervals which converge from above and below on the smallest entailed probability interval. Thus the procedure can be stopped at any time to yield partial information concerning the smallest entailed interval. The comparisons of uncertainty calculi from the last two Uncertainty Workshops have all used theoretical probabilistic accuracy as the sole metric. While mathematical correctness is important, there are other factors which should be considered when developing reasoning systems. These other factors include, among other things, the error in uncertainty measures obtainable for the problem and the effect of this error on the performance of the resulting system. In our previous series of studies to investigate the role of evidential reasoning in the RUBRIC system for full-text document retrieval (Tong et al., 1985; Tong and Shapiro, 1985; Tong and Appelbaum, 1987), we identified the important role that problem structure plays in the overall performance of the system. In this paper, we focus on these structural elements (which we now call "semantic structure") and show how explicit consideration of their properties reduces what previously were seen as difficult evidential reasoning problems to more tractable questions. This study examined the effects of "tuning" the parameters of the incremental function of MYCIN, the independent function of PROSPECTOR, a probability model that assumes independence, and a simple additive linear equation. me parameters of each of these models were optimized to provide solutions which most nearly approximated those from a full probability model for a large set of simple networks. Surprisingly, MYCIN, PROSPECTOR, and the linear equation performed equivalently; the independence model was clearly more accurate on the networks studied. We describe a representation and a set of inference methods that combine logic programming techniques with probabilistic network representations for uncertainty (influence diagrams). The techniques emphasize the dynamic construction and solution of probabilistic and decision-theoretic models for complex and uncertain domains. Given a query, a logical proof is produced if possible; if not, an influence diagram based on the query and the knowledge of the decision domain is produced and subsequently solved. A uniform declarative, first-order, knowledge representation is combined with a set of integrated inference procedures for logical, probabilistic, and decision-theoretic reasoning. A method for computing probabilistic propositions is presented. It assumes the availability of a single external routine for computing the probability of one instantiated variable, given a conjunction of other instantiated variables. In particular, the method allows belief network algorithms to calculate general probabilistic propositions over nodes in the network. Although in the worst case the time complexity of the method is exponential in the size of a query, it is polynomial in the size of a number of common types of queries. The fundamental elements of evidential reasoning problems are described, followed by a discussion of the structure of various types of problems. Bayesian inference networks and state space formalism are used as the tool for problem representation. A human-oriented decision making cycle for solving evidential reasoning problems is described and illustrated for a military situation assessment problem. The implementation of this cycle may serve as the basis for an expert system shell for evidential reasoning; i.e. a situation assessment processor. The ability to predict the future in a given domain can be acquired by discovering empirically from experience certain temporal patterns that tend to repeat unerringly. Previous works in time series analysis allow one to make quantitative predictions on the likely values of certain linear variables. Since certain types of knowledge are better expressed in symbolic forms, making qualitative predictions based on symbolic representations require a different approach. A domain independent methodology called TIM (Time based Inductive Machine) for discovering potentially uncertain temporal patterns from real time observations using the technique of inductive inference is described here. A model of knowledge representation is described in which propositional facts and the relationships among them can be supported by other facts. The set of knowledge which can be supported is called the set of cognitive units, each having associated descriptions of their explicit and implicit support structures, summarizing belief and reliability of belief. This summary is precise enough to be useful in a computational model while remaining descriptive of the underlying symbolic support structure. When a fact supports another supportive relationship between facts we call this meta-support. This facilitates reasoning about both the propositional knowledge. and the support structures underlying it. To develop an approach to utilizing continuous statistical information within the Dempster- Shafer framework, we combine methods proposed by Strat and by Shafero We first derive continuous possibility and mass functions from probability-density functions. Then we propose a rule for combining such evidence that is simpler and more efficiently computed than Dempster's rule. We discuss the relationship between Dempster's rule and our proposed rule for combining evidence over continuous frames. We start by defining an approach to non-monotonic probabilistic reasoning in terms of non-monotonic categorical (true-false) reasoning. We identify a type of non-monotonic probabilistic reasoning, akin to default inheritance, that is commonly found in practice, especially in "evidential" and "Bayesian" reasoning. We formulate this in terms of the Maximization of Conditional Independence (MCI), and identify a variety of applications for this sort of default. We propose a formalization using Pointwise Circumscription. We compare MCI to Maximum Entropy, another kind of non-monotonic principle, and conclude by raising a number of open questions The investigations reported in this paper center on the process of dynamic uncertainty assessment during interpretation tasks in real domain. In particular, we are interested here in the nature of the control structure of computer programs that can support multiple interpretation and smooth transitions between them, in real time. Each step of the processing involves the interpretation of one input item and the appropriate re-establishment of the system's confidence of the correctness of its interpretation(s). We describe a viewpoint on the Dempster/Shafer 'Theory of Evidence', and provide an interpretation which regards the combination formulas as statistics of the opinions of "experts". This is done by introducing spaces with binary operations that are simpler to interpret or simpler to implement than the standard combination formula, and showing that these spaces can be mapped homomorphically onto the Dempster/Shafer theory of evidence space. The experts in the space of "opinions of experts" combine information in a Bayesian fashion. We present alternative spaces for the combination of evidence suggested by this viewpoint. We are interested in creating an automated or semi-automated system with the capability of taking a set of radar imagery, collection parameters and a priori map and other tactical data, and producing likely interpretations of the possible military situations given the available evidence. This paper is concerned with the problem of the interpretation and computation of certainty or belief in the conclusions reached by such a system. This paper presents an efficient adaptation and application of the Dempster-Shafer theory of evidence, one that can be used effectively in a massively parallel hierarchical system for visual pattern perception. It describes the techniques used, and shows in an extended example how they serve to improve the system's performance as it applies a multiple-level set of processes. For any system with limited statistical knowledge, the combination of evidence and the interpretation of sampling information require the determination of the right reference class (or of an adequate one). The present note (1) discusses the use of reference classes in evidential reasoning, and (2) discusses implementations of Kyburg's rules for reference classes. This paper contributes the first frank discussion of how much of Kyburg's system is needed to be powerful, how much can be computed effectively, and how much is philosophical fat. MINDS is a distributed system of cooperating query engines that customize, document retrieval for each user in a dynamic environment. It improves its performance and adapts to changing patterns of document distribution by observing system-user interactions and modifying the appropriate certainty factors, which act as search control parameters. It argued here that the uncertainty management calculus must account for temporal precedence, reliability of evidence, degree of support for a proposition, and saturation effects. The calculus presented here possesses these features. Some results obtained with this scheme are discussed. In this paper, we describe a representation for spatial information, called the stochastic map, and associated procedures for building it, reading information from it, and revising it incrementally as new information is obtained. The map contains the estimates of relationships among objects in the map, and their uncertainties, given all the available information. The procedures provide a general solution to the problem of estimating uncertain relative spatial relationships. The estimates are probabilistic in nature, an advance over the previous, very conservative, worst-case approaches to the problem. Finally, the procedures are developed in the context of state-estimation and filtering theory, which provides a solid basis for numerous extensions. This paper examines the accuracy of the PROSPECTOR model for uncertain reasoning. PROSPECTOR's solutions for a large number of computer-generated inference networks were compared to those obtained from probability theory and minimum cross-entropy calculations. PROSPECTOR's answers were generally accurate for a restricted subset of problems that are consistent with its assumptions. However, even within this subset, we identified conditions under which PROSPECTOR's performance deteriorates. Attempts to replicate probabilistic reasoning in expert systems have typically overlooked a critical ingredient of that process. Probabilistic analysis typically requires extensive judgments regarding interdependencies among hypotheses and data, and regarding the appropriateness of various alternative models. The application of such models is often an iterative process, in which the plausibility of the results confirms or disconfirms the validity of assumptions made in building the model. In current expert systems, by contrast, probabilistic information is encapsulated within modular rules (involving, for example, "certainty factors"), and there is no mechanism for reviewing the overall form of the probability argument or the validity of the judgments entering into it. This paper examines some methods and ideas underlying the author's successful probabilistic learning systems(PLS), which have proven uniquely effective and efficient in generalization learning or induction. While the emerging principles are generally applicable, this paper illustrates them in heuristic search, which demands noise management and incremental learning. In our approach, both task performance and learning are guided by probability. Probabilities are incrementally normalized and revised, and their errors are located and corrected. In a real expert system, one may have unreliable, unconfident, conflicting estimates of the value for a particular parameter. It is important for decision making that the information present in this aggregate somehow find its way into use. We cast the problem of representing and combining uncertain estimates as selection of two kinds of functions, one to determine an estimate, the other its uncertainty. The paper includes a long list of properties that such functions should satisfy, and it presents one method that satisfies them. Mechanisms for the automation of uncertainty are required for expert systems. Sometimes these mechanisms need to obey the properties of probabilistic reasoning. A purely numeric mechanism, like those proposed so far, cannot provide a probabilistic logic with truth functional connectives. We propose an alternative mechanism, Incidence Calculus, which is based on a representation of uncertainty using sets of points, which might represent situations, models or possible worlds. Incidence Calculus does provide a probabilistic logic with truth functional connectives. This paper focuses on designing expert systems to support decision making in complex, uncertain environments. In this context, our research indicates that strictly probabilistic representations, which enable the use of decision-theoretic reasoning, are highly preferable to recently proposed alternatives (e.g., fuzzy set theory and Dempster-Shafer theory). Furthermore, we discuss the language of influence diagrams and a corresponding methodology -decision analysis -- that allows decision theory to be used effectively and efficiently as a decision-making aid. Finally, we use RACHEL, a system that helps infertile couples select medical treatments, to illustrate the methodology of decision analysis as basis for expert decision systems. The last few years has seen a growing debate about techniques for managing uncertainty in AI systems. Unfortunately this debate has been cast as a rivalry between AI methods and classical probability based ones. Three arguments for extending the probability framework of uncertainty are presented, none of which imply a challenge to classical methods. These are (1) explicit representation of several types of uncertainty, specifically possibility and plausibility, as well as probability, (2) the use of weak methods for uncertainty management in problems which are poorly defined, and (3) symbolic representation of different uncertainty calculi and methods for choosing between them. It is proposed to apply modern methods of nonlinear nonequilibrium statistical mechanics to develop software algorithms that will optimally respond to targets within short response times with minimal computer resources. This Statistical Mechanics Algorithm for Response to Targets (SMART) can be developed with a view towards its future implementation into a hardwired Statistical Algorithm Multiprocessor (SAM) to enhance the efficiency and speed of response to targets (SMART_SAM). The roles played by decision factors in making complex subject are decisions are characterized by how these factors affect the overall decision. Evidence that partially matches a factor is evaluated, and then effective computational rules are applied to these roles to form an appropriate aggregation of the evidence. The use of this technique supports the expression of deeper levels of causality, and may also preserve the cognitive structure of the decision maker better than the usual weighting methods, certainty-factor or other probabilistic models can. We consider a simple sequential allocation procedure for sharing indivisible items between agents in which agents take turns to pick items. Supposing additive utilities and independence between the agents, we show that the expected utility of each agent is computable in polynomial time. Using this result, we prove that the expected utilitarian social welfare is maximized when agents take alternate turns. We also argue that this mechanism remains optimal when agents behave strategically We introduce statistical constraints, a declarative modelling tool that links statistics and constraint programming. We discuss two statistical constraints and some associated filtering algorithms. Finally, we illustrate applications to standard problems encountered in statistics and to a novel inspection scheduling problem in which the aim is to find inspection plans with desirable statistical properties. We frame the question of what kind of subjective experience a brain simulation would have in contrast to a biological brain. We discuss the brain prosthesis thought experiment. We evaluate how the experience of the brain simulation might differ from the biological, according to a number of hypotheses about experience and the properties of simulation. Then, we identify finer questions relating to the original inquiry, and answer them from both a general physicalist, and panexperientialist perspective. The debts' clearing problem is about clearing all the debts in a group of n entities (persons, companies etc.) using a minimal number of money transaction operations. The problem is known to be NP-hard in the strong sense. As for many intractable problems, techniques from the field of artificial intelligence are useful in finding solutions close to optimum for large inputs. An evolutionary algorithm for solving the debts' clearing problem is proposed. Space and time are two critical components of many real world systems. For this reason, analysis of anomalies in spatiotemporal data has been a great of interest. In this work, application of tensor decomposition and eigenspace techniques on spatiotemporal hotspot detection is investigated. An algorithm called SST-Hotspot is proposed which accounts for spatiotemporal variations in data and detect hotspots using matching of eigenvector elements of two cases and population tensors. The experimental results reveal the interesting application of tensor decomposition and eigenvector-based techniques in hotspot analysis. Agent programming is mostly a symbolic discipline and, as such, draws little benefits from probabilistic areas as machine learning and graphical models. However, the greatest objective of agent research is the achievement of autonomy in dynamical and complex environments --- a goal that implies embracing uncertainty and therefore the entailed representations, algorithms and techniques. This paper proposes an innovative and conflict free two layer approach to agent programming that uses already established methods and tools from both symbolic and probabilistic artificial intelligence. Moreover, this framework is illustrated by means of a widely used agent programming example, GoldMiners. It is well known that for certain tasks, quantum computing outperforms classical computing. A growing number of contributions try to use this advantage in order to improve or extend classical machine learning algorithms by methods of quantum information theory. This paper gives a brief introduction into quantum machine learning using the example of pattern classification. We introduce a quantum pattern classification algorithm that draws on Trugenberger's proposal for measuring the Hamming distance on a quantum computer (CA Trugenberger, Phys Rev Let 87, 2001) and discuss its advantages using handwritten digit recognition as from the MNIST database. We study in this paper the consequences of using the Mean Absolute Percentage Error (MAPE) as a measure of quality for regression models. We show that finding the best model under the MAPE is equivalent to doing weighted Mean Absolute Error (MAE) regression. We show that universal consistency of Empirical Risk Minimization remains possible using the MAPE instead of the MAE. We compare in this paper several feature selection methods for the Naive Bayes Classifier (NBC) when the data under study are described by a large number of redundant binary indicators. Wrapper approaches guided by the NBC estimation of the classification error probability out-perform filter approaches while retaining a reasonable computational cost. We consider graphs that represent pairwise marginal independencies amongst a set of variables (for instance, the zero entries of a covariance matrix for normal data). We characterize the directed acyclic graphs (DAGs) that faithfully explain a given set of independencies, and derive algorithms to efficiently enumerate such structures. Our results map out the space of faithful causal models for a given set of pairwise marginal independence relations. This allows us to show the extent to which causal inference is possible without using conditional independence tests. Using the recently introduced universal computing model, called orchestrated machine, that represents computations in a dissipative environment, we consider a new kind of interpretation of Turing's Imitation Game. In addition we raise the question whether the intelligence may show fractal properties. Then we sketch a vision of what robotic cars are going to do in the future. Finally we give the specification of an artificial life game based on the concept of orchestrated machines. The purpose of this paper is to start the search for possible relationships between these different topics. This paper explores the performance of fitted neural Q iteration for reinforcement learning in several partially observable environments, using three recurrent neural network architectures: Long Short-Term Memory, Gated Recurrent Unit and MUT1, a recurrent neural architecture evolved from a pool of several thousands candidate architectures. A variant of fitted Q iteration, based on Advantage values instead of Q values, is also explored. The results show that GRU performs significantly better than LSTM and MUT1 for most of the problems considered, requiring less training episodes and less CPU time before learning a very good policy. Advantage learning also tends to produce better results. Probabilistic inference procedures are usually coded painstakingly from scratch, for each target model and each inference algorithm. We reduce this effort by generating inference procedures from models automatically. We make this code generation modular by decomposing inference algorithms into reusable program-to-program transformations. These transformations perform exact inference as well as generate probabilistic programs that compute expectations, densities, and MCMC samples. The resulting inference procedures are about as accurate and fast as other probabilistic programming systems on real-world problems. Here, I review current state-of-the-arts in many areas of AI to estimate when it's reasonable to expect human level AI development. Predictions of prominent AI researchers vary broadly from very pessimistic predictions of Andrew Ng to much more moderate predictions of Geoffrey Hinton and optimistic predictions of Shane Legg, DeepMind cofounder. Given huge rate of progress in recent years and this broad range of predictions of AI experts, AI safety questions are also discussed. Automatic extraction of cause-effect relationships from natural language texts is a challenging open problem in Artificial Intelligence. Most of the early attempts at its solution used manually constructed linguistic and syntactic rules on small and domain-specific data sets. However, with the advent of big data, the availability of affordable computing power and the recent popularization of machine learning, the paradigm to tackle this problem has slowly shifted. Machines are now expected to learn generic causal extraction rules from labelled data with minimal supervision, in a domain independent-manner. In this paper, we provide a comprehensive survey of causal relation extraction techniques from both paradigms, and analyse their relative strengths and weaknesses, with recommendations for future work. The 15th Conference on Theoretical Aspects of Rationality and Knowledge (TARK) took place in Carnegie Mellon University, Pittsburgh, USA from June 4 to 6, 2015. The mission of the TARK conferences is to bring together researchers from a wide variety of fields, including Artificial Intelligence, Cryptography, Distributed Computing, Economics and Game Theory, Linguistics, Philosophy, and Psychology, in order to further our understanding of interdisciplinary issues involving reasoning about rationality and knowledge. These proceedings consist of a subset of the papers / abstracts presented at the TARK conference. We present a system for generating and understanding of dynamic and static spatial relations in robotic interaction setups. Robots describe an environment of moving blocks using English phrases that include spatial relations such as "across" and "in front of". We evaluate the system in robot-robot interactions and show that the system can robustly deal with visual perception errors, language omissions and ungrammatical utterances. Probabilistic modeling is one of the foundations of modern machine learning and artificial intelligence. In this paper, we propose a novel type of probabilistic models named latent dependency forest models (LDFMs). A LDFM models the dependencies between random variables with a forest structure that can change dynamically based on the variable values. It is therefore capable of modeling context-specific independence. We parameterize a LDFM using a first-order non-projective dependency grammar. Learning LDFMs from data can be formulated purely as a parameter learning problem, and hence the difficult problem of model structure learning is circumvented. Our experimental results show that LDFMs are competitive with existing probabilistic models. A modification of the neo-fuzzy neuron is proposed (an extended neo-fuzzy neuron (ENFN)) that is characterized by improved approximating properties. An adaptive learning algorithm is proposed that has both tracking and smoothing properties. An ENFN distinctive feature is its computational simplicity compared to other artificial neural networks and neuro-fuzzy systems. Multi-agent path finding (MAPF) is well-studied in artificial intelligence, robotics, theoretical computer science and operations research. We discuss issues that arise when generalizing MAPF methods to real-world scenarios and four research directions that address them. We emphasize the importance of addressing these issues as opposed to developing faster methods for the standard formulation of the MAPF problem. Motivated by the idea that criticality and universality of phase transitions might play a crucial role in achieving and sustaining learning and intelligent behaviour in biological and artificial networks, we analyse a theoretical and a pragmatic experimental set up for critical phenomena in deep learning. On the theoretical side, we use results from statistical physics to carry out critical point calculations in feed-forward/fully connected networks, while on the experimental side we set out to find traces of criticality in deep neural networks. This is our first step in a series of upcoming investigations to map out the relationship between criticality and learning in deep networks. This paper presents minimax rates for density estimation when the data dimension $d$ is allowed to grow with the number of observations $n$ rather than remaining fixed as in previous analyses. We prove a non-asymptotic lower bound which gives the worst-case rate over standard classes of smooth densities, and we show that kernel density estimators achieve this rate. We also give oracle choices for the bandwidth and derive the fastest rate $d$ can grow with $n$ to maintain estimation consistency. RoboCup offers a set of benchmark problems for Artificial Intelligence in form of official world championships since 1997. The most tactical advanced and richest in terms of behavioural complexity of these is the 2D Soccer Simulation League, a simulated robotic soccer competition. BetaRun is a new attempt combining both machine learning and manual programming approaches, with the ultimate goal to arrive at a team that is trained entirely from observing and playing games, and a new development based on agent2D. The AGM model is the most remarkable framework for modeling belief revision. However, it is not perfect in all aspects. Paraconsistent belief revision, multi-agent belief revision and non-prioritized belief revision are three different extensions to AGM to address three important criticisms applied to it. In this article, we propose a framework based on AGM that takes a position in each of these categories. Also, we discuss some features of our framework and study the satisfiability of AGM postulates in this new context. Games have always been popular testbeds for Artificial Intelligence (AI). In the last decade, we have seen the rise of the Multiple Online Battle Arena (MOBA) games, which are the most played games nowadays. In spite of this, there are few works that explore MOBA as a testbed for AI Research. In this paper we present and discuss the main features and opportunities offered by MOBA games to Game AI Research. We describe the various challenges faced along the game and also propose a discrete model that can be used to better understand and explore the game. With this, we aim to encourage the use of MOBA as a novel research platform for Game AI. There are many goals for an AI that could become dangerous if the AI becomes superintelligent or otherwise powerful. Much work on the AI control problem has been focused on constructing AI goals that are safe even for such AIs. This paper looks at an alternative approach: defining a general concept of `low impact'. The aim is to ensure that a powerful AI which implements low impact will not modify the world extensively, even if it is given a simple or dangerous goal. The paper proposes various ways of defining and grounding low impact, and discusses methods for ensuring that the AI can still be allowed to have a (desired) impact despite the restriction. The end of the paper addresses known issues with this approach and avenues for future research. Digital games have become a key player in the entertainment industry, attracting millions of new players each year. In spite of that, novice players may have a hard time when playing certain types of games, such as MOBAs and MMORPGs, due to their steep learning curves and not so friendly online communities. In this paper, we present an approach to help novice players in MOBA games overcome these problems. An artificial intelligence agent plays alongside the player analyzing his/her performance and giving tips about the game. Experiments performed with the game {\em League of Legends} show the potential of this approach. Drawing an inspiration from behavioral studies of human decision making, we propose here a general parametric framework for multi-armed bandit problem, which extends the standard Thompson Sampling approach to incorporate reward processing biases associated with several neurological and psychiatric conditions, including Parkinson's and Alzheimer's diseases, attention-deficit/hyperactivity disorder (ADHD), addiction, and chronic pain. We demonstrate empirically that the proposed parametric approach can often outperform the baseline Thompson Sampling on a variety of datasets. Moreover, from the behavioral modeling perspective, our parametric framework can be viewed as a first step towards a unifying computational model capturing reward processing abnormalities across multiple mental conditions. Constrained counting is important in domains ranging from artificial intelligence to software analysis. There are already a few approaches for counting models over various types of constraints. Recently, hashing-based approaches achieve both theoretical guarantees and scalability, but still rely on solution enumeration. In this paper, a new probabilistic polynomial time approximate model counter is proposed, which is also a hashing-based universal framework, but with only satisfiability queries. A variant with a dynamic stopping criterion is also presented. Empirical evaluation over benchmarks on propositional logic formulas and SMT(BV) formulas shows that the approach is promising. This paper gives an overview of impersonation bots that generate output in one, or possibly, multiple modalities. We also discuss rapidly advancing areas of machine learning and artificial intelligence that could lead to frighteningly powerful new multi-modal social bots. Our main conclusion is that most commonly known bots are one dimensional (i.e., chatterbot), and far from deceiving serious interrogators. However, using recent advances in machine learning, it is possible to unleash incredibly powerful, human-like armies of social bots, in potentially well coordinated campaigns of deception and influence. The paper investigates navigability with imperfect information. It shows that the properties of navigability with perfect recall are exactly those captured by Armstrong's axioms from the database theory. If the assumption of perfect recall is omitted, then Armstrong's transitivity axiom is not valid, but it can be replaced by two new weaker principles. The main technical results are soundness and completeness theorems for the logical systems describing properties of navigability with and without perfect recall. The paper proposes a bimodal logic that describes an interplay between distributed knowledge modality and coalition know-how modality. Unlike other similar systems, the one proposed here assumes perfect recall by all agents. Perfect recall is captured in the system by a single axiom. The main technical results are the soundness and the completeness theorems for the proposed logical system. This volume consists of papers presented at the Sixteenth Conference on Theoretical Aspects of Rationality and Knowledge (TARK) held at the University of Liverpool, UK, from July 24 to 26, 2017. TARK conferences bring together researchers from a wide variety of fields, including Computer Science (especially, Artificial Intelligence, Cryptography, Distributed Computing), Economics (especially, Decision Theory, Game Theory, Social Choice Theory), Linguistics, Philosophy (especially, Philosophical Logic), and Cognitive Psychology, in order to further understand the issues involving reasoning about rationality and knowledge. Sequential pattern mining algorithms are widely used to explore care pathways database, but they generate a deluge of patterns, mostly redundant or useless. Clinicians need tools to express complex mining queries in order to generate less but more significant patterns. These algorithms are not versatile enough to answer complex clinician queries. This article proposes to apply a declarative pattern mining approach based on Answer Set Programming paradigm. It is exemplified by a pharmaco-epidemiological study investigating the possible association between hospitalization for seizure and antiepileptic drug switch from a french medico-administrative database. Probabilistic Inference Modulo Theories (PIMT) is a recent framework that expands exact inference on graphical models to use richer languages that include arithmetic, equalities, and inequalities on both integers and real numbers. In this paper, we expand PIMT to a lifted version that also processes random functions and relations. This enhancement is achieved by adapting Inversion, a method from Lifted First-Order Probabilistic Inference literature, to also be modulo theories. This results in the first algorithm for exact probabilistic inference that efficiently and simultaneously exploits random relations and functions, arithmetic, equalities and inequalities. We present a commonsense, qualitative model for the semantic grounding of embodied visuo-spatial and locomotive interactions. The key contribution is an integrative methodology combining low-level visual processing with high-level, human-centred representations of space and motion rooted in artificial intelligence. We demonstrate practical applicability with examples involving object interactions, and indoor movement. With the use of ontologies in several domains such as semantic web, information retrieval, artificial intelligence, the concept of similarity measuring has become a very important domain of research. Therefore, in the current paper, we propose our method of similarity measuring which uses the Dijkstra algorithm to define and compute the shortest path. Then, we use this one to compute the semantic distance between two concepts defined in the same hierarchy of ontology. Afterward, we base on this result to compute the semantic similarity. Finally, we present an experimental comparison between our method and other methods of similarity measuring. Transparency, user trust, and human comprehension are popular ethical motivations for interpretable machine learning. In support of these goals, researchers evaluate model explanation performance using humans and real world applications. This alone presents a challenge in many areas of artificial intelligence. In this position paper, we propose a distinction between descriptive and persuasive explanations. We discuss reasoning suggesting that functional interpretability may be correlated with cognitive function and user preferences. If this is indeed the case, evaluation and optimization using functional metrics could perpetuate implicit cognitive bias in explanations that threaten transparency. Finally, we propose two potential research directions to disambiguate cognitive function and explanation models, retaining control over the tradeoff between accuracy and interpretability. The present document is an excerpt of an essay that I wrote as part of my application material to graduate school in Computer Science (with a focus on Artificial Intelligence), in 1986. I was not invited by any of the schools that received it, so I became a theoretical physicist instead. The essay's full title was "Some Topics in Philosophy and Computer Science". I am making this text (unchanged from 1985, preserving the typesetting as much as possible) available now in memory of Jerry Fodor, whose writings had influenced me significantly at the time (even though I did not always agree). Cloud computing relies on sharing computing resources rather than having local servers or personal devices to handle applications. Nowadays, cloud computing has become one of the fastest growing fields in information technology. However, several new security issues of cloud computing have emerged due to its service delivery models. In this paper, we discuss the case of distributed denial-of-service (DDoS) attack using Cloud resources. First, we show how such attack using a cloud platform could not be detected by previous techniques. Then we present a tricky solution based on the cloud as well. Using Deep Reinforcement Learning (DRL) can be a promising approach to handle various tasks in the field of (simulated) autonomous driving. However, recent publications mainly consider learning in unusual driving environments. This paper presents Driving School for Autonomous Agents (DSA^2), a software for validating DRL algorithms in more usual driving environments based on artificial and realistic road networks. We also present the results of applying DSA^2 for handling the task of driving on a straight road while regulating the velocity of one vehicle according to different speed limits. Although deep learning has historical roots going back decades, neither the term "deep learning" nor the approach was popular just over five years ago, when the field was reignited by papers such as Krizhevsky, Sutskever and Hinton's now classic (2012) deep network model of Imagenet. What has the field discovered in the five subsequent years? Against a background of considerable progress in areas such as speech recognition, image recognition, and game playing, and considerable enthusiasm in the popular press, I present ten concerns for deep learning, and suggest that deep learning must be supplemented by other techniques if we are to reach artificial general intelligence. This paper is to explore the possibility to use alternative data and artificial intelligence techniques to trade stocks. The efficacy of the daily Twitter sentiment on predicting the stock return is examined using machine learning methods. Reinforcement learning(Q-learning) is applied to generate the optimal trading policy based on the sentiment signal. The predicting power of the sentiment signal is more significant if the stock price is driven by the expectation of the company growth and when the company has a major event that draws the public attention. The optimal trading strategy based on reinforcement learning outperforms the trading strategy based on the machine learning prediction. This paper builds on an existing notion of group responsibility and proposes two ways to define the degree of group responsibility: structural and functional degrees of responsibility. These notions measure the potential responsibilities of (agent) groups for avoiding a state of affairs. According to these notions, a degree of responsibility for a state of affairs can be assigned to a group of agents if, and to the extent that, the group has the potential to preclude the state of affairs. We present Etymo (https://etymo.io), a discovery engine to facilitate artificial intelligence (AI) research and development. It aims to help readers navigate a large number of AI-related papers published every week by using a novel form of search that finds relevant papers and displays related papers in a graphical interface. Etymo constructs and maintains an adaptive similarity-based network of research papers as an all-purpose knowledge graph for ranking, recommendation, and visualisation. The network is constantly evolving and can learn from user feedback to adjust itself. Several tasks in artificial intelligence require to be able to find models about knowledge dynamics. They include belief revision, fusion and belief merging, and abduction. In this paper we exploit the algebraic framework of mathematical morphology in the context of propositional logic, and define operations such as dilation or erosion of a set of formulas. We derive concrete operators, based on a semantic approach, that have an intuitive interpretation and that are formally well behaved, to perform revision, fusion and abduction. Computation and tractability are addressed, and simple examples illustrate the typical results that can be obtained. Just as semantic hashing can accelerate information retrieval, binary valued embeddings can significantly reduce latency in the retrieval of graphical data. We introduce a simple but effective model for learning such binary vectors for nodes in a graph. By imagining the embeddings as independent coin flips of varying bias, continuous optimization techniques can be applied to the approximate expected loss. Embeddings optimized in this fashion consistently outperform the quantization of both spectral graph embeddings and various learned real-valued embeddings, on both ranking and pre-ranking tasks for a variety of datasets. This paper introduces the settlement generation competition for Minecraft, the first part of the Generative Design in Minecraft challenge. The settlement generation competition is about creating Artificial Intelligence (AI) agents that can produce functional, aesthetically appealing and believable settlements adapted to a given Minecraft map - ideally at a level that can compete with human created designs. The aim of the competition is to advance procedural content generation for games, especially in overcoming the challenges of adaptive and holistic PCG. The paper introduces the technical details of the challenge, but mostly focuses on what challenges this competition provides and why they are scientifically relevant. The theory of grey systems plays an important role in science,engineering and in the everyday life in general for handling approximate data. In the present paper grey numbers are used as a tool for assessing with linguistic expressions the mean performance of a group of objects participating in a certain activity. Two applications to student and football player assessment are also presented illustrating our results. Recently, deep learning has been advancing the state of the art in artificial intelligence to a new level, and humans rely on artificial intelligence techniques more than ever. However, even with such unprecedented advancements, the lack of explanation regarding the decisions made by deep learning models and absence of control over their internal processes act as major drawbacks in critical decision-making processes, such as precision medicine and law enforcement. In response, efforts are being made to make deep learning interpretable and controllable by humans. In this paper, we review visual analytics, information visualization, and machine learning perspectives relevant to this aim, and discuss potential challenges and future research directions. In this study, under general frame of MAny Connected Intelligent Particles Systems (MACIPS), we reproduce two new simple subsets of such intelligent complex network, namely hybrid intelligent systems, involved a few prominent intelligent computing and approximate reasoning methods: self organizing feature map (SOM), Neuro-Fuzzy Inference System and Rough Set Theory (RST). Over this, we show how our algorithms can be construed as a linkage of government-society interaction, where government catches various fashions of behavior: solid (absolute) or flexible. So, transition of such society, by changing of connectivity parameters (noise) from order to disorder is inferred. Add to this, one may find an indirect mapping among financial systems and eventual market fluctuations with MACIPS. Knowledge representation (KR) and inference mechanism are most desirable thing to make the system intelligent. System is known to an intelligent if its intelligence is equivalent to the intelligence of human being for a particular domain or general. Because of incomplete ambiguous and uncertain information the task of making intelligent system is very difficult. The objective of this paper is to present the hybrid KR technique for making the system effective & Optimistic. The requirement for (effective & optimistic) is because the system must be able to reply the answer with a confidence of some factor. This paper also presents the comparison between various hybrid KR techniques with the proposed one. There has been an increasing interest in inferring some personality traits from users and players in social networks and games, respectively. This goes beyond classical sentiment analysis, and also much further than customer profiling. The purpose here is to have a characterisation of users in terms of personality traits, such as openness, conscientiousness, extraversion, agreeableness, and neuroticism. While this is an incipient area of research, we ask the question of whether cognitive abilities, and intelligence in particular, are also measurable from user profiles. However, we pose the question as broadly as possible in terms of subjects, in the context of universal psychometrics, including humans, machines and hybrids. Namely, in this paper we analyse the following question: is it possible to measure the intelligence of humans and (non-human) bots in a social network or a game just from their user profiles, i.e., by observation, without the use of interactive tests, such as IQ tests, the Turing test or other more principled machine intelligence tests? What is "intelligent" information retrieval? Essentially this is asking what is intelligence, in this article I will attempt to show some of the aspects of human intelligence, as related to information retrieval. I will do this by the device of a semi-imaginary Oracle. Every Observatory has an oracle, someone who is a distinguished scientist, has great administrative responsibilities, acts as mentor to a number of less senior people, and as trusted advisor to even the most accomplished scientists, and knows essentially everyone in the field. In an appendix I will present a brief summary of the Statistical Factor Space method for text indexing and retrieval, and indicate how it will be used in the Astrophysics Data System Abstract Service. 2018 Keywords: Personal Digital Assistant; Supervised Topic Models Two-dimensional (2D) materials and their heterostructures, with wafer-scale synthesis methods and fascinating properties, have attracted numerous interest and triggered revolutions of corresponding device applications. However, facile methods to realize accurate, intelligent and large-area characterizations of these 2D structures are still highly desired. Here, we report a successful application of machine-learning strategy in the optical identification of 2D structure. The machine-learning optical identification method (MOI method) endows optical microscopy with intelligent insight into the characteristic colour information in the optical photograph. Experimental results indicate that the MOI method enables accurate, intelligent and large-area characterizations of graphene, molybdenum disulphide (MoS2) and their heterostructures, including identifications of the thickness, the existence of impurities, and even the stacking order. Thanks to the convergence of artificial intelligence and nanoscience, this intelligent identification method can certainly promote the fundamental research and wafer-scale device application of 2D structures. We discuss properties of recursive schemas related to McCarthy's ``91 function'' and to Takeuchi's triple recursion. Several theorems are proposed as interesting candidates for machine verification, and some intriguing open questions are raised. We introduce a simple generalization of Gardenfors and Makinson's epistemic entrenchment called partial entrenchment. We show that preferential inference can be generated as the sceptical counterpart of an inference mechanism defined directly on partial entrenchment. This is a tutorial on logic programming and Prolog appropriate for a course on programming languages for students familiar with imperative programming. This article aims at clarifying the language and practice of scientific experiment, mainly by hooking observability on calculability. This paper introduces the notion of value-based argumentation frameworks, an extension of the standard argumentation frameworks proposed by Dung, which are able toshow how rational decision is possible in cases where arguments derive their force from the social values their acceptance would promote. This work analyses main features that should be present in knowledge representation. It suggests a model for representation and a way to implement this model in software. Representation takes care of both low-level sensor information and high-level concepts. This document describes the functions as they are treated in the DLV system. We give first the language, then specify the main implementation issues. This research paper gives an overview of quantum computers - description of their operation, differences between quantum and silicon computers, major construction problems of a quantum computer and many other basic aspects. No special scientific knowledge is necessary for the reader. Self-organizing neural networks are used for brick finding in OPERA experiment. Self-organizing neural networks and wavelet analysis used for recognition and extraction of car numbers from images. We prove general exponential moment inequalities for averages of [0,1]-valued iid random variables and use them to tighten the PAC Bayesian Theorem. The logarithmic dependence on the sample count in the enumerator of the PAC Bayesian bound is halved. In this paper, we present a rich semantic network based on a differential analysis. We then detail implemented measures that take into account common and differential features between words. In a last section, we describe some industrial applications. The principles of self-organizing the neural networks of optimal complexity is considered under the unrepresentative learning set. The method of self-organizing the multi-layered neural networks is offered and used to train the logical neural networks which were applied to the medical diagnostics. Results about the redundancy of circumscriptive and default theories are presented. In particular, the complexity of establishing whether a given theory is redundant is establihsed. The unification algorithm is at the core of the logic programming paradigm, the first unification algorithm being developed by Robinson [5]. More efficient algorithms were developed later [3] and I introduce here yet another efficient unification algorithm centered on a specific data structure, called the Unification Table. In this note we introduce the notion of islands for restricting local search. We show how we can construct islands for CNF SAT problems, and how much search space can be eliminated by restricting search to the island. We show that solving planning domains on binary variables with polytree causal graph is \NP-complete. This is in contrast to a polynomial-time algorithm of Domshlak and Brafman that solves these planning domains for polytree causal graphs of bounded indegree. This report introduces researchers in AI to some of the concepts in quantum heurisitics and quantum AI. In these notes we formally describe the functionality of Calculating Valid Domains from the BDD representing the solution space of valid configurations. The formalization is largely based on the CLab configuration framework. Computer model of a "sense of humour" suggested previously [arXiv:0711.2058, 0711.2061, 0711.2270] is raised to the level of a realistic algorithm. Review of: Brigitte Le Roux and Henry Rouanet, Geometric Data Analysis, From Correspondence Analysis to Structured Data Analysis, Kluwer, Dordrecht, 2004, xi+475 pp. This paper has been withdrawn by the author due to a crucial error in the submission action. The standard classification of emotions involves categorizing the expression of emotions. In this paper, parameters underlying some emotions are identified and a new classification based on these parameters is suggested. We show that a prominent counterexample for the completeness of first order RUE-resolution does not apply to the higher order RUE-resolution approach EXTRUE. A fuzzy mnesor space is a semimodule over the positive real numbers. It can be used as theoretical framework for fuzzy sets. Hence we can prove a great number of properties for fuzzy sets without refering to the membership functions. This research report presents an extension of Cumulative of Choco constraint solver, which is useful to encode over-constrained cumulative problems. This new global constraint uses sweep and task interval violation-based algorithms. In this paper, we propose a first-order ontology for generalized stratified order structure. We then classify the models of the theory using model-theoretic techniques. An ontology mapping from this ontology to the core theory of Process Specification Language is also discussed. This paper discusses "computational" systems capable of "computing" functions not computable by predefined Turing machines if the systems are not isolated from their environment. Roughly speaking, these systems can change their finite descriptions by interacting with their environment. We develop abc-logitboost, based on the prior work on abc-boost and robust logitboost. Our extensive experiments on a variety of datasets demonstrate the considerable improvement of abc-logitboost over logitboost and abc-mart. This short note reviews briefly three algorithms for finding the set of dispensable variables of a boolean formula. The presentation is light on proofs and heavy on intuitions. This paper proposes a design for a system to generate constraint solvers that are specialised for specific problem models. It describes the design in detail and gives preliminary experimental results showing the feasibility and effectiveness of the approach. The paper offers a mathematical formalization of the Turing test. This formalization makes it possible to establish the conditions under which some Turing machine will pass the Turing test and the conditions under which every Turing machine (or every Turing machine of the special class) will fail the Turing test. To maximize its success, an AGI typically needs to explore its initially unknown world. Is there an optimal way of doing so? Here we derive an affirmative answer for a broad class of environments. This study presents a method to predict the growth fluctuation of firms interdependent in a network economy. The risk of downward growth fluctuation of firms is calculated from the statistics on Japanese industry. This paper investigates under which conditions instantiation-based proof procedures can be combined in a nested way, in order to mechanically construct new instantiation procedures for richer theories. Interesting applications in the field of verification are emphasized, particularly for handling extensions of the theory of arrays. This paper introduces 'just enough' principles and 'systems engineering' approach to the practice of ontology development to provide a minimal yet complete, lightweight, agile and integrated development process, supportive of stakeholder management and implementation independence. This paper presents a kernel formulation of the recently introduced diff-hash algorithm for the construction of similarity-sensitive hash functions. Our kernel diff-hash algorithm that shows superior performance on the problem of image feature descriptor matching. We identify principles characterizing Solomonoff Induction by demands on an agent's external behaviour. Key concepts are rationality, computability, indifference and time consistency. Furthermore, we discuss extensions to the full AI case to derive AIXI. In this thesis I develop a variety of techniques to train, evaluate, and sample from intractable and high dimensional probabilistic models. Abstract exceeds arXiv space limitations -- see PDF. This document demonstrates that the efficient approach for diagnosis of Petri nets via integer linear programming may be unable to detect a fault even if the system is diagnosable. The problem of replicating the flexibility of human common-sense reasoning has captured the imagination of computer scientists since the early days of Alan Turing's foundational work on computation and the philosophy of artificial intelligence. In the intervening years, the idea of cognition as computation has emerged as a fundamental tenet of Artificial Intelligence (AI) and cognitive science. But what kind of computation is cognition? We describe a computational formalism centered around a probabilistic Turing machine called QUERY, which captures the operation of probabilistic conditioning via conditional simulation. Through several examples and analyses, we demonstrate how the QUERY abstraction can be used to cast common-sense reasoning as probabilistic inference in a statistical model of our observations and the uncertain structure of the world that generated that experience. This formulation is a recent synthesis of several research programs in AI and cognitive science, but it also represents a surprising convergence of several of Turing's pioneering insights in AI, the foundations of computation, and statistics. Modification of a conceptual clustering algorithm Cobweb for the purpose of its application for numerical data is offered. Keywords: clustering, algorithm Cobweb, numerical data, fuzzy membership function. We advance a doxastic interpretation for many of the logical connectives considered in Dependence Logic and in its extensions, and we argue that Team Semantics is a natural framework for reasoning about beliefs and belief updates. This is a purely pedagogical paper with no new results. The goal of the paper is to give a fairly self-contained introduction to Judea Pearl's do-calculus, including proofs of his 3 rules. This short note presents a new formal language, lambda dependency-based compositional semantics (lambda DCS) for representing logical forms in semantic parsing. By eliminating variables and making existential quantification implicit, lambda DCS logical forms are generally more compact than those in lambda calculus. In this note, we argue that the axiomatic requirement of range to the measure of aggregated total uncertainty (ATU) in Dempster-Shafer theory is not reasonable. This file summarizes the plenary talk on laboratory experiments on logic at the TARK 2013 - 14th Conference on Theoretical Aspects of Rationality and Knowledge. This document contains improved and updated proofs of convergence for the sampling method presented in our paper "Free-configuration Biased Sampling for Motion Planning". In the dawn of computer science and the eve of neuroscience we participate in rebirth of neuroscience due to new technology that allows us to deeply and precisely explore whole new world that dwells in our brains. We investigate cumulative scheduling in uncertain environments, using constraint programming. We detail in this paper the dynamic sweep filtering algorithm of the FlexC global constraint. Artificial intelligence develops techniques and systems whose performance must be evaluated on a regular basis in order to certify and foster progress in the discipline. We will describe and critically assess the different ways AI systems are evaluated. We first focus on the traditional task-oriented evaluation approach. We see that black-box (behavioural evaluation) is becoming more and more common, as AI systems are becoming more complex and unpredictable. We identify three kinds of evaluation: Human discrimination, problem benchmarks and peer confrontation. We describe the limitations of the many evaluation settings and competitions in these three categories and propose several ideas for a more systematic and robust evaluation. We then focus on a less customary (and challenging) ability-oriented evaluation approach, where a system is characterised by its (cognitive) abilities, rather than by the tasks it is designed to solve. We discuss several possibilities: the adaptation of cognitive tests used for humans and animals, the development of tests derived from algorithmic information theory or more general approaches under the perspective of universal psychometrics. Methods of applying neural networks to control plants are considered. Methods and schemes are described, their advantages and disadvantages are discussed. In this short note we address the issue of expressing norms (such as obligations and prohibitions) in temporal logic. In particular, we address the argument from [Governatori 2015] that norms cannot be expressed in Linear Time Temporal Logic (LTL). We introduce the Xapagy cognitive architecture: a software system designed to perform narrative reasoning. The architecture has been designed from scratch to model and mimic the activities performed by humans when witnessing, reading, recalling, narrating and talking about stories. This paper compares various optimization methods for fuzzy inference system optimization. The optimization methods compared are genetic algorithm, particle swarm optimization and simulated annealing. When these techniques were implemented it was observed that the performance of each technique within the fuzzy inference system classification was context dependent. Deontic modalities are here defined in terms of the preference relation explored in our previous work (Osherson and Weinstein, 2012). Some consequences of the system are discussed. This document serves as a brief technical report, detailing the processes used to represent and reconstruct simplified polygons using qualitative spatial descriptions, as defined by the eOPRAm qualitative spatial calculus. Tau-chain is a decentralized peer-to-peer network having three unified faces: Rules, Proofs, and Computer Programs, allowing a generalization of virtually any centralized or decentralized P2P network, together with many new abilities, as we present on this note. We investigate the capacity, convexity and characterization of a general family of norm-constrained feed-forward networks. This note introduce three Bayesian style Multi-armed bandit algorithms: Information-directed sampling, Thompson Sampling and Generalized Thompson Sampling. The goal is to give an intuitive explanation for these three algorithms and their regret bounds, and provide some derivations that are omitted in the original papers. This short note discusses the role of syntax vs. semantics and the interplay between logic, philosophy, and language in computer science and game theory. The authors apply the Generalized Rectangular Model to assessing critical thinking skills and its relations with their language competency. This thesis investigates the generation of new concepts from combinations of existing concepts as a language evolves. We give a method for combining concepts, and will be investigating the utility of composite concepts in language evolution and thence the utility of concept generation. We present a method of training a differentiable function approximator for a regression task using negative examples. We effect this training using negative learning rates. We also show how this method can be used to perform direct policy learning in a reinforcement learning setting. This paper discusses obstacle avoidance using fuzzy logic and shortest path algorithm. This paper also introduces the sliding blades problem and illustrates how a drone can navigate itself through the swinging blade obstacles while tracing a semi-optimal path and also maintaining constant velocity We show how to adjust the coefficient of determination ($R^2$) when used for measuring predictive accuracy via leave-one-out cross-validation. This paper discusses the root cause of systems perceiving the self experience and how to exploit adaptive and learning features without introducing ethically problematic system properties. General game playing artificial intelligence has recently seen important advances due to the various techniques known as 'deep learning'. However the advances conceal equally important limitations in their reliance on: massive data sets; fortuitously constructed problems; and absence of any human-level complexity, including other human opponents. On the other hand, deep learning systems which do beat human champions, such as in Go, do not generalise well. The power of deep learning simultaneously exposes its weakness. Given that deep learning is mostly clever reconfigurations of well-established methods, moving beyond the state of art calls for forward-thinking visionary solutions, not just more of the same. I present the argument that general game playing artificial intelligence will require a generalised player model. This is because games are inherently human artefacts which therefore, as a class of problems, contain cases which require a human-style problem solving approach. I relate this argument to the performance of state of art general game playing agents. I then describe a concept for a formal category theoretic basis to a generalised player model. This formal model approach integrates my existing 'Behavlets' method for psychologically-derived player modelling: Cowley, B., Charles, D. (2016). Behavlets: a Method for Practical Player Modelling using Psychology-Based Player Traits and Domain Specific Features. User Modeling and User-Adapted Interaction, 26(2), 257-306. We formalize Simplified Boardgames language, which describes a subclass of arbitrary board games. The language structure is based on the regular expressions, which makes the rules easily machine-processable while keeping the rules concise and fairly human-readable. The main purpose of this paper is to study the lattice structure of variable precision rough sets. The notion of variation in precision of rough sets have been further extended to variable precision rough set with variable classification error and its algebraic properties are also studied. An algorithm of solution of the Automatic Classification (AC for brevity) problem is set forth in the paper. In the AC problem, it is required to find one or several artitions, starting with the given pattern matrix or dissimilarity, similarity matrix. This paper investigates the application of consensus clustering and meta-clustering to the set of all possible partitions of a data set. We show that when using a "complement" of Rand Index as a measure of cluster similarity, the total-separation partition, putting each element in a separate set, is chosen. This thesis is in the area called computational social choice which is an intersection area of algorithms and social choice theory. I make some basic observations about hard takeoff, value alignment, and coherent extrapolated volition, concepts which have been central in analyses of superintelligent AI systems. This chapter describes the history of metaheuristics in five distinct periods, starting long before the first use of the term and ending a long time in the future. In this paper, we argue that the future of Artificial Intelligence research resides in two keywords: integration and embodiment. We support this claim by analyzing the recent advances of the field. Regarding integration, we note that the most impactful recent contributions have been made possible through the integration of recent Machine Learning methods (based in particular on Deep Learning and Recurrent Neural Networks) with more traditional ones (e.g. Monte-Carlo tree search, goal babbling exploration or addressable memory systems). Regarding embodiment, we note that the traditional benchmark tasks (e.g. visual classification or board games) are becoming obsolete as state-of-the-art learning algorithms approach or even surpass human performance in most of them, having recently encouraged the development of first-person 3D game platforms embedding realistic physics. Building upon this analysis, we first propose an embodied cognitive architecture integrating heterogenous sub-fields of Artificial Intelligence into a unified framework. We demonstrate the utility of our approach by showing how major contributions of the field can be expressed within the proposed framework. We then claim that benchmarking environments need to reproduce ecologically-valid conditions for bootstrapping the acquisition of increasingly complex cognitive skills through the concept of a cognitive arms race between embodied agents. In this work, answer-set programs that specify repairs of databases are used as a basis for solving computational and reasoning problems about causes for query answers from databases. We show how an action-dependent baseline can be used by the policy gradient theorem using function approximation, originally presented with action-independent baselines by (Sutton et al. 2000). This position paper formalises an abstract model for complex negotiation dialogue. This model is to be used for the benchmark of optimisation algorithms ranging from Reinforcement Learning to Stochastic Games, through Transfer Learning, One-Shot Learning or others. The end (for human scientists) is nigh? The posit of this discourse is that the majority, if not all, scientific research will eventually be undertaken by one, or a number of, weak artificial intelligences. MagNet and "Efficient Defenses..." were recently proposed as a defense to adversarial examples. We find that we can construct adversarial examples that defeat these defenses with only a slight increase in distortion. Safety critical systems strongly require the quality aspects of artificial intelligence including explainability. In this paper, we analyzed a trained network to extract features which mainly contribute the inference. Based on the analysis, we developed a simple solution to generate explanations of the inference processes. In this paper, we present Paranom, a parallel anomaly dataset generator. We discuss its design and provide brief experimental results demonstrating its usefulness in improving the classification correctness of LSTM-AD, a state-of-the-art anomaly detection model. This paper provides an overview of the SP theory of intelligence and its central idea that artificial intelligence, mainstream computing, and much of human perception and cognition, may be understood as information compression. The background and origins of the SP theory are described, and the main elements of the theory, including the key concept of multiple alignment, borrowed from bioinformatics but with important differences. Associated with the SP theory is the idea that redundancy in information may be understood as repetition of patterns, that compression of information may be achieved via the matching and unification (merging) of patterns, and that computing and information compression are both fundamentally probabilistic. It appears that the SP system is Turing-equivalent in the sense that anything that may be computed with a Turing machine may, in principle, also be computed with an SP machine. One of the main strengths of the SP theory and the multiple alignment concept is in modelling concepts and phenomena in artificial intelligence. Within that area, the SP theory provides a simple but versatile means of representing different kinds of knowledge, it can model both the parsing and production of natural language, with potential for the understanding and translation of natural languages, it has strengths in pattern recognition, with potential in computer vision, it can model several kinds of reasoning, and it has capabilities in planning, problem solving, and unsupervised learning. The paper includes two examples showing how alternative parsings of an ambiguous sentence may be modelled as multiple alignments, and another example showing how the concept of multiple alignment may be applied in medical diagnosis. Digital pathology is not only one of the most promising fields of diagnostic medicine, but at the same time a hot topic for fundamental research. Digital pathology is not just the transfer of histopathological slides into digital representations. The combination of different data sources (images, patient records, and *omics data) together with current advances in artificial intelligence/machine learning enable to make novel information accessible and quantifiable to a human expert, which is not yet available and not exploited in current medical settings. The grand goal is to reach a level of usable intelligence to understand the data in the context of an application task, thereby making machine decisions transparent, interpretable and explainable. The foundation of such an "augmented pathologist" needs an integrated approach: While machine learning algorithms require many thousands of training examples, a human expert is often confronted with only a few data points. Interestingly, humans can learn from such few examples and are able to instantly interpret complex patterns. Consequently, the grand goal is to combine the possibilities of artificial intelligence with human intelligence and to find a well-suited balance between them to enable what neither of them could do on their own. This can raise the quality of education, diagnosis, prognosis and prediction of cancer and other diseases. In this paper we describe some (incomplete) research issues which we believe should be addressed in an integrated and concerted effort for paving the way towards the augmented pathologist. Triggered by modern technologies, our possibilities may now expand beyond the unthinkable. Cars externally may look similar to decades ago, but a dramatic revolution happened inside the cabin as a result of their computation, communications, and storage capabilities. With the advent of Electric Autonomous Vehicles (EAVs), Artificial Intelligence and ecological technologies found the best synergy. Several transportation problems may be solved (accidents, emissions, and congestion among others), and the foundation of Machine-to-Machine (M2M) economy could be established, in addition to value-added services such as infotainment (information and entertainment). In the world where intelligent technologies are pervading everyday life, software and algorithms play a major role. Software has been lately introduced in virtually every technological product available on the market, from phones to television sets to cars and even housing. Artificial Intelligence is one of the consequences of this pervasive presence of algorithms. The role of software is becoming dominant and technology is, at times pervasive, of our existence. Concerns, such as privacy and security, demand high attention and have been already explored to some level of detail. However, intelligent agents and actors are often considered as perfect entities that will overcome human error-prone nature. This may not always be the case and we advocate that the notion of reputation is also applicable to intelligent artificial agents, in particular to EAVs. An important characteristic of many logics for Artificial Intelligence is their nonmonotonicity. This means that adding a formula to the premises can invalidate some of the consequences. There may, however, exist formulae that can always be safely added to the premises without destroying any of the consequences: we say they respect monotonicity. Also, there may be formulae that, when they are a consequence, can not be invalidated when adding any formula to the premises: we call them conservative. We study these two classes of formulae for preferential logics, and show that they are closely linked to the formulae whose truth-value is preserved along the (preferential) ordering. We will consider some preferential logics for illustration, and prove syntactic characterization results for them. The results in this paper may improve the efficiency of theorem provers for preferential logics. Randomized algorithms for deciding satisfiability were shown to be effective in solving problems with thousands of variables. However, these algorithms are not complete. That is, they provide no guarantee that a satisfying assignment, if one exists, will be found. Thus, when studying randomized algorithms, there are two important characteristics that need to be considered: the running time and, even more importantly, the accuracy --- a measure of likelihood that a satisfying assignment will be found, provided one exists. In fact, we argue that without a reference to the accuracy, the notion of the running time for randomized algorithms is not well-defined. In this paper, we introduce a formal notion of accuracy. We use it to define a concept of the running time. We use both notions to study the random walk strategy GSAT algorithm. We investigate the dependence of accuracy on properties of input formulas such as clause-to-variable ratio and the number of satisfying assignments. We demonstrate that the running time of GSAT grows exponentially in the number of variables of the input formula for randomly generated 3-CNF formulas and for the formulas encoding 3- and 4-colorability of graphs. Decision theory formally solves the problem of rational agents in uncertain worlds if the true environmental probability distribution is known. Solomonoff's theory of universal induction formally solves the problem of sequence prediction for unknown distribution. We unify both theories and give strong arguments that the resulting universal AIXI model behaves optimal in any computable environment. The major drawback of the AIXI model is that it is uncomputable. To overcome this problem, we construct a modified algorithm AIXI^tl, which is still superior to any other time t and space l bounded agent. The computation time of AIXI^tl is of the order t x 2^l. Revision programming is a formalism to describe and enforce updates of belief sets and databases. That formalism was extended by Fitting who assigned annotations to revision atoms. Annotations provide a way to quantify the confidence (probability) that a revision atom holds. The main goal of our paper is to reexamine the work of Fitting, argue that his semantics does not always provide results consistent with intuition, and to propose an alternative treatment of annotated revision programs. Our approach differs from that proposed by Fitting in two key aspects: we change the notion of a model of a program and we change the notion of a justified revision. We show that under this new approach fundamental properties of justified revisions of standard revision programs extend to the annotated case. Mutual information is widely used in artificial intelligence, in a descriptive way, to measure the stochastic dependence of discrete random variables. In order to address questions such as the reliability of the empirical value, one must consider sample-to-population inferential approaches. This paper deals with the distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean and an analytical approximation of the variance are reported. Asymptotic approximations of the distribution are proposed. The results are applied to the problem of selecting features for incremental learning and classification of the naive Bayes classifier. A fast, newly defined method is shown to outperform the traditional approach based on empirical mutual information on a number of real data sets. Finally, a theoretical development is reported that allows one to efficiently extend the above methods to incomplete samples in an easy and effective way. Most traditional artificial intelligence (AI) systems of the past 50 years are either very limited, or based on heuristics, or both. The new millennium, however, has brought substantial progress in the field of theoretically optimal and practically feasible algorithms for prediction, search, inductive inference based on Occam's razor, problem solving, decision making, and reinforcement learning in environments of a very general type. Since inductive inference is at the heart of all inductive sciences, some of the results are relevant not only for AI and computer science but also for physics, provoking nontraditional predictions based on Zuse's thesis of the computer-generated universe. Fully-automatic general-purpose high-quality machine translation systems (FGH-MT) are extremely difficult to build. In fact, there is no system in the world for any pair of languages which qualifies to be called FGH-MT. The reasons are not far to seek. Translation is a creative process which involves interpretation of the given text by the translator. Translation would also vary depending on the audience and the purpose for which it is meant. This would explain the difficulty of building a machine translation system. Since, the machine is not capable of interpreting a general text with sufficient accuracy automatically at present - let alone re-expressing it for a given audience, it fails to perform as FGH-MT. FOOTNOTE{The major difficulty that the machine faces in interpreting a given text is the lack of general world knowledge or common sense knowledge.} Fusion of Artificial Neural Networks (ANN) and Fuzzy Inference Systems (FIS) have attracted the growing interest of researchers in various scientific and engineering areas due to the growing need of adaptive intelligent systems to solve the real world problems. ANN learns from scratch by adjusting the interconnections between layers. FIS is a popular computing framework based on the concept of fuzzy set theory, fuzzy if-then rules, and fuzzy reasoning. The advantages of a combination of ANN and FIS are obvious. There are several approaches to integrate ANN and FIS and very often it depends on the application. We broadly classify the integration of ANN and FIS into three categories namely concurrent model, cooperative model and fully fused model. This paper starts with a discussion of the features of each model and generalize the advantages and deficiencies of each model. We further focus the review on the different types of fused neuro-fuzzy systems and citing the advantages and disadvantages of each model. Intelligent systems based on first-order logic on the one hand, and on artificial neural networks (also called connectionist systems) on the other, differ substantially. It would be very desirable to combine the robust neural networking machinery with symbolic knowledge representation and reasoning paradigms like logic programming in such a way that the strengths of either paradigm will be retained. Current state-of-the-art research, however, fails by far to achieve this ultimate goal. As one of the main obstacles to be overcome we perceive the question how symbolic knowledge can be encoded by means of connectionist systems: Satisfactory answers to this will naturally lead the way to knowledge extraction algorithms and to integrated neural-symbolic systems. W.C. Rounds and G.-Q. Zhang (2001) have proposed to study a form of disjunctive logic programming generalized to algebraic domains. This system allows reasoning with information which is hierarchically structured and forms a (suitable) domain. We extend this framework to include reasoning with default negation, giving rise to a new nonmonotonic reasoning framework on hierarchical knowledge which encompasses answer set programming with extended disjunctive logic programs. We also show that the hierarchically structured knowledge on which programming in this paradigm can be done, arises very naturally from formal concept analysis. Together, we obtain a default reasoning paradigm for conceptual knowledge which is in accordance with mainstream developments in nonmonotonic reasoning. Water plays a pivotal role in many physical processes, and most importantly in sustaining human life, animal life and plant life. Water supply entities therefore have the responsibility to supply clean and safe water at the rate required by the consumer. It is therefore necessary to implement mechanisms and systems that can be employed to predict both short-term and long-term water demands. The increasingly growing field of computational intelligence techniques has been proposed as an efficient tool in the modelling of dynamic phenomena. The primary objective of this paper is to compare the efficiency of two computational intelligence techniques in water demand forecasting. The techniques under comparison are the Artificial Neural Networks (ANNs) and the Support Vector Machines (SVMs). In this study it was observed that the ANNs perform better than the SVMs. This performance is measured against the generalisation ability of the two. When Kurt Goedel layed the foundations of theoretical computer science in 1931, he also introduced essential concepts of the theory of Artificial Intelligence (AI). Although much of subsequent AI research has focused on heuristics, which still play a major role in many practical AI applications, in the new millennium AI theory has finally become a full-fledged formal science, with important optimality results for embodied agents living in unknown environments, obtained through a combination of theory a la Goedel and probability theory. Here we look back at important milestones of AI history, mention essential recent results, and speculate about what we may expect from the next 25 years, emphasizing the significance of the ongoing dramatic hardware speedups, and discussing Goedel-inspired, self-referential, self-improving universal problem solvers. This paper introduces the concept of fitness cloud as an alternative way to visualize and analyze search spaces than given by the geographic notion of fitness landscape. It is argued that the fitness cloud concept overcomes several deficiencies of the landscape representation. Our analysis is based on the correlation between fitness of solutions and fitnesses of nearest solutions according to some neighboring. We focus on the behavior of local search heuristics, such as hill climber, on the well-known NK fitness landscape. In both cases the fitness vs. fitness correlation is shown to be related to the epistatic parameter K. The generation of meaningless "words" matching certain statistical and/or linguistic criteria is frequently needed for experimental purposes in Psycholinguistics. Such stimuli receive the name of pseudowords or nonwords in the Cognitive Neuroscience literatue. The process for building nonwords sometimes has to be based on linguistic units such as syllables or morphemes, resulting in a numerical explosion of combinations when the size of the nonwords is increased. In this paper, a reactive tabu search scheme is proposed to generate nonwords of variables size. The approach builds pseudowords by using a modified Metaheuristic algorithm based on a local search procedure enhanced by a feedback-based scheme. Experimental results show that the new algorithm is a practical and effective tool for nonword generation. Direction relations between extended spatial objects are important commonsense knowledge. Recently, Goyal and Egenhofer proposed a formal model, known as Cardinal Direction Calculus (CDC), for representing direction relations between connected plane regions. CDC is perhaps the most expressive qualitative calculus for directional information, and has attracted increasing interest from areas such as artificial intelligence, geographical information science, and image retrieval. Given a network of CDC constraints, the consistency problem is deciding if the network is realizable by connected regions in the real plane. This paper provides a cubic algorithm for checking consistency of basic CDC constraint networks, and proves that reasoning with CDC is in general an NP-Complete problem. For a consistent network of basic CDC constraints, our algorithm also returns a 'canonical' solution in cubic time. This cubic algorithm is also adapted to cope with cardinal directions between possibly disconnected regions, in which case currently the best algorithm is of time complexity O(n^5). Why did the big bang occur, why do the laws and constants of nature as well as the boundary conditions seem so fine-tuned for life, what is the role of intelligence and self-consciousness in the universe, and how can it escape cosmic doomsday? The hypothesis of Cosmological Artificial Selection (CAS) connects those questions and suggests a far-reaching answer: Our universe might be understood in terms of vast computer simulations and could even have been created and transcended by one. - This essay critically discusses some of the premises and implications of CAS and related problems both with the proposal itself and its possible physical realization: Is our universe really fine-tuned, does CAS deserve to be considered as a convincing explanation, and which other options are available to understand the physical laws, constants and boundary conditions? Is life incidental, and does CAS revalue it? And is intelligence and self-consciousness ultimately doomed, or might CAS rescue it? Keywords: origin of the universe, big bang, fine-tuning, laws of nature, physical constants, initial conditions, intelligent life, cosmological natural selection, cosmological artificial selection, artificial cosmogenesis, deism, natural theology, far future of the universe, physical eschatology The aim of this work is to address the question of whether we can in principle design rational decision-making agents or artificial intelligences embedded in computable physics such that their decisions are optimal in reasonable mathematical senses. Recent developments in rare event probability estimation, recursive bayesian inference, neural networks, and probabilistic planning are sufficient to explicitly approximate reinforcement learners of the AIXI style with non-trivial model classes (here, the class of resource-bounded Turing machines). Consideration of the effects of resource limitations in a concrete implementation leads to insights about possible architectures for learning systems using optimal decision makers as components. The methodology of Bayesian Model Averaging (BMA) is applied for assessment of newborn brain maturity from sleep EEG. In theory this methodology provides the most accurate assessments of uncertainty in decisions. However, the existing BMA techniques have been shown providing biased assessments in the absence of some prior information enabling to explore model parameter space in details within a reasonable time. The lack in details leads to disproportional sampling from the posterior distribution. In case of the EEG assessment of brain maturity, BMA results can be biased because of the absence of information about EEG feature importance. In this paper we explore how the posterior information about EEG features can be used in order to reduce a negative impact of disproportional sampling on BMA performance. We use EEG data recorded from sleeping newborns to test the efficiency of the proposed BMA technique. Symmetry can be used to help solve many problems. For instance, Einstein's famous 1905 paper ("On the Electrodynamics of Moving Bodies") uses symmetry to help derive the laws of special relativity. In artificial intelligence, symmetry has played an important role in both problem representation and reasoning. I describe recent work on using symmetry to help solve constraint satisfaction problems. Symmetries occur within individual solutions of problems as well as between different solutions of the same problem. Symmetry can also be applied to the constraints in a problem to give new symmetric constraints. Reasoning about symmetry can speed up problem solving, and has led to the discovery of new results in both graph and number theory. The constraint satisfaction problem (CSP) is a central generic problem in computer science and artificial intelligence: it provides a common framework for many theoretical problems as well as for many real-life applications. Soft constraint problems are a generalisation of the CSP which allow the user to model optimisation problems. Considerable effort has been made in identifying properties which ensure tractability in such problems. In this work, we initiate the study of hybrid tractability of soft constraint problems; that is, properties which guarantee tractability of the given soft constraint problem, but which do not depend only on the underlying structure of the instance (such as being tree-structured) or only on the types of soft constraints in the instance (such as submodularity). We present several novel hybrid classes of soft constraint problems, which include a machine scheduling problem, constraint problems of arbitrary arities with no overlapping nogoods, and the SoftAllDiff constraint with arbitrary unary soft constraints. An important tool in our investigation will be the notion of forbidden substructures. The mathematical formalism of quantum mechanics has been successfully employed in the last years to model situations in which the use of classical structures gives rise to problematical situations, and where typically quantum effects, such as 'contextuality' and 'entanglement', have been recognized. This 'Quantum Interaction Approach' is briefly reviewed in this paper focusing, in particular, on the quantum models that have been elaborated to describe how concepts combine in cognitive science, and on the ensuing identification of a quantum structure in human thought. We point out that these results provide interesting insights toward the development of a unified theory for meaning and knowledge formalization and representation. Then, we analyze the technological aspects and implications of our approach, and a particular attention is devoted to the connections with symbolic artificial intelligence, quantum computation and robotics. In the recent literature of Artificial Intelligence, an intensive research effort has been spent, for various algebras of qualitative relations used in the representation of temporal and spatial knowledge, on the problem of classifying the computational complexity of reasoning problems for subsets of algebras. The main purpose of these researches is to describe a restricted set of maximal tractable subalgebras, ideally in an exhaustive fashion with respect to the hosting algebras. In this paper we introduce a novel algebra for reasoning about Spatial Congruence, show that the satisfiability problem in the spatial algebra MC-4 is NP-complete, and present a complete classification of tractability in the algebra, based on the individuation of three maximal tractable subclasses, one containing the basic relations. The three algebras are formed by 14, 10 and 9 relations out of 16 which form the full algebra. Supply chain formation is the process of determining the structure and terms of exchange relationships to enable a multilevel, multiagent production activity. We present a simple model of supply chains, highlighting two characteristic features: hierarchical subtask decomposition, and resource contention. To decentralize the formation process, we introduce a market price system over the resources produced along the chain. In a competitive equilibrium for this system, agents choose locally optimal allocations with respect to prices, and outcomes are optimal overall. To determine prices, we define a market protocol based on distributed, progressive auctions, and myopic, non-strategic agent bidding policies. In the presence of resource contention, this protocol produces better solutions than the greedy protocols common in the artificial intelligence and multiagent systems literature. The protocol often converges to high-value supply chains, and when competitive equilibria exist, typically to approximate competitive equilibria. However, complementarities in agent production technologies can cause the protocol to wastefully allocate inputs to agents that do not produce their outputs. A subsequent decommitment phase recovers a significant fraction of the lost surplus. Many researchers in artificial intelligence are beginning to explore the use of soft constraints to express a set of (possibly conflicting) problem requirements. A soft constraint is a function defined on a collection of variables which associates some measure of desirability with each possible combination of values for those variables. However, the crucial question of the computational complexity of finding the optimal solution to a collection of soft constraints has so far received very little attention. In this paper we identify a class of soft binary constraints for which the problem of finding the optimal solution is tractable. In other words, we show that for any given set of such constraints, there exists a polynomial time algorithm to determine the assignment having the best overall combined measure of desirability. This tractable class includes many commonly-occurring soft constraints, such as 'as near as possible' or 'as soon as possible after', as well as crisp constraints such as 'greater than'. Finally, we show that this tractable class is maximal, in the sense that adding any other form of soft binary constraint which is not in the class gives rise to a class of problems which is NP-hard. Large-scale, parallel clusters composed of commodity processors are increasingly available, enabling the use of vast processing capabilities and distributed RAM to solve hard search problems. We investigate Hash-Distributed A* (HDA*), a simple approach to parallel best-first search that asynchronously distributes and schedules work among processors based on a hash function of the search state. We use this approach to parallelize the A* algorithm in an optimal sequential version of the Fast Downward planner, as well as a 24-puzzle solver. The scaling behavior of HDA* is evaluated experimentally on a shared memory, multicore machine with 8 cores, a cluster of commodity machines using up to 64 cores, and large-scale high-performance clusters, using up to 2400 processors. We show that this approach scales well, allowing the effective utilization of large amounts of distributed memory to optimally solve problems which require terabytes of RAM. We also compare HDA* to Transposition-table Driven Scheduling (TDS), a hash-based parallelization of IDA*, and show that, in planning, HDA* significantly outperforms TDS. A simple hybrid which combines HDA* and TDS to exploit strengths of both algorithms is proposed and evaluated. Many methods have been developed to secure the network infrastructure and communication over the Internet. Intrusion detection is a relatively new addition to such techniques. Intrusion detection systems (IDS) are used to find out if someone has intrusion into or is trying to get it the network. One big problem is amount of Intrusion which is increasing day by day. We need to know about network attack information using IDS, then analysing the effect. Due to the nature of IDSs which are solely signature based, every new intrusion cannot be detected; so it is important to introduce artificial intelligence (AI) methods / techniques in IDS. Introduction of AI necessitates the importance of normalization in intrusions. This work is focused on classification of AI based IDS techniques which will help better design intrusion detection systems in the future. We have also proposed a support vector machine for IDS to detect Smurf attack with much reliable accuracy. The rapid growth and diversity in service offerings and the ensuing complexity of information technology ecosystems present numerous management challenges (both operational and strategic). Instrumentation and measurement technology is, by and large, keeping pace with this development and growth. However, the algorithms, tools, and technology required to transform the data into relevant information for decision making are not. The claim in this paper (and the invited talk) is that the line of research conducted in Uncertainty in Artificial Intelligence is very well suited to address the challenges and close this gap. I will support this claim and discuss open problems using recent examples in diagnosis, model discovery, and policy optimization on three real life distributed systems. Inferring from inconsistency and making decisions are two problems which have always been treated separately by researchers in Artificial Intelligence. Consequently, different models have been proposed for each category. Different argumentation systems [2, 7, 10, 11] have been developed for handling inconsistency in knowledge bases. Recently, other argumentation systems [3, 4, 8] have been defined for making decisions under uncertainty. The aim of this paper is to present a general argumentation framework in which both inferring from inconsistency and decision making are captured. The proposed framework can be used for decision under uncertainty, multiple criteria decision, rule-based decision and finally case-based decision. Moreover, works on classical decision suppose that the information about environment is coherent, and this no longer required by this general framework. Decision-theoretic planning with risk-sensitive planning objectives is important for building autonomous agents or decision-support systems for real-world applications. However, this line of research has been largely ignored in the artificial intelligence and operations research communities since planning with risk-sensitive planning objectives is more complicated than planning with risk-neutral planning objectives. To remedy this situation, we derive conditions that guarantee that the optimal expected utilities of the total plan-execution reward exist and are finite for fully observable Markov decision process models with non-linear utility functions. In case of Markov decision process models with both positive and negative rewards, most of our results hold for stationary policies only, but we conjecture that they can be generalized to non stationary policies. Poker is a challenging problem for artificial intelligence, with non-deterministic dynamics, partial observability, and the added difficulty of unknown adversaries. Modelling all of the uncertainties in this domain is not an easy task. In this paper we present a Bayesian probabilistic model for a broad class of poker games, separating the uncertainty in the game dynamics from the uncertainty of the opponent's strategy. We then describe approaches to two key subproblems: (i) inferring a posterior over opponent strategies given a prior distribution and observations of their play, and (ii) playing an appropriate response to that distribution. We demonstrate the overall approach on a reduced version of poker using Dirichlet priors and then on the full game of Texas hold'em using a more informed prior. We demonstrate methods for playing effective responses to the opponent, based on the posterior. Planning in partially observable Markov decision processes (POMDPs) remains a challenging topic in the artificial intelligence community, in spite of recent impressive progress in approximation techniques. Previous research has indicated that online planning approaches are promising in handling large-scale POMDP domains efficiently as they make decisions "on demand" instead of proactively for the entire state space. We present a Factored Hybrid Heuristic Online Planning (FHHOP) algorithm for large POMDPs. FHHOP gets its power by combining a novel hybrid heuristic search strategy with a recently developed factored state representation. On several benchmark problems, FHHOP substantially outperformed state-of-the-art online heuristic search approaches in terms of both scalability and quality. Intuitively, the concept of similarity is the notion to measure an inexact matching between two entities of the same reference set. The notions of similarity and its close relative dissimilarity are widely used in many fields of Artificial Intelligence. Yet they have many different and often partial definitions or properties, usually restricted to one field of application and thus incompatible with other uses. This paper contributes to the design and understanding of similarity and dissimilarity measures for Artificial Intelligence. A formal dual definition for each concept is proposed, joined with a set of fundamental properties. The behavior of the properties under several transformations is studied and revealed as an important matter to bear in mind. We also develop several practical examples that work out the proposed approach. We present four novel approximation algorithms for finding triangulation of minimum treewidth. Two of the algorithms improve on the running times of algorithms by Robertson and Seymour, and Becker and Geiger that approximate the optimum by factors of 4 and 3 2/3, respectively. A third algorithm is faster than those but gives an approximation factor of 4 1/2. The last algorithm is yet faster, producing factor-O(lg/k) approximations in polynomial time. Finding triangulations of minimum treewidth for graphs is central to many problems in computer science. Real-world problems in artificial intelligence, VLSI design and databases are efficiently solvable if we have an efficient approximation algorithm for them. We report on experimental results confirming the effectiveness of our algorithms for large graphs associated with real-world problems. There is available an ever-increasing variety of procedures for managing uncertainty. These methods are discussed in the literature of artificial intelligence, as well as in the literature of philosophy of science. Heretofore these methods have been evaluated by intuition, discussion, and the general philosophical method of argument and counterexample. Almost any method of uncertainty management will have the property that in the long run it will deliver numbers approaching the relative frequency of the kinds of events at issue. To find a measure that will provide a meaningful evaluation of these treatments of uncertainty, we must look, not at the long run, but at the short or intermediate run. Our project attempts to develop such a measure in terms of short or intermediate length performance. We represent the effects of practical choices by the outcomes of bets offered to agents characterized by two uncertainty management approaches: the subjective Bayesian approach and the Classical confidence interval approach. Experimental evaluation suggests that the confidence interval approach can outperform the subjective approach in the relatively short run. We introduce a new class of graphical representations, expected utility networks (EUNs), and discuss some of its properties and potential applications to artificial intelligence and economic theory. In EUNs not only probabilities, but also utilities enjoy a modular representation. EUNs are undirected graphs with two types of arc, representing probability and utility dependencies respectively. The representation of utilities is based on a novel notion of conditional utility independence, which we introduce and discuss in the context of other existing proposals. Just as probabilistic inference involves the computation of conditional probabilities, strategic inference involves the computation of conditional expected utilities for alternative plans of action. We define a new notion of conditional expected utility (EU) independence, and show that in EUNs node separation with respect to the probability and utility subgraphs implies conditional EU independence. We extend Gaussian networks - directed acyclic graphs that encode probabilistic relationships between variables - to its vector form. Vector Gaussian continuous networks consist of composite nodes representing multivariates, that take continuous values. These vector or composite nodes can represent correlations between parents, as opposed to conventional univariate nodes. We derive rules for inference in these networks based on two methods: message propagation and topology transformation. These two approaches lead to the development of algorithms, that can be implemented in either a centralized or a decentralized manner. The domain of application of these networks are monitoring and estimation problems. This new representation along with the rules for inference developed here can be used to derive current Bayesian algorithms such as the Kalman filter, and provide a rich foundation to develop new algorithms. We illustrate this process by deriving the decentralized form of the Kalman filter. This work unifies concepts from artificial intelligence and modern control theory. Sequential decision tasks with incomplete information are characterized by the exploration problem; namely the trade-off between further exploration for learning more about the environment and immediate exploitation of the accrued information for decision-making. Within artificial intelligence, there has been an increasing interest in studying planning-while-learning algorithms for these decision tasks. In this paper we focus on the exploration problem in reinforcement learning and Q-learning in particular. The existing exploration strategies for Q-learning are of a heuristic nature and they exhibit limited scaleability in tasks with large (or infinite) state and action spaces. Efficient experimentation is needed for resolving uncertainties when possible plans are compared (i.e. exploration). The experimentation should be sufficient for selecting with statistical significance a locally optimal plan (i.e. exploitation). For this purpose, we develop a probabilistic hill-climbing algorithm that uses a statistical selection procedure to decide how much exploration is needed for selecting a plan which is, with arbitrarily high probability, arbitrarily close to a locally optimal one. Due to its generality the algorithm can be employed for the exploration strategy of robust Q-learning. An experiment on a relatively complex control task shows that the proposed exploration strategy performs better than a typical exploration strategy. Several Artificial Intelligence schemes for reasoning under uncertainty explore either explicitly or implicitly asymmetries among probabilities of various states of their uncertain domain models. Even though the correct working of these schemes is practically contingent upon the existence of a small number of probable states, no formal justification has been proposed of why this should be the case. This paper attempts to fill this apparent gap by studying asymmetries among probabilities of various states of uncertain models. By rewriting the joint probability distribution over a model's variables into a product of individual variables' prior and conditional probability distributions, and applying central limit theorem to this product, we can demonstrate that the probabilities of individual states of the model can be expected to be drawn from highly skewed, log-normal distributions. With sufficient asymmetry in individual prior and conditional probability distributions, a small fraction of states can be expected to cover a large portion of the total probability space with the remaining states having practically negligible probability. Theoretical discussion is supplemented by simulation results and an illustrative real-world example. This paper compares three approaches to the problem of selecting among probability models to fit data (1) use of statistical criteria such as Akaike's information criterion and Schwarz's "Bayesian information criterion," (2) maximization of the posterior probability of the model, and (3) maximization of an effectiveness ratio? trading off accuracy and computational cost. The unifying characteristic of the approaches is that all can be viewed as maximizing a penalized likelihood function. The second approach with suitable prior distributions has been shown to reduce to the first. This paper shows that the third approach reduces to the second for a particular form of the effectiveness ratio, and illustrates all three approaches with the problem of selecting the number of components in a mixture of Gaussian distributions. Unlike the first two approaches, the third can be used even when the candidate models are chosen for computational efficiency, without regard to physical interpretation, so that the likelihood and the prior distribution over models cannot be interpreted literally. As the most general and computationally oriented of the approaches, it is especially useful for artificial intelligence applications. Bayesian networks have been used extensively in diagnostic tasks such as medicine, where they represent the dependency relations between a set of symptoms and a set of diseases. A criticism of this type of knowledge representation is that it is restricted to this kind of task, and that it cannot cope with the knowledge required in other artificial intelligence applications. For example, in computer vision, we require the ability to model complex knowledge, including temporal and relational factors. In this paper we extend Bayesian networks to model relational and temporal knowledge for high-level vision. These extended networks have a simple structure which permits us to propagate probability efficiently. We have applied them to the domain of endoscopy, illustrating how the general modelling principles can be used in specific cases. At last year?s Uncertainty in AI Conference, we reported the results of a sensitivity analysis study of Pathfinder. Our findings were quite unexpected-slight variations to Pathfinder?s parameters appeared to lead to substantial degradations in system performance. A careful look at our first analysis, together with the valuable feedback provided by the participants of last year?s conference, led us to conduct a follow-up study. Our follow-up differs from our initial study in two ways: (i) the probabilities 0.0 and 1.0 remained unchanged, and (ii) the variations to the probabilities that are close to both ends (0.0 or 1.0) were less than the ones close to the middle (0.5). The results of the follow-up study look more reasonable-slight variations to Pathfinder?s parameters now have little effect on its performance. Taken together, these two sets of results suggest a viable extension of a common decision analytic sensitivity analysis to the larger, more complex settings generally encountered in artificial intelligence. We study the model of projective simulation (PS), a novel approach to artificial intelligence based on stochastic processing of episodic memory which was recently introduced [H.J. Briegel and G. De las Cuevas. Sci. Rep. 2, 400, (2012)]. Here we provide a detailed analysis of the model and examine its performance, including its achievable efficiency, its learning times and the way both properties scale with the problems' dimension. In addition, we situate the PS agent in different learning scenarios, and study its learning abilities. A variety of new scenarios are being considered, thereby demonstrating the model's flexibility. Furthermore, to put the PS scheme in context, we compare its performance with those of Q-learning and learning classifier systems, two popular models in the field of reinforcement learning. It is shown that PS is a competitive artificial intelligence model of unique properties and strengths. An artificial Ant Colony System (ACS) algorithm to solve general-purpose combinatorial Optimization Problems (COP) that extends previous AC models [21] by the inclusion of a negative pheromone, is here described. Several Travelling Salesman Problem (TSP) were used as benchmark. We show that by using two different sets of pheromones, a second-order co-evolved compromise between positive and negative feedbacks achieves better results than single positive feedback systems. The algorithm was tested against known NP-complete combinatorial Optimization Problems, running on symmetrical TSP's. We show that the new algorithm compares favourably against these benchmarks, accordingly to recent biological findings by Robinson [26,27], and Gruter [28] where "No entry" signals and negative feedback allows a colony to quickly reallocate the majority of its foragers to superior food patches. This is the first time an extended ACS algorithm is implemented with these successful characteristics. In this paper, we present an illustration to the history of Artificial Intelligence(AI) with a statistical analysis of publish since 1940. We collected and mined through the IEEE publish data base to analysis the geological and chronological variance of the activeness of research in AI. The connections between different institutes are showed. The result shows that the leading community of AI research are mainly in the USA, China, the Europe and Japan. The key institutes, authors and the research hotspots are revealed. It is found that the research institutes in the fields like Data Mining, Computer Vision, Pattern Recognition and some other fields of Machine Learning are quite consistent, implying a strong interaction between the community of each field. It is also showed that the research of Electronic Engineering and Industrial or Commercial applications are very active in California. Japan is also publishing a lot of papers in robotics. Due to the limitation of data source, the result might be overly influenced by the number of published articles, which is to our best improved by applying network keynode analysis on the research community instead of merely count the number of publish. Bound propagation is an important Artificial Intelligence technique used in Constraint Programming tools to deal with numerical constraints. It is typically embedded within a search procedure ("branch and prune") and used at every node of the search tree to narrow down the search space, so it is critical that it be fast. The procedure invokes constraint propagators until a common fixpoint is reached, but the known algorithms for this have a pseudo-polynomial worst-case time complexity: they are fast indeed when the variables have a small numerical range, but they have the well-known problem of being prohibitively slow when these ranges are large. An important question is therefore whether strongly-polynomial algorithms exist that compute the common bound consistent fixpoint of a set of constraints. This paper answers this question. In particular we show that this fixpoint computation is in fact NP-complete, even when restricted to binary linear constraints. Many Artificial Intelligence tasks cannot be evaluated with a single quality criterion and some sort of weighted combination is needed to provide system rankings. A problem of weighted combination measures is that slight changes in the relative weights may produce substantial changes in the system rankings. This paper introduces the Unanimous Improvement Ratio (UIR), a measure that complements standard metric combination criteria (such as van Rijsbergen's F-measure) and indicates how robust the measured differences are to changes in the relative weights of the individual metrics. UIR is meant to elucidate whether a perceived difference between two systems is an artifact of how individual metrics are weighted. Besides discussing the theoretical foundations of UIR, this paper presents empirical results that confirm the validity and usefulness of the metric for the Text Clustering problem, where there is a tradeoff between precision and recall based metrics and results are particularly sensitive to the weighting scheme used to combine them. Remarkably, our experiments show that UIR can be used as a predictor of how well differences between systems measured on a given test bed will also hold in a different test bed. Real Time Strategy (RTS) games provide complex domain to test the latest artificial intelligence (AI) research. In much of the literature, AI systems have been limited to playing one game. Although, this specialization has resulted in stronger AI gaming systems it does not address the key concerns of AI researcher. AI researchers seek the development of AI agents that can autonomously interpret learn, and apply new knowledge. To achieve human level performance, current AI systems rely on game specific knowledge of an expert. The paper presents the full RTS language in hopes of shifting the current research focus to the development of general RTS agents. General RTS agents are AI gaming systems that can play any RTS games, defined in the RTS language. This prevents game specific knowledge from being hard coded into the system, thereby facilitating research that addresses the fundamental concerns of artificial intelligence. We extend the notion of anti-unification to cover equational theories and present a method based on regular tree grammars to compute a finite representation of E-generalization sets. We present a framework to combine Inductive Logic Programming and E-generalization that includes an extension of Plotkin's lgg theorem to the equational case. We demonstrate the potential power of E-generalization by three example applications: computation of suggestions for auxiliary lemmas in equational inductive proofs, computation of construction laws for given term sequences, and learning of screen editor command sequences. Mutual information is widely used in artificial intelligence, in a descriptive way, to measure the stochastic dependence of discrete random variables. In order to address questions such as the reliability of the empirical value, one must consider sample-to-population inferential approaches. This paper deals with the distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean and an analytical approximation of the variance are reported. Asymptotic approximations of the distribution are proposed. The results are applied to the problem of selecting features for incremental learning and classification of the naive Bayes classifier. A fast, newly defined method is shown to outperform the traditional approach based on empirical mutual information on a number of real data sets. Finally, a theoretical development is reported that allows one to efficiently extend the above methods to incomplete samples in an easy and effective way. Path planning is typically considered in Artificial Intelligence as a graph searching problem and R* is state-of-the-art algorithm tailored to solve it. The algorithm decomposes given path finding task into the series of subtasks each of which can be easily (in computational sense) solved by well-known methods (such as A*). Parameterized random choice is used to perform the decomposition and as a result R* performance largely depends on the choice of its input parameters. In our work we formulate a range of assumptions concerning possible upper and lower bounds of R* parameters, their interdependency and their influence on R* performance. Then we evaluate these assumptions by running a large number of experiments. As a result we formulate a set of heuristic rules which can be used to initialize the values of R* parameters in a way that leads to algorithm's best performance. Artificially intelligent agents equipped with strategic skills that can negotiate during their interactions with other natural or artificial agents are still underdeveloped. This paper describes a successful application of Deep Reinforcement Learning (DRL) for training intelligent agents with strategic conversational skills, in a situated dialogue setting. Previous studies have modelled the behaviour of strategic agents using supervised learning and traditional reinforcement learning techniques, the latter using tabular representations or learning with linear function approximation. In this study, we apply DRL with a high-dimensional state space to the strategic board game of Settlers of Catan---where players can offer resources in exchange for others and they can also reply to offers made by other players. Our experimental results report that the DRL-based learnt policies significantly outperformed several baselines including random, rule-based, and supervised-based behaviours. The DRL-based policy has a 53% win rate versus 3 automated players (`bots'), whereas a supervised player trained on a dialogue corpus in this setting achieved only 27%, versus the same 3 bots. This result supports the claim that DRL is a promising framework for training dialogue systems, and strategic agents with negotiation abilities. Verification and validation of agentic behavior have been suggested as important research priorities in efforts to reduce risks associated with the creation of general artificial intelligence (Russell et al 2015). In this paper we question the appropriateness of using language of certainty with respect to efforts to manage that risk. We begin by establishing a very general formalism to characterize agentic behavior and to describe standards of acceptable behavior. We show that determination of whether an agent meets any particular standard is not computable. We discuss the extent of the burden associated with verification by manual proof and by automated behavioral governance. We show that to ensure decidability of the behavioral standard itself, one must further limit the capabilities of the agent. We then demonstrate that if our concerns relate to outcomes in the physical world, attempts at validation are futile. Finally, we show that layered architectures aimed at making these challenges tractable mistakenly equate intentions with actions or outcomes, thereby failing to provide any guarantees. We conclude with a discussion of why language of certainty should be eradicated from the conversation about the safety of general artificial intelligence. Objective. To pilot test an artificial intelligence (AI) algorithm that selects peer change agents (PCA) to disseminate HIV testing messaging in a population of homeless youth. Methods. We recruited and assessed 62 youth at baseline, 1 month (n = 48), and 3 months (n = 38). A Facebook app collected preliminary social network data. Eleven PCAs selected by AI attended a 1-day training and 7 weekly booster sessions. Mixed-effects models with random effects were used to assess change over time. Results. Significant change over time was observed in past 6-month HIV testing (57.9%, 82.4%, 76.3%; p < .05) but not condom use (63.9%, 65.7%, 65.8%). Most youth reported speaking to a PCA about HIV prevention (72.0% at 1 month, 61.5% at 3 months). Conclusions. AI is a promising avenue for implementing PCA models for homeless youth. Increasing rates of regular HIV testing is critical to HIV prevention and linking homeless youth to treatment. In recent years, researchers in decision analysis and artificial intelligence (AI) have used Bayesian belief networks to build models of expert opinion. Using standard methods drawn from the theory of computational complexity, workers in the field have shown that the problem of exact probabilistic inference on belief networks almost certainly requires exponential computation in the worst ease [3]. We have previously described a randomized approximation scheme, called BN-RAS, for computation on belief networks [ 1, 2, 4]. We gave precise analytic bounds on the convergence of BN-RAS and showed how to trade running time for accuracy in the evaluation of posterior marginal probabilities. We now extend our previous results and demonstrate the generality of our framework by applying similar mathematical techniques to the analysis of convergence for logic sampling [7], an alternative simulation algorithm for probabilistic inference. Rule based reasoning (RBR) and case based reasoning (CBR) have emerged as two important and complementary reasoning methodologies in artificial intelligence (Al). For problem solving in complex, real world situations, it is useful to integrate RBR and CBR. This paper presents an approach to achieve a compact and seamless integration of RBR and CBR within the base architecture of rules. The paper focuses on the possibilistic nature of the approximate reasoning methodology common to both CBR and RBR. In CBR, the concept of similarity is casted as the complement of the distance between cases. In RBR the transitivity of similarity is the basis for the approximate deductions based on the generalized modus ponens. It is shown that the integration of CBR and RBR is possible without altering the inference engine of RBR. This integration is illustrated in the financial domain of mergers and acquisitions. These ideas have been implemented in a prototype system called MARS. In this paper, we first consider a Bayesian framework and model the "utility function" in terms of fuzzy random variables. On the basis of this model, we define the "prior (fuzzy) expected utility" associated with each action, and the corresponding "posterior (fuzzy) expected utility given sample information from a random experiment". The aim of this paper is to analyze how sample information can affect the expected utility. In this way, by using some fuzzy preference relations, we conclude that sample information allows a decision maker to increase the expected utility on the average. The upper bound on the value of the expected utility is when the decision maker has perfect information. Applications of this work to the field of artificial intelligence are presented through two examples. This paper shows that the common method used for making predictions under uncertainty in A1 and science is in error. This method is to use currently available data to select the best model from a given class of models-this process is called abduction-and then to use this model to make predictions about future data. The correct method requires averaging over all the models to make a prediction-we call this method transduction. Using transduction, an AI system will not give misleading results when basing predictions on small amounts of data, when no model is clearly best. For common classes of models we show that the optimal solution can be given in closed form. During the ongoing debate over the representation of uncertainty in Artificial Intelligence, Cheeseman, Lemmer, Pearl, and others have argued that probability theory, and in particular the Bayesian theory, should be used as the basis for the inference mechanisms of Expert Systems dealing with uncertainty. In order to pursue the issue in a practical setting, sophisticated tools for knowledge engineering are needed that allow flexible and understandable interaction with the underlying knowledge representation schemes. This paper describes a Generalized Bayesian framework for building expert systems which function in uncertain domains, using algorithms proposed by Lemmer. It is neither rule-based nor frame-based, and requires a new system of knowledge engineering tools. The framework we describe provides a knowledge-based system architecture with an inference engine, explanation capability, and a unique aid for building consistent knowledge bases. Belief updating schemes in artificial intelligence may be viewed as three dimensional languages, consisting of a syntax (e.g. probabilities or certainty factors), a calculus (e.g. Bayesian or CF combination rules), and a semantics (i.e. cognitive interpretations of competing formalisms). This paper studies the rational scope of those languages on the syntax and calculus grounds. In particular, the paper presents an endomorphism theorem which highlights the limitations imposed by the conditional independence assumptions implicit in the CF calculus. Implications of the theorem to the relationship between the CF and the Bayesian languages and the Dempster-Shafer theory of evidence are presented. The paper concludes with a discussion of some implications on rule-based knowledge engineering in uncertain domains. This paper describes a machine induction program (WITT) that attempts to model human categorization. Properties of categories to which human subjects are sensitive includes best or prototypical members, relative contrasts between putative categories, and polymorphy (neither necessary or sufficient features). This approach represents an alternative to usual Artificial Intelligence approaches to generalization and conceptual clustering which tend to focus on necessary and sufficient feature rules, equivalence classes, and simple search and match schemes. WITT is shown to be more consistent with human categorization while potentially including results produced by more traditional clustering schemes. Applications of this approach in the domains of expert systems and information retrieval are also discussed. Artificial intelligence applications such as industrial robotics, military surveillance, and hazardous environment clean-up, require situation understanding based on partial, uncertain, and ambiguous or erroneous evidence. It is necessary to evaluate the relative likelihood of multiple possible hypotheses of the (current) situation faced by the decision making program. Often, the evidence and hypotheses are hierarchical in nature. In image understanding tasks, for example, evidence begins with raw imagery, from which ambiguous features are extracted which have multiple possible aggregations providing evidential support for the presence of multiple hypothesis of objects and terrain, which in turn aggregate in multiple ways to provide partial evidence for different interpretations of the ambient scene. Information fusion for military situation understanding has a similar evidence/hypothesis hierarchy from multiple sensor through message level interpretations, and also provides evidence at multiple levels of the doctrinal hierarchy of military forces. This is a preliminary theoretical discussion on the computational requirements of the state of the art smoothed particle hydrodynamics (SPH) from the optics of pattern recognition and artificial intelligence. It is pointed out in the present paper that, when including anisotropy detection to improve resolution on shock layer, SPH is a very peculiar case of unsupervised machine learning. On the other hand, the free particle nature of SPH opens an opportunity for artificial intelligence to study particles as agents acting in a collaborative framework in which the timed outcomes of a fluid simulation forms a large knowledge base, which might be very attractive in computational astrophysics phenomenological problems like self-propagating star formation. In many technical fields, single-objective optimization procedures in continuous domains involve expensive numerical simulations. In this context, an improvement of the Artificial Bee Colony (ABC) algorithm, called the Artificial super-Bee enhanced Colony (AsBeC), is presented. AsBeC is designed to provide fast convergence speed, high solution accuracy and robust performance over a wide range of problems. It implements enhancements of the ABC structure and hybridizations with interpolation strategies. The latter are inspired by the quadratic trust region approach for local investigation and by an efficient global optimizer for separable problems. Each modification and their combined effects are studied with appropriate metrics on a numerical benchmark, which is also used for comparing AsBeC with some effective ABC variants and other derivative-free algorithms. In addition, the presented algorithm is validated on two recent benchmarks adopted for competitions in international conferences. Results show remarkable competitiveness and robustness for AsBeC. Relentless progress in artificial intelligence (AI) is increasingly raising concerns that machines will replace humans on the job market, and perhaps altogether. Eliezer Yudkowski and others have explored the possibility that a promising future for humankind could be guaranteed by a superintelligent "Friendly AI", designed to safeguard humanity and its values. I argue that, from a physics perspective where everything is simply an arrangement of elementary particles, this might be even harder than it appears. Indeed, it may require thinking rigorously about the meaning of life: What is "meaning" in a particle arrangement? What is "life"? What is the ultimate ethical imperative, i.e., how should we strive to rearrange the particles of our Universe and shape its future? If we fail to answer the last question rigorously, this future is unlikely to contain humans. The production system is a theoretical model of computation relevant to the artificial intelligence field allowing for problem solving procedures such as hierarchical tree search. In this work we explore some of the connections between artificial intelligence and quantum computation by presenting a model for a quantum production system. Our approach focuses on initially developing a model for a reversible production system which is a simple mapping of Bennett's reversible Turing machine. We then expand on this result in order to accommodate for the requirements of quantum computation. We present the details of how our proposition can be used alongside Grover's algorithm in order to yield a speedup comparatively to its classical counterpart. We discuss the requirements associated with such a speedup and how it compares against a similar quantum hierarchical search approach. This paper critically assesses the anti-functionalist stance on consciousness adopted by certain advocates of integrated information theory (IIT), a corollary of which is that human-level artificial intelligence implemented on conventional computing hardware is necessarily not conscious. The critique draws on variations of a well-known gradual neuronal replacement thought experiment, as well as bringing out tensions in IIT's treatment of self-knowledge. The aim, though, is neither to reject IIT outright nor to champion functionalism in particular. Rather, it is suggested that both ideas have something to offer a scientific understanding of consciousness, as long as they are not dressed up as solutions to illusory metaphysical problems. As for human-level AI, we must await its development before we can decide whether or not to ascribe consciousness to it. This paper presents a tentative outline for the construction of an artificial, generally intelligent system (AGI). It is argued that building a general data compression algorithm solving all problems up to a complexity threshold should be the main thrust of research. A measure for partial progress in AGI is suggested. Although the details are far from being clear, some general properties for a general compression algorithm are fleshed out. Its inductive bias should be flexible and adapt to the input data while constantly searching for a simple, orthogonal and complete set of hypotheses explaining the data. It should recursively reduce the size of its representations thereby compressing the data increasingly at every iteration. Abstract Based on that fundamental ability, a grounded reasoning system is proposed. It is argued how grounding and flexible feature bases made of hypotheses allow for resourceful thinking. While the simulation of representation contents on the mental stage accounts for much of the power of propositional logic, compression leads to simple sets of hypotheses that allow the detection and verification of universally quantified statements. Abstract Together, it is highlighted how general compression and grounded reasoning could account for the birth and growth of first concepts about the world and the commonsense reasoning about them. Nuclear magnetic resonance (NMR) spectroscopy is a powerful method for the investigation of three-dimensional structures of biological molecules such as proteins. Determining a protein structure is essential for understanding its function and alterations in function which lead to disease. One of the major challenges of the post-genomic era is to obtain structural and functional information on the many unknown proteins encoded by thousands of newly identified genes. The goal of this research is to design an algorithm capable of automating the analysis of backbone protein NMR data by implementing AI strategies such as greedy and A* search. A minimalistic cognitive architecture called MANIC is presented. The MANIC architecture requires only three function approximating models, and one state machine. Even with so few major components, it is theoretically sufficient to achieve functional equivalence with all other cognitive architectures, and can be practically trained. Instead of seeking to transfer architectural inspiration from biology into artificial intelligence, MANIC seeks to minimize novelty and follow the most well-established constructs that have evolved within various sub-fields of data science. From this perspective, MANIC offers an alternate approach to a long-standing objective of artificial intelligence. This paper provides a theoretical analysis of the MANIC architecture. Search is a central problem in artificial intelligence, and breadth-first search (BFS) and depth-first search (DFS) are the two most fundamental ways to search. In this paper we derive estimates for average BFS and DFS runtime. The average runtime estimates can be used to allocate resources or judge the hardness of a problem. They can also be used for selecting the best graph representation, and for selecting the faster algorithm out of BFS and DFS. They may also form the basis for an analysis of more advanced search methods. The paper treats both tree search and graph search. For tree search, we employ a probabilistic model of goal distribution; for graph search, the analysis depends on an additional statistic of path redundancy and average branching factor. As an application, we use the results to predict BFS and DFS runtime on two concrete grammar problems and on the N-puzzle. Experimental verification shows that our analytical approximations come close to empirical reality. The artificial intelligence received broad interpretation as a literary image. This approach did not have unambiguous refering to the scopes of logical studies and mathematical investigations. An author applied methods peculiar to the semiotic approach, offered by Boris Uspensky and Yury Lotman. In addition, the article presented the criticism of modern versions of educational technologies, which led to the unconditional expectations for possibilities of information and telecommunication technologies. Methodological culture's growth, which was described on the base of semiotics and functional approach to word formation of new meanings for the description of the studied subjects, provided the development of pupils' thought. As a result, the research opened new prospects on understanding of artificial intelligence within educational practice. Decision making is a vital function in the age of machine learning and artificial intelligence; however, its physical realizations and their theoretical fundamentals are not yet known. In our former study, we demonstrated that single photons can be used to make decisions in uncertain, dynamically changing environments. The multi-armed bandit problem was successfully solved using the dual probabilistic and particle attributes of single photons. Herein, we revolutionize how decision making is comprehended via a category theoretic viewpoint; we present the category theoretic foundation of the single-photon-based decision making, including quantitative analysis that agrees well with the experimental results. The category theoretic model unveils complex interdependencies of the entities of the subject matter in the most simplified manner, including a dynamically changing environment. In particular, the octahedral structure and the braid structure in triangulated categories provide clear understandings and quantitative metrics of the underlying mechanisms for the single-photon decision maker. This is the first demonstration of a category theoretic interpretation of decision making, and provides a solid understanding and a design fundamental for machine learning and artificial intelligence. Bridge is among the zero-sum games for which artificial intelligence has not yet outperformed expert human players. The main difficulty lies in the bidding phase of bridge, which requires cooperative decision making under partial information. Existing artificial intelligence systems for bridge bidding rely on and are thus restricted by human-designed bidding systems or features. In this work, we propose a pioneering bridge bidding system without the aid of human domain knowledge. The system is based on a novel deep reinforcement learning model, which extracts sophisticated features and learns to bid automatically based on raw card data. The model includes an upper-confidence-bound algorithm and additional techniques to achieve a balance between exploration and exploitation. Our experiments validate the promising performance of our proposed model. In particular, the model advances from having no knowledge about bidding to achieving superior performance when compared with a champion-winning computer bridge program that implements a human-designed bidding system. Analyses of text corpora over time can reveal trends in beliefs, interest, and sentiment about a topic. We focus on views expressed about artificial intelligence (AI) in the New York Times over a 30-year period. General interest, awareness, and discussion about AI has waxed and waned since the field was founded in 1956. We present a set of measures that captures levels of engagement, measures of pessimism and optimism, the prevalence of specific hopes and concerns, and topics that are linked to discussions about AI over decades. We find that discussion of AI has increased sharply since 2009, and that these discussions have been consistently more optimistic than pessimistic. However, when we examine specific concerns, we find that worries of loss of control of AI, ethical concerns for AI, and the negative impact of AI on work have grown in recent years. We also find that hopes for AI in healthcare and education have increased over time. In this work, we present and analyze reported failures of artificially intelligent systems and extrapolate our analysis to future AIs. We suggest that both the frequency and the seriousness of future AI failures will steadily increase. AI Safety can be improved based on ideas developed by cybersecurity experts. For narrow AIs safety failures are at the same, moderate, level of criticality as in cybersecurity, however for general AI, failures have a fundamentally different impact. A single failure of a superintelligent system may cause a catastrophic event without a chance for recovery. The goal of cybersecurity is to reduce the number of successful attacks on the system; the goal of AI Safety is to make sure zero attacks succeed in bypassing the safety mechanisms. Unfortunately, such a level of performance is unachievable. Every security system will eventually fail; there is no such thing as a 100% secure system. When an agent cannot represent a perfectly accurate model of its environment's dynamics, model-based reinforcement learning (MBRL) can fail catastrophically. Planning involves composing the predictions of the model; when flawed predictions are composed, even minor errors can compound and render the model useless for planning. Hallucinated Replay (Talvitie 2014) trains the model to "correct" itself when it produces errors, substantially improving MBRL with flawed models. This paper theoretically analyzes this approach, illuminates settings in which it is likely to be effective or ineffective, and presents a novel error bound, showing that a model's ability to self-correct is more tightly related to MBRL performance than one-step prediction error. These results inspire an MBRL algorithm for deterministic MDPs with performance guarantees that are robust to model class limitations. Superconducting circuit technologies have recently achieved quantum protocols involving closed feedback loops. Quantum artificial intelligence and quantum machine learning are emerging fields inside quantum technologies which may enable quantum devices to acquire information from the outer world and improve themselves via a learning process. Here we propose the implementation of basic protocols in quantum reinforcement learning, with superconducting circuits employing feedback-loop control. We introduce diverse scenarios for proof-of-principle experiments with state-of-the-art superconducting circuit technologies and analyze their feasibility in presence of imperfections. The field of quantum artificial intelligence implemented with superconducting circuits paves the way for enhanced quantum control and quantum computation protocols. We present a multilingual Named Entity Recognition approach based on a robust and general set of features across languages and datasets. Our system combines shallow local information with clustering semi-supervised features induced on large amounts of unlabeled text. Understanding via empirical experimentation how to effectively combine various types of clustering features allows us to seamlessly export our system to other datasets and languages. The result is a simple but highly competitive system which obtains state of the art results across five languages and twelve datasets. The results are reported on standard shared task evaluation data such as CoNLL for English, Spanish and Dutch. Furthermore, and despite the lack of linguistically motivated features, we also report best results for languages such as Basque and German. In addition, we demonstrate that our method also obtains very competitive results even when the amount of supervised data is cut by half, alleviating the dependency on manually annotated data. Finally, the results show that our emphasis on clustering features is crucial to develop robust out-of-domain models. The system and models are freely available to facilitate its use and guarantee the reproducibility of results. Currently, Artificial Intelligence (AI) has won unprecedented attention and is becoming the increasingly popular focus in China. This change can be judged by the impressive record of academic publications, the amount of state-level investment and the presence of nation-wide participation and devotion. In this paper, we place emphasis on discussing the progress of artificial intelligence engineerings in China. We first introduce the focus on AI in Chinese academia, including the supercomputing brain system, Cambrian Period supercomputer of neural networks, and biometric systems. Then, the development of AI in industrial circles and the latest layout of AI products in companies, such as Baidu, Tencent, and Alibaba, are introduced. Last, we bring in the opinions and arguments of the main intelligentsia of China about the future development of AI, including how to examine the relationship between humanity on one side and science and technology on the other. While artificial intelligence (AI) has become widespread, many commercial AI systems are not yet accessible to individual researchers nor the general public due to the deep knowledge of the systems required to use them. We believe that AI has matured to the point where it should be an accessible technology for everyone. We present an ongoing project whose ultimate goal is to deliver an open source, user-friendly AI system that is specialized for machine learning analysis of complex data in the biomedical and health care domains. We discuss how genetic programming can aid in this endeavor, and highlight specific examples where genetic programming has automated machine learning analyses in previous projects. Artificial Intelligence (AI) is an effective science which employs strong enough approaches, methods, and techniques to solve unsolvable real world based problems. Because of its unstoppable rise towards the future, there are also some discussions about its ethics and safety. Shaping an AI friendly environment for people and a people friendly environment for AI can be a possible answer for finding a shared context of values for both humans and robots. In this context, objective of this paper is to address the ethical issues of AI and explore the moral dilemmas that arise from ethical algorithms, from pre set or acquired values. In addition, the paper will also focus on the subject of AI safety. As general, the paper will briefly analyze the concerns and potential solutions to solving the ethical issues presented and increase readers awareness on AI safety as another related research interest. Artificial Intelligence methods to solve continuous- control tasks have made significant progress in recent years. However, these algorithms have important limitations and still need significant improvement to be used in industry and real- world applications. This means that this area is still in an active research phase. To involve a large number of research groups, standard benchmarks are needed to evaluate and compare proposed algorithms. In this paper, we propose a physical environment benchmark framework to facilitate collaborative research in this area by enabling different research groups to integrate their designed benchmarks in a unified cloud-based repository and also share their actual implemented benchmarks via the cloud. We demonstrate the proposed framework using an actual implementation of the classical mountain-car example and present the results obtained using a Reinforcement Learning algorithm. Automatic photo aesthetic assessment is a challenging artificial intelligence task. Existing computational approaches have focused on modeling a single aesthetic score or a class (good or bad), however these do not provide any details on why the photograph is good or bad, or which attributes contribute to the quality of the photograph. To obtain both accuracy and human interpretation of the score, we advocate learning the aesthetic attributes along with the prediction of the overall score. For this purpose, we propose a novel multitask deep convolution neural network, which jointly learns eight aesthetic attributes along with the overall aesthetic score. We report near human performance in the prediction of the overall aesthetic score. To understand the internal representation of these attributes in the learned model, we also develop the visualization technique using back propagation of gradients. These visualizations highlight the important image regions for the corresponding attributes, thus providing insights about model's representation of these attributes. We showcase the diversity and complexity associated with different attributes through a qualitative analysis of the activation maps. Pharmaco-epidemiology (PE) is the study of uses and effects of drugs in well defined populations. As medico-administrative databases cover a large part of the population, they have become very interesting to carry PE studies. Such databases provide longitudinal care pathways in real condition containing timestamped care events, especially drug deliveries. Temporal pattern mining becomes a strategic choice to gain valuable insights about drug uses. In this paper we propose DCM, a new discriminant temporal pattern mining algorithm. It extracts chronicle patterns that occur more in a studied population than in a control population. We present results on the identification of possible associations between hospitalizations for seizure and anti-epileptic drug switches in care pathway of epileptic patients. Much research in artificial intelligence is concerned with the development of autonomous agents that can interact effectively with other agents. An important aspect of such agents is the ability to reason about the behaviours of other agents, by constructing models which make predictions about various properties of interest (such as actions, goals, beliefs) of the modelled agents. A variety of modelling approaches now exist which vary widely in their methodology and underlying assumptions, catering to the needs of the different sub-communities within which they were developed and reflecting the different practical uses for which they are intended. The purpose of the present article is to provide a comprehensive survey of the salient modelling methods which can be found in the literature. The article concludes with a discussion of open problems which may form the basis for fruitful future research. Business Process Management (BPM) is a central element of today organizations. Despite over the years its main focus has been the support of processes in highly controlled domains, nowadays many domains of interest to the BPM community are characterized by ever-changing requirements, unpredictable environments and increasing amounts of data that influence the execution of process instances. Under such dynamic conditions, BPM systems must increase their level of automation to provide the reactivity and flexibility necessary for process management. On the other hand, the Artificial Intelligence (AI) community has concentrated its efforts on investigating dynamic domains that involve active control of computational entities and physical devices (e.g., robots, software agents, etc.). In this context, Automated Planning, which is one of the oldest areas in AI, is conceived as a model-based approach to synthesize autonomous behaviours in automated way from a model. In this paper, we discuss how automated planning techniques can be leveraged to enable new levels of automation and support for business processing, and we show some concrete examples of their successful application to the different stages of the BPM life cycle. Artificial intelligence (AI) has achieved superhuman performance in a growing number of tasks, but understanding and explaining AI remain challenging. This paper clarifies the connections between machine-learning algorithms to develop AIs and the econometrics of dynamic structural models through the case studies of three famous game AIs. Chess-playing Deep Blue is a calibrated value function, whereas shogi-playing Bonanza is an estimated value function via Rust's (1987) nested fixed-point method. AlphaGo's "supervised-learning policy network" is a deep neural network implementation of Hotz and Miller's (1993) conditional choice probability estimation; its "reinforcement-learning value network" is equivalent to Hotz, Miller, Sanders, and Smith's (1994) conditional choice simulation method. Relaxing these AIs' implicit econometric assumptions would improve their structural interpretability. Artificial intelligence (AI) is intrinsically data-driven. It calls for the application of statistical concepts through human-machine collaboration during generation of data, development of algorithms, and evaluation of results. This paper discusses how such human-machine collaboration can be approached through the statistical concepts of population, question of interest, representativeness of training data, and scrutiny of results (PQRS). The PQRS workflow provides a conceptual framework for integrating statistical ideas with human input into AI products and research. These ideas include experimental design principles of randomization and local control as well as the principle of stability to gain reproducibility and interpretability of algorithms and data results. We discuss the use of these principles in the contexts of self-driving cars, automated medical diagnoses, and examples from the authors' collaborative research. Computation has changed the world more than any previous expressions of knowledge. In its particular algorithmic embodiment, it offers a perspective, within which the digital computer (one of many possible) exercises a role reminiscent of theology. Since it is closed to meaning, algorithmic digital computation can at most mimic the creative aspects of life. AI, in the perspective of time, proved to be less an acronym for artificial intelligence and more of automating tasks associated with intelligence. The entire development led to the hypostatized role of the machine: outputting nothing else but reality, including that of the humanity that made the machine happen. The convergence machine called deep learning is only the latest form through which the deterministic theology of the machine claims more than what extremely effective data processing actually is. A new understanding of complexity, as well as the need to distinguish between the reactive nature of the artificial and the anticipatory nature of the living are suggested as practical responses to the challenges posed by machine theology. Artificial intelligence (AI) is an extensive scientific discipline which enables computer systems to solve problems by emulating complex biological processes such as learning, reasoning and self-correction. This paper presents a comprehensive review of the application of AI techniques for improving performance of optical communication systems and networks. The use of AI-based techniques is first studied in applications related to optical transmission, ranging from the characterization and operation of network components to performance monitoring, mitigation of nonlinearities, and quality of transmission estimation. Then, applications related to optical network control and management are also reviewed, including topics like optical network planning and operation in both transport and access networks. Finally, the paper also presents a summary of opportunities and challenges in optical networking where AI is expected to play a key role in the near future. This essay discusses whether computers, using Artificial Intelligence (AI), could create art. The first part concerns AI-based tools for assisting with art making. The history of technologies that automated aspects of art is covered, including photography and animation. In each case, we see initial fears and denial of the technology, followed by a blossoming of new creative and professional opportunities for artists. The hype and reality of Artificial Intelligence (AI) tools for art making is discussed, together with predictions about how AI tools will be used. The second part speculates about whether it could ever happen that AI systems could conceive of artwork, and be credited with authorship of an artwork. It is theorized that art is something created by social agents, and so computers cannot be credited with authorship of art in our current understanding. A few ways that this could change are also hypothesized. In order for people to be able to trust and take advantage of the results of advanced machine learning and artificial intelligence solutions for real decision making, people need to be able to understand the machine rationale for given output. Research in explain artificial intelligence (XAI) addresses the aim, but there is a need for evaluation of human relevance and understandability of explanations. Our work contributes a novel methodology for evaluating the quality or human interpretability of explanations for machine learning models. We present an evaluation benchmark for instance explanations from text and image classifiers. The explanation meta-data in this benchmark is generated from user annotations of image and text samples. We describe the benchmark and demonstrate its utility by a quantitative evaluation on explanations generated from a recent machine learning algorithm. This research demonstrates how human-grounded evaluation could be used as a measure to qualify local machine-learning explanations. Reasoning systems with too simple a model of the world and human intent are unable to consider potential negative side effects of their actions and modify their plans to avoid them (e.g., avoiding potential errors). However, hand-encoding the enormous and subtle body of facts that constitutes common sense into a knowledge base has proved too difficult despite decades of work. Distributed semantic vector spaces learned from large text corpora, on the other hand, can learn representations that capture shades of meaning of common-sense concepts and perform analogical and associational reasoning in ways that knowledge bases are too rigid to perform, by encoding concepts and the relations between them as geometric structures. These have, however, the disadvantage of being unreliable, poorly understood, and biased in their view of the world by the source material. This chapter will discuss how these approaches may be combined in a way that combines the best properties of each for understanding the world and human intentions in a richer way. There are several distinct failure modes for overoptimization of systems on the basis of metrics. This occurs when a metric which can be used to improve a system is used to an extent that further optimization is ineffective or harmful, and is sometimes termed Goodhart's Law. This class of failure is often poorly understood, partly because terminology for discussing them is ambiguous, and partly because discussion using this ambiguous terminology ignores distinctions between different failure modes of this general type. This paper expands on an earlier discussion by Garrabrant, which notes there are "(at least) four different mechanisms" that relate to Goodhart's Law. This paper is intended to explore these mechanisms further, and specify more clearly how they occur. This discussion should be helpful in better understanding these types of failures in economic regulation, in public policy, in machine learning, and in Artificial Intelligence alignment. The importance of Goodhart effects depends on the amount of power directed towards optimizing the proxy, and so the increased optimization power offered by artificial intelligence makes it especially critical for that field. Self-replication is a key aspect of biological life that has been largely overlooked in Artificial Intelligence systems. Here we describe how to build and train self-replicating neural networks. The network replicates itself by learning to output its own weights. The network is designed using a loss function that can be optimized with either gradient-based or non-gradient-based methods. We also describe a method we call regeneration to train the network without explicit optimization, by injecting the network with predictions of its own parameters. The best solution for a self-replicating network was found by alternating between regeneration and optimization steps. Finally, we describe a design for a self-replicating neural network that can solve an auxiliary task such as MNIST image classification. We observe that there is a trade-off between the network's ability to classify images and its ability to replicate, but training is biased towards increasing its specialization at image classification at the expense of replication. This is analogous to the trade-off between reproduction and other tasks observed in nature. We suggest that a self-replication mechanism for artificial intelligence is useful because it introduces the possibility of continual improvement through natural selection. In recent years, the increased demand for dynamic management of network resources in modern computer networks in general and in today's data centers in particular has resulted in a new promising architecture, in which a more flexible controlling functionalities can be achieved with high level of abstraction. In software defined networking (SDN) architecture, a central management of the forwarding elements (i.e. switches and routers) is accomplished by a central unit, which can be programmed directly to perform fundamental networking tasks or implementing any other additional services. Combining both central management and network programmability, opens the door to employ more advanced techniques such as artificial intelligence (AI) in order to deal with high-demand and rapidly-changing networks. In this study, we provide a detailed overview of current efforts and recent advancements to include AI in SDN-based networks. Attacks to networks are becoming more complex and sophisticated every day. Beyond the so-called script-kiddies and hacking newbies, there is a myriad of professional attackers seeking to make serious profits infiltrating in corporate networks. Either hostile governments, big corporations or mafias are constantly increasing their resources and skills in cybercrime in order to spy, steal or cause damage more effectively. traditional approaches to Network Security seem to start hitting their limits and it is being recognized the need for a smarter approach to threat detections. This paper provides an introduction on the need for evolution of Cyber Security techniques and how Artificial Intelligence could be of application to help solving some of the problems. It provides also, a high-level overview of some state of the art AI Network Security techniques, to finish analysing what is the foreseeable future of the application of AI to Network Security. This essay examines how what is considered to be artificial intelligence (AI) has changed over time and come to intersect with the expertise of the author. Initially, AI developed on a separate trajectory, both topically and institutionally, from pattern recognition, neural information processing, decision and control systems, and allied topics by focusing on symbolic systems within computer science departments rather than on continuous systems in electrical engineering departments. The separate evolutions continued throughout the author's lifetime, with some crossover in reinforcement learning and graphical models, but were shocked into converging by the virality of deep learning, thus making an electrical engineer into an AI researcher. Now that this convergence has happened, opportunity exists to pursue an agenda that combines learning and reasoning bridged by interpretable machine learning models. In recent years, China, the United States and other countries, Google and other high-tech companies have increased investment in artificial intelligence. Deep learning is one of the current artificial intelligence research's key areas. This paper analyzes and summarizes the latest progress and future research directions of deep learning. Firstly, three basic models of deep learning are outlined, including multilayer perceptrons, convolutional neural networks, and recurrent neural networks. On this basis, we further analyze the emerging new models of convolution neural networks and recurrent neural networks. This paper then summarizes deep learning's applications in many areas of artificial intelligence, including voice, computer vision, natural language processing and so on. Finally, this paper discusses the existing problems of deep learning and gives the corresponding possible solutions. One of the common artificial intelligence applications in electronic games consists of making an artificial agent learn how to execute some determined task successfully in a game environment. One way to perform this task is through machine learning algorithms capable of learning the sequence of actions required to win in a given game environment. There are several supervised learning techniques able to learn the correct answer for a problem through examples. However, when learning how to play electronic games, the correct answer might only be known by the end of the game, after all the actions were already taken. Thus, not being possible to measure the accuracy of each individual action to be taken at each time step. A way for dealing with this problem is through Neuroevolution, a method which trains Artificial Neural Networks using evolutionary algorithms. In this article, we introduce a framework for testing optimization algorithms with artificial agent controllers in electronic games, called EvoMan, which is inspired in the action-platformer game Mega Man II. The environment can be configured to run in different experiment modes, as single evolution, coevolution and others. To demonstrate some challenges regarding the proposed platform, as initial experiments we applied Neuroevolution using Genetic Algorithms and the NEAT algorithm, in the context of competitively coevolving two distinct agents in this game. In this paper we describe a web search agent, called Global Search Agent (hereafter GSA for short). GSA integrates and enhances several search techniques in order to achieve significant improvements in the user-perceived quality of delivered information as compared to usual web search engines. GSA features intelligent merging of relevant documents from different search engines, anticipated selective exploration and evaluation of links from the current result set, automated derivation of refined queries based on user relevance feedback. System architecture as well as experimental accounts are also illustrated. Consumers of mass media must have a comprehensive, balanced and plural selection of news to get an unbiased perspective; but achieving this goal can be very challenging, laborious and time consuming. News stories development over time, its (in)consistency, and different level of coverage across the media outlets are challenges that a conscientious reader has to overcome in order to alleviate bias. In this paper we present an intelligent agent framework currently facilitating analysis of the main sources of on-line news in El Salvador. We show how prior tools of text analysis and Web 2.0 technologies can be combined with minimal manual intervention to help individuals on their rational decision process, while holding media outlets accountable for their work. Computational techniques have shown much promise in the field of Finance, owing to their ability to extract sense out of dauntingly complex systems. This paper reviews the most promising of these techniques, from traditional computational intelligence methods to their machine learning siblings, with particular view to their application in optimising the management of a portfolio of financial instruments. The current state of the art is assessed, and prospective further work is assessed and recommended The four intensive problems to the software rose by the software industry .i.e., User System Communication / Human Machine Interface, Meta Data extraction, Information processing & management and Data representation are discussed in this research paper. To contribute in the field we have proposed and described an intelligent semantic oriented agent based search engine including the concepts of intelligent graphical user interface, natural language based information processing, data management and data reconstruction for the final user end information representation. There is a growing trend towards the convergence of cyber-physical systems (CPS) and social computing, which will lead to the emergence of smart communities composed of various objects (including both human individuals and physical things) that interact and cooperate with each other. These smart communities promise to enable a number of innovative applications and services that will improve the quality of life. This position paper addresses some opportunities and challenges of building smart communities characterized by cyber-physical and social intelligence. Nurse scheduling is a difficult optimization problem with multiple constraints. There is extensive research in the literature solving the problem using meta-heuristics approaches. In this paper, we will investigate an intelligent search heuristics that handles cost based scheduling problem. The heuristics demonstrated superior performances compared to the original algorithms used to solve the problems described in Li et. Al. (2003) and Ozkarahan (1989) in terms of time needed to establish a feasible solution. Both problems can be formulated as a cost problem. The search heuristic consists of several phrases of search and input based on the cost of each assignment and how the assignment will interact with the cost of the resources. With the development of robotics and artificial intelligence field unceasingly thorough, path planning as an important field of robot calculation has been widespread concern. This paper analyzes the current development of robot and path planning algorithm and focuses on the advantages and disadvantages of the traditional intelligent path planning as well as the path planning. The problem of mobile robot path planning is studied by using ant colony algorithm, and it also provides some solving methods. The development of intelligent machines is one of the biggest unsolved challenges in computer science. In this paper, we propose some fundamental properties these machines should have, focusing in particular on communication and learning. We discuss a simple environment that could be used to incrementally teach a machine the basics of natural-language-based communication, as a prerequisite to more complex interaction with human users. We also present some conjectures on the sort of algorithms the machine should support in order to profitably learn from the environment. Wikipedia, the world largest encyclopedia contains a lot of knowledge that is expressed as formulae exclusively. Unfortunately, this knowledge is currently not fully accessible by intelligent information retrieval systems. This immense body of knowledge is hidden form value-added services, such as search. In this paper, we present our MathSearch implementation for Wikipedia that enables users to perform a combined text and fully unlock the potential benefits. Modality is one of the important components of grammar in linguistics. It lets speaker to express attitude towards, or give assessment or potentiality of state of affairs. It implies different senses and thus has different perceptions as per the context. This paper presents an account showing the gap in the functionality of the current state of art Natural Language Processing (NLP) systems. The contextual nature of linguistic modality is studied. In this paper, the works and logical approaches employed by Natural Language Processing systems dealing with modality are reviewed. It sees human cognition and intelligence as multi-layered approach that can be implemented by intelligent systems for learning. Lastly, current flow of research going on within this field is talked providing futurology. In this paper, the idea of applying Computational Intelligence in the process of creation board games, in particular mazes, is presented. For two different algorithms the proposed idea has been examined. The results of the experiments are shown and discussed to present advantages and disadvantages. We present a new social animal inspired emotional swarm intelligence technique. This technique is used to solve a variant of the popular collective robots problem called foraging. We show with a simulation study how simple interaction rules based on sensations like hunger and loneliness can lead to globally coherent emergent behavior which allows sensor constrained robots to solve the given problem Multiplayer Online Battle Arena (MOBA) is one of the most played game genres nowadays. With the increasing growth of this genre, it becomes necessary to develop effective intelligent agents to play alongside or against human players. In this paper we address the problem of agent development for MOBA games. We implement a two-layered architecture agent that handles both navigation and game mechanics. This architecture relies on the use of Influence Maps, a widely used approach for tactical analysis. Several experiments were performed using {\em League of Legends} as a testbed, and show promising results in this highly dynamic real-time context. An elementary mathematical example proves, thanks to the Routh-Hurwitz criterion, a result that is intriguing with respect to today's practical understanding of model-free control, i.e., an "intelligent" proportional controller (iP) may turn to be more difficult to tune than an intelligent proportional-derivative one (iPD). The vast superiority of iPDs when compared to classic PIDs is shown via computer simulations. The introduction as well as the conclusion analyse model-free control in the light of recent advances. In this paper we present a new form of access to knowledge through what we call "hypermediator websites". These hypermediator sites are intermediate between information devices that just scan the book culture and a "real" hypertext writing format. This paper examines common assumptions regarding the decision-making internal environment for intelligent agents and investigates issues related to processing of memory and belief states to help obtain better understanding of the responses. In specific, we consider order effects and discuss both classical and non-classical explanations for them. We also consider implicit cognition and explore if certain inaccessible states may be best modeled as quantum states. We propose that the hypothesis that quantum states are at the basis of order effects be tested on large databases such as those related to medical treatment and drug efficacy. A problem involving a maze network is considered and comparisons made between classical and quantum decision scenarios for it. Decision-making is a process of choosing among alternative courses of action for solving complicated problems where multi-criteria objectives are involved. The past few years have witnessed a growing recognition of Soft Computing (SC) technologies that underlie the conception, design and utilization of intelligent systems. In this paper, we present different SC paradigms involving an artificial neural network trained using the scaled conjugate gradient algorithm, two different fuzzy inference methods optimised using neural network learning/evolutionary algorithms and regression trees for developing intelligent decision support systems. We demonstrate the efficiency of the different algorithms by developing a decision support system for a Tactical Air Combat Environment (TACE). Some empirical comparisons between the different algorithms are also provided. Domain experts should provide relevant domain knowledge to an Intelligent Tutoring System (ITS) so that it can guide a learner during problemsolving learning activities. However, for many ill-defined domains, the domain knowledge is hard to define explicitly. In previous works, we showed how sequential pattern mining can be used to extract a partial problem space from logged user interactions, and how it can support tutoring services during problem-solving exercises. This article describes an extension of this approach to extract a problem space that is richer and more adapted for supporting tutoring services. We combined sequential pattern mining with (1) dimensional pattern mining (2) time intervals, (3) the automatic clustering of valued actions and (4) closed sequences mining. Some tutoring services have been implemented and an experiment has been conducted in a tutoring system. Eudaemonics is the study of the nature, causes, and conditions of human well-being. According to the ethical theory of eudaemonia, reaping satisfaction and fulfillment from life is not only a desirable end, but a moral responsibility. However, in modern society, many individuals struggle to meet this responsibility. Computational mechanisms could better enable individuals to achieve eudaemonia by yielding practical real-world systems that embody algorithms that promote human flourishing. This article presents eudaemonic systems as the evolutionary goal of the present day recommender system. The study is from a base of accident scenarii in rail transport (feedback) in order to develop a tool to share build and sustain knowledge and safety and secondly to exploit the knowledge stored to prevent the reproduction of accidents / incidents. This tool should ultimately lead to the proposal of prevention and protection measures to minimize the risk level of a new transport system and thus to improve safety. The approach to achieving this goal largely depends on the use of artificial intelligence techniques and rarely the use of a method of automatic learning in order to develop a feasibility model of a software tool based on case based reasoning (CBR) to exploit stored knowledge in order to create know-how that can help stimulate domain experts in the task of analysis, evaluation and certification of a new system. Expert systems use human knowledge often stored as rules within the computer to solve problems that generally would entail human intelligence. Today, with information systems turning out to be more pervasive and with the myriad advances in information technologies, automating computer fault diagnosis is becoming so fundamental that soon every enterprise has to endorse it. This paper proposes an expert system called Expert PC Troubleshooter for diagnosing computer problems. The system is composed of a user interface, a rule-base, an inference engine, and an expert interface. Additionally, the system features a fuzzy-logic module to troubleshoot POST beep errors, and an intelligent agent that assists in the knowledge acquisition process. The proposed system is meant to automate the maintenance, repair, and operations (MRO) process, and free-up human technicians from manually performing routine, laborious, and timeconsuming maintenance tasks. As future work, the proposed system is to be parallelized so as to boost its performance and speed-up its various operations. This paper presents formulae that can solve various seemingly hopeless philosophical conundrums. We discuss the simulation argument, teleportation, mind-uploading, the rationality of utilitarianism, and the ethics of exploiting artificial general intelligence. Our approach arises from combining the essential ideas of formalisms such as algorithmic probability, the universal intelligence measure, space-time-embedded intelligence, and Hutter's observer localization. We argue that such universal models can yield the ultimate solutions, but a novel research direction would be required in order to find computationally efficient approximations thereof. The recently introduced theory of practopoiesis offers an account on how adaptive intelligent systems are organized. According to that theory biological agents adapt at three levels of organization and this structure applies also to our brains. This is referred to as tri-traversal theory of the organization of mind or for short, a T3-structure. To implement a similar T3-organization in an artificially intelligent agent, it is necessary to have multiple policies, as usually used as a concept in the theory of reinforcement learning. These policies have to form a hierarchy. We define adaptive practopoietic systems in terms of hierarchy of policies and calculate whether the total variety of behavior required by real-life conditions of an adult human can be satisfactorily accounted for by a traditional approach to artificial intelligence based on T2-agents, or whether a T3-agent is needed instead. We conclude that the complexity of real life can be dealt with appropriately only by a T3-agent. Humans dispose of two intertwined information processing pathways, cognitive information processing via neural firing patterns and diffusive volume control via neuromodulation. The cognitive information processing in the brain is traditionally considered to be the prime neural correlate of human intelligence, clinical studies indicate that human emotions intrinsically correlate with the activation of the neuromodulatory system. We examine here the question: Why do humans dispose of the diffusive emotional control system? Is this a coincidence, a caprice of nature, perhaps a leftover of our genetic heritage, or a necessary aspect of any advanced intelligence, being it biological or synthetic? We argue here that emotional control is necessary to solve the motivational problem, viz the selection of short-term utility functions, in the context of an environment where information, computing power and time constitute scarce resources. Intelligent systems for the annotation of media content are increasingly being used for the automation of parts of social science research. In this domain the problem of integrating various Artificial Intelligence (AI) algorithms into a single intelligent system arises spontaneously. As part of our ongoing effort in automating media content analysis for the social sciences, we have built a modular system by combining multiple AI modules into a flexible framework in which they can cooperate in complex tasks. Our system combines data gathering, machine translation, topic classification, extraction and annotation of entities and social networks, as well as many other tasks that have been perfected over the past years of AI research. Over the last few years, it has allowed us to realise a series of scientific studies over a vast range of applications including comparative studies between news outlets and media content in different countries, modelling of user preferences, and monitoring public mood. The framework is flexible and allows the design and implementation of modular agents, where simple modules cooperate in the annotation of a large dataset without central coordination. Over the past decade, AI has made a remarkable progress. It is agreed that this is due to the recently revived Deep Learning technology. Deep Learning enables to process large amounts of data using simplified neuron networks that simulate the way in which the brain works. However, there is a different point of view, which posits that the brain is processing information, not data. This unresolved duality hampered AI progress for years. In this paper, I propose a notion of Integrated information that hopefully will resolve the problem. I consider integrated information as a coupling between two separate entities - physical information (that implies data processing) and semantic information (that provides physical information interpretation). In this regard, intelligence becomes a product of information processing. Extending further this line of thinking, it can be said that information processing does not require more a human brain for its implementation. Indeed, bacteria and amoebas exhibit intelligent behavior without any sign of a brain. That dramatically removes the need for AI systems to emulate the human brain complexity! The paper tries to explore this shift in AI systems design philosophy. There is both much optimism and pessimism around artificial intelligence (AI) today. The optimists are investing millions of dollars, and even in some cases billions of dollars into AI. The pessimists, on the other hand, predict that AI will end many things: jobs, warfare, and even the human race. Both the optimists and the pessimists often appeal to the idea of a technological singularity, a point in time where machine intelligence starts to run away, and a new, more intelligent species starts to inhabit the earth. If the optimists are right, this will be a moment that fundamentally changes our economy and our society. If the pessimists are right, this will be a moment that also fundamentally changes our economy and our society. It is therefore very worthwhile spending some time deciding if either of them might be right. Developing Intelligent Systems involves artificial intelligence approaches including artificial neural networks. Here, we present a tutorial of Deep Neural Networks (DNNs), and some insights about the origin of the term "deep"; references to deep learning are also given. Restricted Boltzmann Machines, which are the core of DNNs, are discussed in detail. An example of a simple two-layer network, performing unsupervised learning for unlabeled data, is shown. Deep Belief Networks (DBNs), which are used to build networks with more than two layers, are also described. Moreover, examples for supervised learning with DNNs performing simple prediction and classification tasks, are presented and explained. This tutorial includes two intelligent pattern recognition applications: hand- written digits (benchmark known as MNIST) and speech recognition. Meaning has been called the "holy grail" of a variety of scientific disciplines, ranging from linguistics to philosophy, psychology and the neurosciences. The field of Artifical Intelligence (AI) is very much a part of that list: the development of sophisticated natural language semantics is a sine qua non for achieving a level of intelligence comparable to humans. Embodiment theories in cognitive science hold that human semantic representation depends on sensori-motor experience; the abundant evidence that human meaning representation is grounded in the perception of physical reality leads to the conclusion that meaning must depend on a fusion of multiple (perceptual) modalities. Despite this, AI research in general, and its subdisciplines such as computational linguistics and computer vision in particular, have focused primarily on tasks that involve a single modality. Here, we propose virtual embodiment as an alternative, long-term strategy for AI research that is multi-modal in nature and that allows for the kind of scalability required to develop the field coherently and incrementally, in an ethically responsible fashion. What is the nature of curiosity? Is there any scientific way to understand the origin of this mysterious force that drives the behavior of even the stupidest naturally intelligent systems and is completely absent in their smartest artificial analogs? Can we build AI systems that could be curious about something, systems that would have an intrinsic motivation to learn? Is such a motivation quantifiable? Is it implementable? I will discuss this problem from the standpoint of physics. The relationship between physics and intelligence is a consequence of the fact that correctly predicted information is nothing but an energy resource, and the process of thinking can be viewed as a process of accumulating and spending this resource through the acts of perception and, respectively, decision making. The natural motivation of any autonomous system to keep this accumulation/spending balance as high as possible allows one to treat the problem of describing the dynamics of thinking processes as a resource optimization problem. Here I will propose and discuss a simple theoretical model of such an autonomous system which I call the Autonomous Turing Machine (ATM). The potential attractiveness of ATM lies in the fact that it is the model of a self-propelled AI for which the only available energy resource is the information itself. For ATM, the problem of optimal thinking, learning, and decision-making becomes conceptually simple and mathematically well tractable. This circumstance makes the ATM an ideal playground for studying the dynamics of intelligent behavior and allows one to quantify many seemingly unquantifiable features of genuine intelligence. Over the last decade, a new idea challenging the classical self-non-self viewpoint has become popular amongst immunologists. It is called the Danger Theory. In this conceptual paper, we look at this theory from the perspective of Artificial Immune System practitioners. An overview of the Danger Theory is presented with particular emphasis on analogies in the Artificial Immune Systems world. A number of potential application areas are then used to provide a framing for a critical assessment of the concept, and its relevance for Artificial Immune Systems. Artificial Immune Systems have been used successfully to build recommender systems for film databases. In this research, an attempt is made to extend this idea to web site recommendation. A collection of more than 1000 individuals web profiles (alternatively called preferences / favourites / bookmarks file) will be used. URLs will be classified using the DMOZ (Directory Mozilla) database of the Open Directory Project as our ontology. This will then be used as the data for the Artificial Immune Systems rather than the actual addresses. The first attempt will involve using a simple classification code number coupled with the number of pages within that classification code. However, this implementation does not make use of the hierarchical tree-like structure of DMOZ. Consideration will then be given to the construction of a similarity measure for web profiles that makes use of this hierarchical information to build a better-informed Artificial Immune System. The human immune system protects the human body against various pathogens like e.g. biological viruses and bacteria. Artificial immune systems reuse the architecture, organization, and workflows of the human immune system for various problems in computer science. In the network security, the artificial immune system is used to secure a network and its nodes against intrusions like viruses, worms, and trojans. However, these approaches are far away from production where they are academic proof-of-concept implementations or use only a small part to protect against a certain intrusion. This article discusses the required steps to bring artificial immune systems into production in the network security domain. It furthermore figures out the challenges and provides the description and results of the prototype of an artificial immune system, which is SANA called. The immune system provides a rich metaphor for computer security: anomaly detection that works in nature should work for machines. However, early artificial immune system approaches for computer security had only limited success. Arguably, this was due to these artificial systems being based on too simplistic a view of the immune system. We present here a second generation artificial immune system for process anomaly detection. It improves on earlier systems by having different artificial cell types that process information. Following detailed information about how to build such second generation systems, we find that communication between cells types is key to performance. Through realistic testing and validation we show that second generation artificial immune systems are capable of anomaly detection beyond generic system policies. The paper concludes with a discussion and outline of the next steps in this exciting area of computer security. The semi-automatic or automatic synthesis of robot controller software is both desirable and challenging. Synthesis of rather simple behaviors such as collision avoidance by applying artificial evolution has been shown multiple times. However, the difficulty of this synthesis increases heavily with increasing complexity of the task that should be performed by the robot. We try to tackle this problem of complexity with Artificial Homeostatic Hormone Systems (AHHS), which provide both intrinsic, homeostatic processes and (transient) intrinsic, variant behavior. By using AHHS the need for pre-defined controller topologies or information about the field of application is minimized. We investigate how the principle design of the controller and the hormone network size affects the overall performance of the artificial evolution (i.e., evolvability). This is done by comparing two variants of AHHS that show different effects when mutated. We evolve a controller for a robot built from five autonomous, cooperating modules. The desired behavior is a form of gait resulting in fast locomotion by using the modules' main hinges. When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models. Artificial fish swarm algorithm (AFSA) is one of the swarm intelligence optimization algorithms that works based on population and stochastic search. In order to achieve acceptable result, there are many parameters needs to be adjusted in AFSA. Among these parameters, visual and step are very significant in view of the fact that artificial fish basically move based on these parameters. In standard AFSA, these two parameters remain constant until the algorithm termination. Large values of these parameters increase the capability of algorithm in global search, while small values improve the local search ability of the algorithm. In this paper, we empirically study the performance of the AFSA and different approaches to balance between local and global exploration have been tested based on the adaptive modification of visual and step during algorithm execution. The proposed approaches have been evaluated based on the four well-known benchmark functions. Experimental results show considerable positive impact on the performance of AFSA. We survey concepts at the frontier of research connecting artificial, animal and human cognition to computation and information processing---from the Turing test to Searle's Chinese Room argument, from Integrated Information Theory to computational and algorithmic complexity. We start by arguing that passing the Turing test is a trivial computational problem and that its pragmatic difficulty sheds light on the computational nature of the human mind more than it does on the challenge of artificial intelligence. We then review our proposed algorithmic information-theoretic measures for quantifying and characterizing cognition in various forms. These are capable of accounting for known biases in human behavior, thus vindicating a computational algorithmic view of cognition as first suggested by Turing, but this time rooted in the concept of algorithmic probability, which in turn is based on computational universality while being independent of computational model, and which has the virtue of being predictive and testable as a model theory of cognitive behavior. In a 1950 article in Mind, decades before the existence of anything resembling an artificial intelligence system, Alan Turing addressed the question of how to test whether machines can think, or in modern terminology, whether a computer claimed to exhibit intelligence indeed does so. The current paper raises the analogous issue for olfaction: how to test the validity of a system claimed to reproduce arbitrary odors artificially, in a way recognizable to humans, in face of the unavailability of a general naming method for odors. Although odor reproduction systems are still far from being viable, the question of how to test candidates thereof is claimed to be interesting and nontrivial, and a novel method is proposed. To some extent, the method is inspired by Turing`s test for AI, in that it involves a human challenger and the real and artificial entities, yet it is very different: our test is conditional, requiring from the artificial no more than is required from the original, and it employs a novel method of immersion that takes advantage of the availability of near-perfect reproduction methods for sight and sound. It is expected that progress toward true artificial intelligence will be achieved through the emergence of a system that integrates representation learning and complex reasoning (LeCun et al. 2015). In response to this prediction, research has been conducted on implementing the symbolic reasoning of a von Neumann computer in an artificial neural network (Graves et al. 2016; Graves et al. 2014; Reed et al. 2015). However, these studies have many limitations in realizing neural-symbolic integration (Jaeger. 2016). Here, we present a new learning paradigm: a learning solving procedure (LSP) that learns the procedure for solving complex problems. This is not accomplished merely by learning input-output data, but by learning algorithms through a solving procedure that obtains the output as a sequence of tasks for a given input problem. The LSP neural network system not only learns simple problems of addition and multiplication, but also the algorithms of complicated problems, such as complex arithmetic expression, sorting, and Hanoi Tower. To realize this, the LSP neural network structure consists of a deep neural network and long short-term memory, which are recursively combined. Through experimentation, we demonstrate the efficiency and scalability of LSP and its validity as a mechanism of complex reasoning. On markets with receding prices, artificial noise traders may consider alternatives to buy-and-hold. By simulating variations of the Parrondo strategy, using real data from the Swedish stock market, we produce first indications of a buy-low-sell-random Parrondo variation outperforming buy-and-hold. Subject to our assumptions, buy-low-sell-random also outperforms the traditional value and trend investor strategies. We measure the success of the Parrondo variations not only through their performance compared to other kinds of strategies, but also relative to varying levels of perfect information, received through messages within a multi-agent system of artificial traders. This article underlines the learning and discrimination capabilities of a model of associative memory based on artificial networks of spiking neurons. Inspired from neuropsychology and neurobiology, the model implements top-down modulations, as in neocortical layer V pyramidal neurons, with a learning rule based on synaptic plasticity (STDP), for performing a multimodal association learning task. A temporal correlation method of analysis proves the ability of the model to associate specific activity patterns to different samples of stimulation. Even in the absence of initial learning and with continuously varying weights, the activity patterns become stable enough for discrimination. We apply the Artificial Immune System (AIS) technology to the Collaborative Filtering (CF) technology when we build the movie recommendation system. Two different affinity measure algorithms of AIS, Kendall tau and Weighted Kappa, are used to calculate the correlation coefficients for this movie recommendation system. From the testing we think that Weighted Kappa is more suitable than Kendall tau for movie problems. The Dendritic Cell algorithm (DCA) is inspired by recent work in innate immunity. In this paper a formal description of the DCA is given. The DCA is described in detail, and its use as an anomaly detector is illustrated within the context of computer security. A port scan detection task is performed to substantiate the influence of signal selection on the behaviour of the algorithm. Experimental results provide a comparison of differing input signal mappings. Dendritic Cells (DCs) are innate immune system cells which have the power to activate or suppress the immune system. The behaviour of human of human DCs is abstracted to form an algorithm suitable for anomaly detection. We test this algorithm on the real-time problem of port scan detection. Our results show a significant difference in artificial DC behaviour for an outgoing portscan when compared to behaviour for normal processes. This paper illustrates successful implementation of three evolutionary algorithms, namely- Particle Swarm Optimization(PSO), Artificial Bee Colony (ABC) and Bacterial Foraging Optimization (BFO) algorithms to economic load dispatch problem (ELD). Power output of each generating unit and optimum fuel cost obtained using all three algorithms have been compared. The results obtained show that ABC and BFO algorithms converge to optimal fuel cost with reduced computational time when compared to PSO for the two example problems considered. The paper presents the electronic design and motion planning of a robot based on decision making regarding its straight motion and precise turn using Artificial Neural Network (ANN). The ANN helps in learning of robot so that it performs motion autonomously. The weights calculated are implemented in microcontroller. The performance has been tested to be excellent. There is a need for new metaphors from immunology to flourish the application areas of Artificial Immune Systems. A metaheuristic called Obesity Heuristic derived from advances in obesity treatment is proposed. The main forces of the algorithm are the generation omega-6 and omega-3 fatty acids. The algorithm works with Just-In-Time philosophy; by starting only when desired. A case study of data cleaning is provided. With experiments conducted on standard tables, results show that Obesity Heuristic outperforms other algorithms, with 100% recall. This is a great improvement over other algorithms This paper describes a new model for an artificial neural network processing unit or neuron. It is slightly different to a traditional feedforward network by the fact that it favours a mechanism of trying to match the wave-like 'shape' of the input with the shape of the output against specific value error corrections. The expectation is then that a best fit shape can be transposed into the desired output values more easily. This allows for notions of reinforcement through resonance and also the construction of synapses. This paper motivates the study of decision theory as necessary for aligning smarter-than-human artificial systems with human interests. We discuss the shortcomings of two standard formulations of decision theory, and demonstrate that they cannot be used to describe an idealized decision procedure suitable for approximation by artificial systems. We then explore the notions of policy selection and logical counterfactuals, two recent insights into decision theory that point the way toward promising paths for future research. It is well known that artificial neural networks (ANNs) can learn deterministic automata. Learning nondeterministic automata is another matter. This is important because much of the world is nondeterministic, taking the form of unpredictable or probabilistic events that must be acted upon. If ANNs are to engage such phenomena, then they must be able to learn how to deal with nondeterminism. In this project the game of Pong poses a nondeterministic environment. The learner is given an incomplete view of the game state and underlying deterministic physics, resulting in a nondeterministic game. Three models were trained and tested on the game: Mona, Elman, and Numenta's NuPIC. We present a position paper advocating the notion that Stoic philosophy and ethics can inform the development of ethical A.I. systems. This is in sharp contrast to most work on building ethical A.I., which has focused on Utilitarian or Deontological ethical theories. We relate ethical A.I. to several core Stoic notions, including the dichotomy of control, the four cardinal virtues, the ideal Sage, Stoic practices, and Stoic perspectives on emotion or affect. More generally, we put forward an ethical view of A.I. that focuses more on internal states of the artificial agent rather than on external actions of the agent. We provide examples relating to near-term A.I. systems as well as hypothetical superintelligent agents. Self-organization has been an important concept within a number of disciplines, which Artificial Life (ALife) also has heavily utilized since its inception. The term and its implications, however, are often confusing or misinterpreted. In this work, we provide a mini-review of self-organization and its relationship with ALife, aiming at initiating discussions on this important topic with the interested audience. We first articulate some fundamental aspects of self-organization, outline its usage, and review its applications to ALife within its soft, hard, and wet domains. We also provide perspectives for further research. Autonomous lifelong development and learning is a fundamental capability of humans, differentiating them from current deep learning systems. However, other branches of artificial intelligence have designed crucial ingredients towards autonomous learning: curiosity and intrinsic motivation, social learning and natural interaction with peers, and embodiment. These mechanisms guide exploration and autonomous choice of goals, and integrating them with deep learning opens stimulating perspectives. Deep learning (DL) approaches made great advances in artificial intelligence, but are still far away from human learning. As argued convincingly by Lake et al., differences include human capabilities to learn causal models of the world from very little data, leveraging compositional representations and priors like intuitive physics and psychology. However, there are other fundamental differences between current DL systems and human learning, as well as technical ingredients to fill this gap, that are either superficially, or not adequately, discussed by Lake et al. These fundamental mechanisms relate to autonomous development and learning. They are bound to play a central role in artificial intelligence in the future. Current DL systems require engineers to manually specify a task-specific objective function for every new task, and learn through off-line processing of large training databases. On the contrary, humans learn autonomously open-ended repertoires of skills, deciding for themselves which goals to pursue or value, and which skills to explore, driven by intrinsic motivation/curiosity and social learning through natural interaction with peers. Such learning processes are incremental, online, and progressive. Human child development involves a progressive increase of complexity in a curriculum of learning where skills are explored, acquired, and built on each other, through particular ordering and timing. Finally, human learning happens in the physical world, and through bodily and physical experimentation, under severe constraints on energy, time, and computational resources. In the two last decades, the field of Developmental and Cognitive Robotics (Cangelosi and Schlesinger, 2015, Asada et al., 2009), in strong interaction with developmental psychology and neuroscience, has achieved significant advances in computational Quantum information technologies, and intelligent learning systems, are both emergent technologies that will likely have a transforming impact on our society. The respective underlying fields of research -- quantum information (QI) versus machine learning (ML) and artificial intelligence (AI) -- have their own specific challenges, which have hitherto been investigated largely independently. However, in a growing body of recent work, researchers have been probing the question to what extent these fields can learn and benefit from each other. QML explores the interaction between quantum computing and ML, investigating how results and techniques from one field can be used to solve the problems of the other. Recently, we have witnessed breakthroughs in both directions of influence. For instance, quantum computing is finding a vital application in providing speed-ups in ML, critical in our "big data" world. Conversely, ML already permeates cutting-edge technologies, and may become instrumental in advanced quantum technologies. Aside from quantum speed-up in data analysis, or classical ML optimization used in quantum experiments, quantum enhancements have also been demonstrated for interactive learning, highlighting the potential of quantum-enhanced learning agents. Finally, works exploring the use of AI for the very design of quantum experiments, and for performing parts of genuine research autonomously, have reported their first successes. Beyond the topics of mutual enhancement, researchers have also broached the fundamental issue of quantum generalizations of ML/AI concepts. This deals with questions of the very meaning of learning and intelligence in a world that is described by quantum mechanics. In this review, we describe the main ideas, recent developments, and progress in a broad spectrum of research investigating machine learning and artificial intelligence in the quantum domain. Some recent studies have pointed that, the self-organization of neurons into brain-like structures, and the self-organization of ants into a swarm are similar in many respects. If possible to implement, these features could lead to important developments in pattern recognition systems, where perceptive capabilities can emerge and evolve from the interaction of many simple local rules. The principle of the method is inspired by the work of Chialvo and Millonas who developed the first numerical simulation in which swarm cognitive map formation could be explained. From this point, an extended model is presented in order to deal with digital image habitats, in which artificial ants could be able to react to the environment and perceive it. Evolution of pheromone fields point that artificial ant colonies could react and adapt appropriately to any type of digital habitat. KEYWORDS: Swarm Intelligence, Self-Organization, Stigmergy, Artificial Ant Systems, Pattern Recognition and Perception, Image Segmentation, Gestalt Perception Theory, Distributed Computation. Did natural consciousness and intelligent systems arise out of a path that was co-evolutionary to evolution? Can we explain human self-consciousness as having risen out of such an evolutionary path? If so how could it have been? In this first part of a two-part paper (titled IXI), we take a learning system perspective to the problem of consciousness and intelligent systems, an approach that may look unseasonable in this age of fMRI's and high tech neuroscience. We posit conscious intelligent systems in natural environments and wonder how natural factors influence their design paths. Such a perspective allows us to explain seamlessly a variety of natural factors, factors ranging from the rise and presence of the human mind, man's sense of I, his self-consciousness and his looping thought processes to factors like reproduction, incubation, extinction, sleep, the richness of natural behavior, etc. It even allows us to speculate on a possible human evolution scenario and other natural phenomena. A definition of intelligence is given in terms of performance that can be quantitatively measured. In this study, we have presented a conceptual model of Intelligent Agent System for Automatic Vehicle Checking Agent (VCA). To achieve this goal, we have introduced several kinds of agents that exhibit intelligent features. These are the Management agent, internal agent, External Agent, Watcher agent and Report agent. Metrics and measurements are suggested for evaluating the performance of Automatic Vehicle Checking Agent (VCA). Calibrate data and test facilities are suggested to facilitate the development of intelligent systems. The relation between self awareness and intelligence is an open problem these days. Despite the fact that self awarness is usually related to Emotional Intelligence, this is not the case here. The problem described in this paper is how to model an agent which knows (Cognitive) Binary Logic and which is also able to pass (without any mistake) a certain family of Turing Tests designed to verify its knowledge and its discourse about the modal states of truth corresponding to well-formed formulae within the language of Propositional Binary Logic. Agents' judgment depends on perception and previous knowledge. Assuming that previous knowledge depends on perception, we can say that judgment depends on perception. So, if judgment depends on perception, can agents judge that they have the same perception? In few words, this is the addressed paradox through this document. While illustrating on the paradox, it's found that to reach agreement in communication, it's not necessary for parties to have the same perception however the necessity is to have perception correspondence. The attempted solution to this paradox reveals a potential uncertainty in judging the matter thus supporting the skeptical view of the problem. Moreover, relating perception to intelligence, the same uncertainty is inherited by judging the level of intelligence of an agent compared to others not necessarily from the same kind (e.g. machine intelligence compared to human intelligence). Using a proposed simple mathematical model for perception and action, a tool is developed to construct scenarios, and the problem is addressed mathematically such that conclusions are drawn systematically based on mathematically defined properties. When it comes to formalization, philosophical arguments and views become more visible and explicit. Web space is the huge repository of data. Everyday lots of new information get added to this web space. The more the information, more is demand for tools to access that information. Answering users' queries about the online information intelligently is one of the great challenges in information retrieval in intelligent systems. In this paper, we will start with the brief introduction on information retrieval and intelligent systems and explain how swoogle, the semantic search engine, uses its algorithms and techniques to search for the desired contents in the web. We then continue with the clustering technique that is used to group the similar things together and discuss the machine learning technique called Self-organizing maps [6] or SOM, which is a data visualization technique that reduces the dimensions of data through the use of self-organizing neural networks. We then discuss how SOM is used to visualize the contents of the data, by following some lines of algorithm, in the form of maps. So, we could say that websites or machines can be used to retrieve the information that what exactly users want from them. This paper discusses the merits and demerits of crisp logic and fuzzy logic with respect to their applicability in intelligent response generation by a human being and by a robot. Intelligent systems must have the capability of taking decisions that are wise and handle situations intelligently. A direct relationship exists between the level of perfection in handling a situation and the level of completeness of the available knowledge or information or data required to handle the situation. The paper concludes that the use of crisp logic with complete knowledge leads to perfection in handling situations whereas fuzzy logic can handle situations imperfectly only. However, in the light of availability of incomplete knowledge fuzzy theory is more effective but may be disadvantageous as compared to crisp logic. With advancement in computer science research on artificial intelligence and in cognitive psychology research on human learning and performance, the next generation of computer-based tutoring systems moved beyond the simple presentation of pages of text or graphics. These new intelligent tutoring systems (ITSs) called cognitive tutors; incorporated model-tracing technology which is a cognitive model of student problem solving that captures students multiple strategies and common misconceptions. Such Intelligent tutoring systems or Knowledge Based Tutoring Systems can guide learners to progress in the learning process at their best. This paper deals with the review of various Intelligent tutoring systems using Bayesian Networks and how Bayesian Networks can be used for efficient decision making. The goal of this paper is to elaborate swarm intelligence for business intelligence decision making and the business rules management improvement. .The swarm optimization, which is highly influenced by the behavior of creature, performs in group. The Spatial data is defined as data that is represented by 2D or 3D images. SQL Server supports only 2D images till now. As we know that location is an essential part of any organizational data as well as business data enterprises maintain customer address lists, own property, ship goods from and to warehouses, manage transport flows among their workforce, and perform many other activities. By means to say a lot of spatial data is used and processed by enterprises, organizations and other bodies in order to make the things more visible and self descriptive. From the experiments, we found that PSO is can facilitate the intelligence in social and business behavior. In this paper, we discuss software design issues related to the development of parallel computational intelligence algorithms on multi-core CPUs, using the new Java 8 functional programming features. In particular, we focus on probabilistic graphical models (PGMs) and present the parallelisation of a collection of algorithms that deal with inference and learning of PGMs from data. Namely, maximum likelihood estimation, importance sampling, and greedy search for solving combinatorial optimisation problems. Through these concrete examples, we tackle the problem of defining efficient data structures for PGMs and parallel processing of same-size batches of data sets using Java 8 features. We also provide straightforward techniques to code parallel algorithms that seamlessly exploit multi-core processors. The experimental analysis, carried out using our open source AMIDST (Analysis of MassIve Data STreams) Java toolbox, shows the merits of the proposed solutions. As machines have become more intelligent, there has been a renewed interest in methods for measuring their intelligence. A common approach is to propose tasks for which a human excels, but one which machines find difficult. However, an ideal task should also be easy to evaluate and not be easily gameable. We begin with a case study exploring the recently popular task of image captioning and its limitations as a task for measuring machine intelligence. An alternative and more promising task is Visual Question Answering that tests a machine's ability to reason about language and vision. We describe a dataset unprecedented in size created for the task that contains over 760,000 human generated questions about images. Using around 10 million human generated answers, machines may be easily evaluated. In order to identify an object, human eyes firstly search the field of view for points or areas which have particular properties. These properties are used to recognise an image or an object. Then this process could be taken as a model to develop computer algorithms for images identification. This paper proposes the idea of applying the simplified firefly algorithm to search for key-areas in 2D images. For a set of input test images the proposed version of firefly algorithm has been examined. Research results are presented and discussed to show the efficiency of this evolutionary computation method. We obtain the conditions for the emergence of the swarm intelligence effect in an interactive game of restless multi-armed bandit (rMAB). A player competes with multiple agents. Each bandit has a payoff that changes with a probability $p_{c}$ per round. The agents and player choose one of three options: (1) Exploit (a good bandit), (2) Innovate (asocial learning for a good bandit among $n_{I}$ randomly chosen bandits), and (3) Observe (social learning for a good bandit). Each agent has two parameters $(c,p_{obs})$ to specify the decision: (i) $c$, the threshold value for Exploit, and (ii) $p_{obs}$, the probability for Observe in learning. The parameters $(c,p_{obs})$ are uniformly distributed. We determine the optimal strategies for the player using complete knowledge about the rMAB. We show whether or not social or asocial learning is more optimal in the $(p_{c},n_{I})$ space and define the swarm intelligence effect. We conduct a laboratory experiment (67 subjects) and observe the swarm intelligence effect only if $(p_{c},n_{I})$ are chosen so that social learning is far more optimal than asocial learning. We continue our analysis of volume and energy measures that are appropriate for quantifying inductive inference systems. We extend logical depth and conceptual jump size measures in AIT to stochastic problems, and physical measures that involve volume and energy. We introduce a graphical model of computational complexity that we believe to be appropriate for intelligent machines. We show several asymptotic relations between energy, logical depth and volume of computation for inductive inference. In particular, we arrive at a "black-hole equation" of inductive inference, which relates energy, volume, space, and algorithmic information for an optimal inductive inference solution. We introduce energy-bounded algorithmic entropy. We briefly apply our ideas to the physical limits of intelligent computation in our universe. Developing intelligent systems requires combining results from both industry and academia. In this report you find an overview of relevant research fields and industrially applicable technologies for building very large scale cyber physical systems. A concept architecture is used to illustrate how existing pieces may fit together, and the maturity of the subsystems is estimated. The goal is to structure the developments and the challenge of machine intelligence for Consumer and Industrial Internet technologists, cyber physical systems researchers and people interested in the convergence of data & Internet of Things. It can be used for planning developments of intelligent systems. We are at the dawn of a new era, where advances in computer power, broadband communication and digital sensor technologies have led to an unprecedented flood of data inundating our surrounding. It is generally believed that means such as Computational Intelligence will help to outlive these tough times. However, these hopes are improperly high. Computational Intelligence is a surprising composition of two mutually exclusive and contradicting constituents that could be coupled only if you disregard and neglect their controversies: "Computational" implies reliance on data processing and "Intelligence" implies reliance on information processing. Only those who are indifferent to data-information discrepancy can believe that such a combination can be viable. We do not believe in miracles, so we will try to share with you our reservations. This paper describes how the "SP theory of intelligence", outlined in an Appendix, may throw light on aspects of commonsense reasoning (CSR) and commonsense knowledge (CSK) (together shortened to CSRK), as discussed in another paper by Ernest Davis and Gary Marcus (DM). The SP system has the generality needed for CSRK: Turing equivalence; the generality of information compression as the foundation for the SP system both in the representation of knowledge and in concepts of prediction and probability; the versatility of the SP system in the representation of knowledge and in aspects of intelligence including forms of reasoning; and the potential of the system for the seamless integration of diverse forms of knowledge and diverse aspects of intelligence. Several examples discussed by DM, and how they may be processed in the SP system, are discussed. Also discussed are current successes in CSR (taxonomic reasoning, temporal reasoning, action and change, and qualitative reasoning), how the SP system may promote seamless integration across these areas, and how insights gained from the SP programme of research may yield some potentially useful new ways of approaching these topics. The paper considers how the SP system may help overcome several challenges in the automation of CSR described by DM, and how it meets several of the objectives for research in CSRK that they have described. Since its inception, artificial intelligence has relied upon a theoretical foundation centered around perfect rationality as the desired property of intelligent systems. We argue, as others have done, that this foundation is inadequate because it imposes fundamentally unsatisfiable requirements. As a result, there has arisen a wide gap between theory and practice in AI, hindering progress in the field. We propose instead a property called bounded optimality. Roughly speaking, an agent is bounded-optimal if its program is a solution to the constrained optimization problem presented by its architecture and the task environment. We show how to construct agents with this property for a simple class of machine architectures in a broad class of real-time environments. We illustrate these results using a simple model of an automated mail sorting facility. We also define a weaker property, asymptotic bounded optimality (ABO), that generalizes the notion of optimality in classical complexity theory. We then construct universal ABO programs, i.e., programs that are ABO no matter what real-time constraints are applied. Universal ABO programs can be used as building blocks for more complex systems. We conclude with a discussion of the prospects for bounded optimality as a theoretical basis for AI, and relate it to similar trends in philosophy, economics, and game theory. Penetration Testing is a methodology for assessing network security, by generating and executing possible hacking attacks. Doing so automatically allows for regular and systematic testing. A key question is how to generate the attacks. This is naturally formulated as planning under uncertainty, i.e., under incomplete knowledge about the network configuration. Previous work uses classical planning, and requires costly pre-processes reducing this uncertainty by extensive application of scanning methods. By contrast, we herein model the attack planning problem in terms of partially observable Markov decision processes (POMDP). This allows to reason about the knowledge available, and to intelligently employ scanning actions as part of the attack. As one would expect, this accurate solution does not scale. We devise a method that relies on POMDPs to find good attacks on individual machines, which are then composed into an attack on the network as a whole. This decomposition exploits network structure to the extent possible, making targeted approximations (only) where needed. Evaluating this method on a suitably adapted industrial test suite, we demonstrate its effectiveness in both runtime and solution quality. Cognitive radio networks (CRNs) are networks of nodes equipped with cognitive radios that can optimize performance by adapting to network conditions. While cognitive radio networks (CRN) are envisioned as intelligent networks, relatively little research has focused on the network level functionality of CRNs. Although various routing protocols, incorporating varying degrees of adaptiveness, have been proposed for CRNs, it is imperative for the long term success of CRNs that the design of cognitive routing protocols be pursued by the research community. Cognitive routing protocols are envisioned as routing protocols that fully and seamless incorporate AI-based techniques into their design. In this paper, we provide a self-contained tutorial on various AI and machine-learning techniques that have been, or can be, used for developing cognitive routing protocols. We also survey the application of various classes of AI techniques to CRNs in general, and to the problem of routing in particular. We discuss various decision making techniques and learning techniques from AI and document their current and potential applications to the problem of routing in CRNs. We also highlight the various inference, reasoning, modeling, and learning sub tasks that a cognitive routing protocol must solve. Finally, open research issues and future directions of work are identified. How useful can machine learning be in a quantum laboratory? Here we raise the question of the potential of intelligent machines in the context of scientific research. A major motivation for the present work is the unknown reachability of various entanglement classes in quantum experiments. We investigate this question by using the projective simulation model, a physics-oriented approach to artificial intelligence. In our approach, the projective simulation system is challenged to design complex photonic quantum experiments that produce high-dimensional entangled multiphoton states, which are of high interest in modern quantum experiments. The artificial intelligence system learns to create a variety of entangled states, and improves the efficiency of their realization. In the process, the system autonomously (re)discovers experimental techniques which are only now becoming standard in modern quantum optical experiments - a trait which was not explicitly demanded from the system but emerged through the process of learning. Such features highlight the possibility that machines could have a significantly more creative role in future research. In this paper, we conduct an empirical study on discovering the ordered collective dynamics obtained by a population of artificial intelligence (AI) agents. Our intention is to put AI agents into a simulated natural context, and then to understand their induced dynamics at the population level. In particular, we aim to verify if the principles developed in the real world could also be used in understanding an artificially-created intelligent population. To achieve this, we simulate a large-scale predator-prey world, where the laws of the world are designed by only the findings or logical equivalence that have been discovered in nature. We endow the agents with the intelligence based on deep reinforcement learning, and scale the population size up to millions. Our results show that the population dynamics of AI agents, driven only by each agent's individual self interest, reveals an ordered pattern that is similar to the Lotka-Volterra model studied in population biology. We further discover the emergent behaviors of collective adaptations in studying how the agents' grouping behaviors will change with the environmental resources. Both of the two findings could be explained by the self-organization theory in nature. Power grids are one of the most important components of infrastructure in today's world. Every nation is dependent on the security and stability of its own power grid to provide electricity to the households and industries. A malfunction of even a small part of a power grid can cause loss of productivity, revenue and in some cases even life. Thus, it is imperative to design a system which can detect the health of the power grid and take protective measures accordingly even before a serious anomaly takes place. To achieve this objective, we have set out to create an artificially intelligent system which can analyze the grid information at any given time and determine the health of the grid through the usage of sophisticated formal models and novel machine learning techniques like recurrent neural networks. Our system simulates grid conditions including stimuli like faults, generator output fluctuations, load fluctuations using Siemens PSS/E software and this data is trained using various classifiers like SVM, LSTM and subsequently tested. The results are excellent with our methods giving very high accuracy for the data. This model can easily be scaled to handle larger and more complex grid architectures. We propose Cognitive Databases, an approach for transparently enabling Artificial Intelligence (AI) capabilities in relational databases. A novel aspect of our design is to first view the structured data source as meaningful unstructured text, and then use the text to build an unsupervised neural network model using a Natural Language Processing (NLP) technique called word embedding. This model captures the hidden inter-/intra-column relationships between database tokens of different types. For each database token, the model includes a vector that encodes contextual semantic relationships. We seamlessly integrate the word embedding model into existing SQL query infrastructure and use it to enable a new class of SQL-based analytics queries called cognitive intelligence (CI) queries. CI queries use the model vectors to enable complex queries such as semantic matching, inductive reasoning queries such as analogies, predictive queries using entities not present in a database, and, more generally, using knowledge from external sources. We demonstrate unique capabilities of Cognitive Databases using an Apache Spark based prototype to execute inductive reasoning CI queries over a multi-modal database containing text and images. We believe our first-of-a-kind system exemplifies using AI functionality to endow relational databases with capabilities that were previously very hard to realize in practice. The biological immune system is a robust, complex, adaptive system that defends the body from foreign pathogens. It is able to categorize all cells (or molecules) within the body as self-cells or non-self cells. It does this with the help of a distributed task force that has the intelligence to take action from a local and also a global perspective using its network of chemical messengers for communication. There are two major branches of the immune system. The innate immune system is an unchanging mechanism that detects and destroys certain invading organisms, whilst the adaptive immune system responds to previously unknown foreign cells and builds a response to them that can remain in the body over a long period of time. This remarkable information processing biological system has caught the attention of computer science in recent years. A novel computational intelligence technique, inspired by immunology, has emerged, called Artificial Immune Systems. Several concepts from the immune have been extracted and applied for solution to real world science and engineering problems. In this tutorial, we briefly describe the immune system metaphors that are relevant to existing Artificial Immune Systems methods. We will then show illustrative real-world problems suitable for Artificial Immune Systems and give a step-by-step algorithm walkthrough for one such problem. A comparison of the Artificial Immune Systems to other well-known algorithms, areas for future work, tips & tricks and a list of resources will round this tutorial off. It should be noted that as Artificial Immune Systems is still a young and evolving field, there is not yet a fixed algorithm template and hence actual implementations might differ somewhat from time to time and from those examples given here. Experiments in cognitive science and decision theory show that the ways in which people combine concepts and make decisions cannot be described by classical logic and probability theory. This has serious implications for applied disciplines such as information retrieval, artificial intelligence and robotics. Inspired by a mathematical formalism that generalizes quantum mechanics the authors have constructed a contextual framework for both concept representation and decision making, together with quantum models that are in strong alignment with experimental data. The results can be interpreted by assuming the existence in human thought of a double-layered structure, a 'classical logical thought' and a 'quantum conceptual thought', the latter being responsible of the above paradoxes and nonclassical effects. The presence of a quantum structure in cognition is relevant, for it shows that quantum mechanics provides not only a useful modeling tool for experimental data but also supplies a structural model for human and artificial thought processes. This approach has strong connections with theories formalizing meaning, such as semantic analysis, and has also a deep impact on computer science, information retrieval and artificial intelligence. More specifically, the links with information retrieval are discussed in this paper. By introducing elements of information mining to tax analysis, by means of data mining software and advanced computational concepts of artificial intelligence, the problem of tax evader's crime against public property has been addressed. Through an empirical approach from a hypothetical case of use, induction algorithms, neural networks and bayesian networks are applied to determine the feasibility of its heuristic application by the tax public administrator. Different strategies are explored to facilitate the work of local and regional federal tax inspectors, considering their limited computational capabilities, but equally effective for those social scientist committed to handcrafting tax research. ----- Apresentando a introdu\c{c}\~ao de elementos de explora\c{c}\~ao de informa\c{c}\~oes para an\'alise fiscal, por meio de software de minera\c{c}\~ao de dados e conceitos avan\c{c}ados computacionais de intelig\^encia artificial, foi abordado o problema do crime de sonegador fiscal contra o patrim\^onio p\'ublico. Atrav\'es de uma abordagem emp\'irica a partir de um caso hipot\'etico de uso, os algoritmos de indu\c{c}\~ao, redes neurais e redes bayesianas s\~ao aplicados para determinar a viabilidade de sua aplica\c{c}\~ao heur\'istica pelo administrador p\'ublico tribut\'ario. Diferentes estrat\'egias s\~ao exploradas para facilitar o trabalho dos inspectores tribut\'arios federais locais e regionais, tendo em conta as suas capacidades computacionais limitados, mas igualmente eficaz para aqueles cientista social comprometido com a investiga\c{c}\~ao fiscal. Many of the artificial intelligence techniques developed to date rely on heuristic search through large spaces. Unfortunately, the size of these spaces and the corresponding computational effort reduce the applicability of otherwise novel and effective algorithms. A number of parallel and distributed approaches to search have considerably improved the performance of the search process. Our goal is to develop an architecture that automatically selects parallel search strategies for optimal performance on a variety of search problems. In this paper we describe one such architecture realized in the Eureka system, which combines the benefits of many different approaches to parallel heuristic search. Through empirical and theoretical analyses we observe that features of the problem space directly affect the choice of optimal parallel search strategy. We then employ machine learning techniques to select the optimal parallel search strategy for a given problem space. When a new search task is input to the system, Eureka uses features describing the search space and the chosen architecture to automatically select the appropriate search strategy. Eureka has been tested on a MIMD parallel processor, a distributed network of workstations, and a single workstation using multithreading. Results generated from fifteen puzzle problems, robot arm motion problems, artificial search spaces, and planning problems indicate that Eureka outperforms any of the tested strategies used exclusively for all problem instances and is able to greatly reduce the search time for these applications. The increasing attention on deep learning has tremendously spurred the design of intelligence processing hardware. The variety of emerging intelligence processors requires standard benchmarks for fair comparison and system optimization (in both software and hardware). However, existing benchmarks are unsuitable for benchmarking intelligence processors due to their non-diversity and nonrepresentativeness. Also, the lack of a standard benchmarking methodology further exacerbates this problem. In this paper, we propose BENCHIP, a benchmark suite and benchmarking methodology for intelligence processors. The benchmark suite in BENCHIP consists of two sets of benchmarks: microbenchmarks and macrobenchmarks. The microbenchmarks consist of single-layer networks. They are mainly designed for bottleneck analysis and system optimization. The macrobenchmarks contain state-of-the-art industrial networks, so as to offer a realistic comparison of different platforms. We also propose a standard benchmarking methodology built upon an industrial software stack and evaluation metrics that comprehensively reflect the various characteristics of the evaluated intelligence processors. BENCHIP is utilized for evaluating various hardware platforms, including CPUs, GPUs, and accelerators. BENCHIP will be open-sourced soon. The Turing Test (TT) checks for human intelligence, rather than any putative general intelligence. It involves repeated interaction requiring learning in the form of adaption to the human conversation partner. It is a macro-level post-hoc test in contrast to the definition of a Turing Machine (TM), which is a prior micro-level definition. This raises the question of whether learning is just another computational process, i.e. can be implemented as a TM. Here we argue that learning or adaption is fundamentally different from computation, though it does involve processes that can be seen as computations. To illustrate this difference we compare (a) designing a TM and (b) learning a TM, defining them for the purpose of the argument. We show that there is a well-defined sequence of problems which are not effectively designable but are learnable, in the form of the bounded halting problem. Some characteristics of human intelligence are reviewed including it's: interactive nature, learning abilities, imitative tendencies, linguistic ability and context-dependency. A story that explains some of these is the Social Intelligence Hypothesis. If this is broadly correct, this points to the necessity of a considerable period of acculturation (social learning in context) if an artificial intelligence is to pass the TT. Whilst it is always possible to 'compile' the results of learning into a TM, this would not be a designed TM and would not be able to continually adapt (pass future TTs). We conclude three things, namely that: a purely "designed" TM will never pass the TT; that there is no such thing as a general intelligence since it necessary involves learning; and that learning/adaption and computation should be clearly distinguished. Individual-intelligence research, from a neurological perspective, discusses the hierarchical layers of the cortex as a structure that performs conceptual abstraction and specification. This theory has been used to explain how motor-cortex regions responsible for different behavioral modalities such as writing and speaking can be utilized to express the same general concept represented higher in the cortical hierarchy. For example, the concept of a dog, represented across a region of high-level cortical-neurons, can either be written or spoken about depending on the individual's context. The higher-layer cortical areas project down the hierarchy, sending abstract information to specific regions of the motor-cortex for contextual implementation. In this paper, this idea is expanded to incorporate collective-intelligence within a hyper-cortical construct. This hyper-cortex is a multi-layered network used to represent abstract collective concepts. These ideas play an important role in understanding how collective-intelligence systems can be engineered to handle problem abstraction and solution specification. Finally, a collection of common problems in the scientific community are solved using an artificial hyper-cortex generated from digital-library metadata. In the article a turn-based game played on four computers connected via network is investigated. There are three computers with natural intelligence and one with artificial intelligence. Game table is seen by each player's own view point in all players' monitors. Domino pieces are three dimensional. For distributed systems TCP/IP protocol is used. In order to get 3D image, Microsoft XNA technology is applied. Domino 101 game is nondeterministic game that is result of the game depends on the initial random distribution of the pieces. Number of the distributions is equal to the multiplication of following combinations: . Moreover, in this game that is played by four people, players are divided into 2 pairs. Accordingly, we cannot predict how the player uses the dominoes that is according to the dominoes of his/her partner or according to his/her own dominoes. The fact that the natural intelligence can be a player in any level affects the outcome. These reasons make it difficult to develop an AI. In the article four levels of AI are developed. The AI in the first level is equivalent to the intelligence of a child who knows the rules of the game and recognizes the numbers. The AI in this level plays if it has any domino, suitable to play or says pass. In most of the games which can be played on the internet, the AI does the same. But the AI in the last level is a master player, and it can develop itself according to its competitors' levels. Mycotoxin contamination in certain agricultural systems have been a serious concern for human and animal health. Mycotoxins are toxic substances produced mostly as secondary metabolites by fungi that grow on seeds and feed in the field, or in storage. The food-borne Mycotoxins likely to be of greatest significance for human health in tropical developing countries are Aflatoxins and Fumonisins. Chili pepper is also prone to Aflatoxin contamination during harvesting, production and storage periods.Various methods used for detection of Mycotoxins give accurate results, but they are slow, expensive and destructive. Destructive method is testing a material that degrades the sample under investigation. Whereas, non-destructive testing will, after testing, allow the part to be used for its intended purpose. Ultrasonic methods, Multispectral image processing methods, Terahertz methods, X-ray and Thermography have been very popular in nondestructive testing and characterization of materials and health monitoring. Image processing methods are used to improve the visual quality of the pictures and to extract useful information from them. In this proposed work, the chili pepper samples will be collected, and the X-ray, multispectral images of the samples will be processed using image processing methods. The term "Computational Intelligence" referred as simulation of human intelligence on computers. It is also called as "Artificial Intelligence" (AI) approach. The techniques used in AI approach are Neural network, Fuzzy logic and evolutionary computation. Finally, the computational intelligence method will be used in addition to image processing to provide best, high performance and accurate results for detecting the Mycotoxin level in the samples collected. This article describes existing and expected benefits of the "SP theory of intelligence", and some potential applications. The theory aims to simplify and integrate ideas across artificial intelligence, mainstream computing, and human perception and cognition, with information compression as a unifying theme. It combines conceptual simplicity with descriptive and explanatory power across several areas of computing and cognition. In the "SP machine" -- an expression of the SP theory which is currently realized in the form of a computer model -- there is potential for an overall simplification of computing systems, including software. The SP theory promises deeper insights and better solutions in several areas of application including, most notably, unsupervised learning, natural language processing, autonomous robots, computer vision, intelligent databases, software engineering, information compression, medical diagnosis and big data. There is also potential in areas such as the semantic web, bioinformatics, structuring of documents, the detection of computer viruses, data fusion, new kinds of computer, and the development of scientific theories. The theory promises seamless integration of structures and functions within and between different areas of application. The potential value, worldwide, of these benefits and applications is at least $190 billion each year. Further development would be facilitated by the creation of a high-parallel, open-source version of the SP machine, available to researchers everywhere. Nowadays, represented by Deep Learning techniques, the field of machine learning is experiencing unprecedented prosperity and its influence is demonstrated in academia, industry and civil society. "Intelligent" has become a label which could not be neglected for most applications; celebrities and scientists also warned that the development of full artificial intelligence may spell the end of the human race. It seems that the answer to building a computer system that could automatically improve with experience is right on the next corner. While for AI and machine learning researchers, it is a consensus that we are not anywhere near the core technique which could bring the Terminator, Number 5 or R2D2 into real life, and there is not even a formal definition about what is intelligence, or one of its basic properties: Learning. Therefore, even though researchers know these concerns are not necessary currently, there is no generalized explanation about why these concerns are not necessary, and what properties people should take into account that would make these concerns to be necessary. In this paper, starts from analysing the relation between information and its representation, a necessary condition for a model to be a learning model is proposed. This condition and related future works could be used to verify whether a system is able to learn or not, and enrich our understanding of learning: one important property of Intelligence. Reinforcement learning is a general and powerful framework with which to study and implement artificial intelligence. Recent advances in deep learning have enabled RL algorithms to achieve impressive performance in restricted domains such as playing Atari video games (Mnih et al., 2015) and, recently, the board game Go (Silver et al., 2016). However, we are still far from constructing a generally intelligent agent. Many of the obstacles and open questions are conceptual: What does it mean to be intelligent? How does one explore and learn optimally in general, unknown environments? What, in fact, does it mean to be optimal in the general sense? The universal Bayesian agent AIXI (Hutter, 2005) is a model of a maximally intelligent agent, and plays a central role in the sub-field of general reinforcement learning (GRL). Recently, AIXI has been shown to be flawed in important ways; it doesn't explore enough to be asymptotically optimal (Orseau, 2010), and it can perform poorly with certain priors (Leike and Hutter, 2015). Several variants of AIXI have been proposed to attempt to address these shortfalls: among them are entropy-seeking agents (Orseau, 2011), knowledge-seeking agents (Orseau et al., 2013), Bayes with bursts of exploration (Lattimore, 2013), MDL agents (Leike, 2016a), Thompson sampling (Leike et al., 2016), and optimism (Sunehag and Hutter, 2015). We present AIXIjs, a JavaScript implementation of these GRL agents. This implementation is accompanied by a framework for running experiments against various environments, similar to OpenAI Gym (Brockman et al., 2016), and a suite of interactive demos that explore different properties of the agents, similar to REINFORCEjs (Karpathy, 2015). We use AIXIjs to present numerous experiments illustrating fundamental properties of, and differences between, these agents. In this paper, we present MLEANN (Meta-Learning Evolutionary Artificial Neural Network), an automatic computational framework for the adaptive optimization of artificial neural networks wherein the neural network architecture, activation function, connection weights; learning algorithm and its parameters are adapted according to the problem. We explored the performance of MLEANN and conventionally designed artificial neural networks for function approximation problems. To evaluate the comparative performance, we used three different well-known chaotic time series. We also present the state of the art popular neural network learning algorithms and some experimentation results related to convergence speed and generalization performance. We explored the performance of backpropagation algorithm; conjugate gradient algorithm, quasi-Newton algorithm and Levenberg-Marquardt algorithm for the three chaotic time series. Performances of the different learning algorithms were evaluated when the activation functions and architecture were changed. We further present the theoretical background, algorithm, design strategy and further demonstrate how effective and inevitable is the proposed MLEANN framework to design a neural network, which is smaller, faster and with a better generalization performance. Artificial immune systems (AISs) to date have generally been inspired by naive biological metaphors. This has limited the effectiveness of these systems. In this position paper two ways in which AISs could be made more biologically realistic are discussed. We propose that AISs should draw their inspiration from organisms which possess only innate immune systems, and that AISs should employ systemic models of the immune system to structure their overall design. An outline of plant and invertebrate immune systems is presented, and a number of contemporary research that more biologically-realistic AISs could have is also discussed. In a previous paper the authors argued the case for incorporating ideas from innate immunity into artificial immune systems (AISs) and presented an outline for a conceptual framework for such systems. A number of key general properties observed in the biological innate and adaptive immune systems were highlighted, and how such properties might be instantiated in artificial systems was discussed in detail. The next logical step is to take these ideas and build a software system with which AISs with these properties can be implemented and experimentally evaluated. This paper reports on the results of that step - the libtissue system. The complementary DNA (cDNA) sequence is considered to be the magic biometric technique for personal identification. In this paper, we present a new method for cDNA recognition based on the artificial neural network (ANN). Microarray imaging is used for the concurrent identification of thousands of genes. We have segmented the location of the spots in a cDNA microarray. Thus, a precise localization and segmenting of a spot are essential to obtain a more accurate intensity measurement, leading to a more precise expression measurement of a gene. The segmented cDNA microarray image is resized and it is used as an input for the proposed artificial neural network. For matching and recognition, we have trained the artificial neural network. Recognition results are given for the galleries of cDNA sequences . The numerical results show that, the proposed matching technique is an effective in the cDNA sequences process. We also compare our results with previous results and find out that, the proposed technique is an effective matching performance. Can we evolve better training data for machine learning algorithms? To investigate this question we use population-based optimisation algorithms to generate artificial surrogate training data for naive Bayes for regression. We demonstrate that the generalisation performance of naive Bayes for regression models is enhanced by training them on the artificial data as opposed to the real data. These results are important for two reasons. Firstly, naive Bayes models are simple and interpretable but frequently underperform compared to more complex "black box" models, and therefore new methods of enhancing accuracy are called for. Secondly, the idea of using the real training data indirectly in the construction of the artificial training data, as opposed to directly for model training, is a novel twist on the usual machine learning paradigm. A system with artificial intelligence usually relies on symbol manipulation, at least partly and implicitly. However, the interpretation of the symbols - what they represent and what they are about - is ultimately left to humans, as designers and users of the system. How symbols can acquire meaning for the system itself, independent of external interpretation, is an unsolved problem. Some grounding of symbols can be obtained by embodiment, that is, by causally connecting symbols (or sub-symbolic variables) to the physical environment, such as in a robot with sensors and effectors. However, a causal connection as such does not produce representation and aboutness of the kind that symbols have for humans. Here I present a theory that explains how humans and other living organisms have acquired the capability to have symbols and sub-symbolic variables that represent, refer to, and are about something else. The theory shows how reference can be to physical objects, but also to abstract objects, and even how it can be misguided (errors in reference) or be about non-existing objects. I subsequently abstract the primary components of the theory from their biological context, and discuss how and under what conditions the theory could be implemented in artificial agents. A major component of the theory is the strong nonlinearity associated with (potentially unlimited) self-reproduction. The latter is likely not acceptable in artificial systems. It remains unclear if goals other than those inherently serving self-reproduction can have aboutness and if such goals could be stabilized. Through the success of deep learning, Artificial Neural Networks (ANNs) are among the most used artificial intelligence methods nowadays. ANNs have led to major breakthroughs in various domains, such as particle physics, reinforcement learning, speech recognition, computer vision, and so on. Taking inspiration from the network properties of biological neural networks (e.g. sparsity, scale-freeness), we argue that (contrary to general practice) Artificial Neural Networks (ANN), too, should not have fully-connected layers. We show how ANNs perform perfectly well with sparsely-connected layers. Following a Darwinian evolutionary approach, we propose a novel algorithm which evolves an initial random sparse topology (i.e. an Erd\H{o}s-R\'enyi random graph) of two consecutive layers of neurons into a scale-free topology, during the ANN training process. The resulting sparse layers can safely replace the corresponding fully-connected layers. Our method allows to quadratically reduce the number of parameters in the fully conencted layers of ANNs, yielding quadratically faster computational times in both phases (i.e. training and inference), at no decrease in accuracy. We demonstrate our claims on two popular ANN types (restricted Boltzmann machine and multi-layer perceptron), on two types of tasks (supervised and unsupervised learning), and on 14 benchmark datasets. We anticipate that our approach will enable ANNs having billions of neurons and evolved topologies to be capable of handling complex real-world tasks that are intractable using state-of-the-art methods. Despite the effort of many researchers in the area of multi-agent systems (MAS) for designing and programming agents, a few years ago the research community began to take into account that common features among different MAS exists. Based on these common features, several tools have tackled the problem of agent development on specific application domains or specific types of agents. As a consequence, their scope is restricted to a subset of the huge application domain of MAS. In this paper we propose a generic infrastructure for programming agents whose name is Brainstorm/J. The infrastructure has been implemented as an object oriented framework. As a consequence, our approach supports a broader scope of MAS applications than previous efforts, being flexible and reusable. Information Integration is a young and exciting field with enormous research and commercial significance in the new world of the Information Society. It stands at the crossroad of Databases and Artificial Intelligence requiring novel techniques that bring together different methods from these fields. Information from disparate heterogeneous sources often with no a-priori common schema needs to be synthesized in a flexible, transparent and intelligent way in order to respond to the demands of a query thus enabling a more informed decision by the user or application program. The field although relatively young has already found many practical applications particularly for integrating information over the World Wide Web. This paper gives a brief introduction of the field highlighting some of the main current and future research issues and application areas. It attempts to evaluate the current and potential role of Computational Logic in this and suggests some of the problems where logic-based techniques could be used. This paper presents a new approach to solving N-queen problems, which involves a model of distributed autonomous agents with artificial life (ALife) and a method of representing N-queen constraints in an agent environment. The distributed agents locally interact with their living environment, i.e., a chessboard, and execute their reactive behaviors by applying their behavioral rules for randomized motion, least-conflict position searching, and cooperating with other agents etc. The agent-based N-queen problem solving system evolves through selection and contest according to the rule of Survival of the Fittest, in which some agents will die or be eaten if their moving strategies are less efficient than others. The experimental results have shown that this system is capable of solving large-scale N-queen problems. This paper also provides a model of ALife agents for solving general CSPs. The study of belief change has been an active area in philosophy and AI. In recent years two special cases of belief change, belief revision and belief update, have been studied in detail. In a companion paper, we introduce a new framework to model belief change. This framework combines temporal and epistemic modalities with a notion of plausibility, allowing us to examine the change of beliefs over time. In this paper, we show how belief revision and belief update can be captured in our framework. This allows us to compare the assumptions made by each method, and to better understand the principles underlying them. In particular, it shows that Katsuno and Mendelzon's notion of belief update depends on several strong assumptions that may limit its applicability in artificial intelligence. Finally, our analysis allow us to identify a notion of minimal change that underlies a broad range of belief change operations including revision and update. The use of intelligent systems for stock market predictions has been widely established. In this paper, we investigate how the seemingly chaotic behavior of stock markets could be well represented using several connectionist paradigms and soft computing techniques. To demonstrate the different techniques, we considered Nasdaq-100 index of Nasdaq Stock MarketS and the S&P CNX NIFTY stock index. We analyzed 7 year's Nasdaq 100 main index values and 4 year's NIFTY index values. This paper investigates the development of a reliable and efficient technique to model the seemingly chaotic behavior of stock markets. We considered an artificial neural network trained using Levenberg-Marquardt algorithm, Support Vector Machine (SVM), Takagi-Sugeno neuro-fuzzy model and a Difference Boosting Neural Network (DBNN). This paper briefly explains how the different connectionist paradigms could be formulated using different learning methods and then investigates whether they can provide the required level of performance, which are sufficiently good and robust so as to provide a reliable forecast model for stock market indices. Experiment results reveal that all the connectionist paradigms considered could represent the stock indices behavior very accurately. Normally a decision support system is build to solve problem where multi-criteria decisions are involved. The knowledge base is the vital part of the decision support containing the information or data that is used in decision-making process. This is the field where engineers and scientists have applied several intelligent techniques and heuristics to obtain optimal decisions from imprecise information. In this paper, we present a hybrid neuro-genetic learning approach for the adaptation a Mamdani fuzzy inference system for the Tactical Air Combat Decision Support System (TACDSS). Some simulation results demonstrating the difference of the learning techniques and are also provided. Artificial Intelligence (AI) has recently become a real formal science: the new millennium brought the first mathematically sound, asymptotically optimal, universal problem solvers, providing a new, rigorous foundation for the previously largely heuristic field of General AI and embedded agents. At the same time there has been rapid progress in practical methods for learning true sequence-processing programs, as opposed to traditional methods limited to stationary pattern association. Here we will briefly review some of the new results, and speculate about future developments, pointing out that the time intervals between the most notable events in over 40,000 years or 2^9 lifetimes of human history have sped up exponentially, apparently converging to zero within the next few decades. Or is this impression just a by-product of the way humans allocate memory space to past events? Knowledge plays a central role in human and artificial intelligence. One of the key characteristics of knowledge is its structured organization. Knowledge can be and should be presented in multiple levels and multiple views to meet people's needs in different levels of granularities and from different perspectives. In this paper, we stand on the view point of granular computing and provide our understanding on multi-level and multi-view of knowledge through granular knowledge structures (GKS). Representation of granular knowledge structures, operations for building granular knowledge structures and how to use them are investigated. As an illustration, we provide some examples through results from an analysis of proceeding papers. Results show that granular knowledge structures could help users get better understanding of the knowledge source from set theoretical, logical and visual point of views. One may consider using them to meet specific needs or solve certain kinds of problems. The recurrent theme of this paper is that sequences of long temporal patterns as opposed to sequences of simple statements are to be fed into computation devices, being them (new proposed) models for brain activity or multi-core/many-core computers. In such models, parts of these long temporal patterns are already committed while other are predicted. This combination of matching patterns and making predictions appears as a key element in producing intelligent processing in brain models and getting efficient speculative execution on multi-core/many-core computers. A bridge between these far-apart models of computation could be provided by appropriate design of massively parallel, interactive programming languages. Agapia is a recently proposed language of this kind, where user controlled long high-level temporal structures occur at the interaction interfaces of processes. In this paper Agapia is used to link HTMs brain models with TRIPS multi-core/many-core architectures. General purpose intelligent learning agents cycle through (complex,non-MDP) sequences of observations, actions, and rewards. On the other hand, reinforcement learning is well-developed for small finite state Markov Decision Processes (MDPs). So far it is an art performed by human designers to extract the right state representation out of the bare observations, i.e. to reduce the agent setup to the MDP framework. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in a companion article. General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite state Markov decision processes (MDPs). Up to now, extracting the right state representations out of bare observations, that is, reducing the general agent setup to the MDP framework, is an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in Part II. The role of POMDPs is also considered there. This paper presents a combination of several automated reasoning and proof presentation tools with the Mizar system for formalization of mathematics. The combination forms an online service called MizAR, similar to the SystemOnTPTP service for first-order automated reasoning. The main differences to SystemOnTPTP are the use of the Mizar language that is oriented towards human mathematicians (rather than the pure first-order logic used in SystemOnTPTP), and setting the service in the context of the large Mizar Mathematical Library of previous theorems,definitions, and proofs (rather than the isolated problems that are solved in SystemOnTPTP). These differences poses new challenges and new opportunities for automated reasoning and for proof presentation tools. This paper describes the overall structure of MizAR, and presents the automated reasoning systems and proof presentation tools that are combined to make MizAR a useful mathematical service. The Dendritic Cell Algorithm (DCA) is inspired by the function of the dendritic cells of the human immune system. In nature, dendritic cells are the intrusion detection agents of the human body, policing the tissue and organs for potential invaders in the form of pathogens. In this research, and abstract model of DC behaviour is developed and subsequently used to form an algorithm, the DCA. The abstraction process was facilitated through close collaboration with laboratory- based immunologists, who performed bespoke experiments, the results of which are used as an integral part of this algorithm. The DCA is a population based algorithm, with each agent in the system represented as an 'artificial DC'. Each DC has the ability to combine multiple data streams and can add context to data suspected as anomalous. In this chapter the abstraction process and details of the resultant algorithm are given. The algorithm is applied to numerous intrusion detection problems in computer security including the detection of port scans and botnets, where it has produced impressive results with relatively low rates of false positives. Uncertainty of decisions in safety-critical engineering applications can be estimated on the basis of the Bayesian Markov Chain Monte Carlo (MCMC) technique of averaging over decision models. The use of decision tree (DT) models assists experts to interpret causal relations and find factors of the uncertainty. Bayesian averaging also allows experts to estimate the uncertainty accurately when a priori information on the favored structure of DTs is available. Then an expert can select a single DT model, typically the Maximum a Posteriori model, for interpretation purposes. Unfortunately, a priori information on favored structure of DTs is not always available. For this reason, we suggest a new prior on DTs for the Bayesian MCMC technique. We also suggest a new procedure of selecting a single DT and describe an application scenario. In our experiments on the Short-Term Conflict Alert data our technique outperforms the existing Bayesian techniques in predictive accuracy of the selected single DTs. Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments an agent's ability to learn useful behaviors by making intelligent use of the knowledge implicit in behaviors demonstrated by cooperative teachers or other more experienced agents. We propose and study a formal model of implicit imitation that can accelerate reinforcement learning dramatically in certain cases. Roughly, by observing a mentor, a reinforcement-learning agent can extract information about its own capabilities in, and the relative value of, unvisited parts of the state space. We study two specific instantiations of this model, one in which the learning agent and the mentor have identical abilities, and one designed to deal with agents and mentors with different action sets. We illustrate the benefits of implicit imitation by integrating it with prioritized sweeping, and demonstrating improved performance and convergence through observation of single and multiple mentors. Though we make some stringent assumptions regarding observability and possible interactions, we briefly comment on extensions of the model that relax these restricitions. The amount of information available on the Web grows at an incredible high rate. Systems and procedures devised to extract these data from Web sources already exist, and different approaches and techniques have been investigated during the last years. On the one hand, reliable solutions should provide robust algorithms of Web data mining which could automatically face possible malfunctioning or failures. On the other, in literature there is a lack of solutions about the maintenance of these systems. Procedures that extract Web data may be strictly interconnected with the structure of the data source itself; thus, malfunctioning or acquisition of corrupted data could be caused, for example, by structural modifications of data sources brought by their owners. Nowadays, verification of data integrity and maintenance are mostly manually managed, in order to ensure that these systems work correctly and reliably. In this paper we propose a novel approach to create procedures able to extract data from Web sources -- the so called Web wrappers -- which can face possible malfunctioning caused by modifications of the structure of the data source, and can automatically repair themselves. Research in the field of Artificial Intelligence is continually progressing to simulate the human knowledge into automated intelligent knowledge base, which can encode and retrieve knowledge efficiently along with the capability of being is consistent and scalable at all times. However, there is no system at hand that can match the diversified abilities of human knowledge base. In this position paper, we put forward a theoretical model of a different system that intends to integrate pieces of knowledge, Informledge System (ILS). ILS would encode the knowledge, by virtue of knowledge units linked across diversified domains. The proposed ILS comprises of autonomous knowledge units termed as Knowledge Network Node (KNN), which would help in efficient cross-linking of knowledge units to encode fresh knowledge. These links are reasoned and inferred by the Parser and Link Manager, which are part of KNN. We present an application focused on the design of resilient long-reach passive optical networks. We specifically consider dual-parented networks whereby each customer must be connected to two metro sites via local exchange sites. An important property of such a placement is resilience to single metro node failure. The objective of the application is to determine the optimal position of a set of metro nodes such that the total optical fibre length is minimized. We prove that this problem is NP-Complete. We present two alternative combinatorial optimisation approaches to finding an optimal metro node placement using: a mixed integer linear programming (MIP) formulation of the problem; and, a hybrid approach that uses clustering as a preprocessing step. We consider a detailed case-study based on a network for Ireland. The hybrid approach scales well and finds solutions that are close to optimal, with a runtime that is two orders-of-magnitude better than the MIP model. It is widely recognized today that the management of imprecision and vagueness will yield more intelligent and realistic knowledge-based applications. Description Logics (DLs) are a family of knowledge representation languages that have gained considerable attention the last decade, mainly due to their decidability and the existence of empirically high performance of reasoning algorithms. In this paper, we extend the well known fuzzy ALC DL to the fuzzy SHIN DL, which extends the fuzzy ALC DL with transitive role axioms (S), inverse roles (I), role hierarchies (H) and number restrictions (N). We illustrate why transitive role axioms are difficult to handle in the presence of fuzzy interpretations and how to handle them properly. Then we extend these results by adding role hierarchies and finally number restrictions. The main contributions of the paper are the decidability proof of the fuzzy DL languages fuzzy-SI and fuzzy-SHIN, as well as decision procedures for the knowledge base satisfiability problem of the fuzzy-SI and fuzzy-SHIN. The theoretical transition from the graphs of production systems to the bipartite graphs of the MIVAR nets is shown. Examples of the implementation of the MIVAR nets in the formalisms of matrixes and graphs are given. The linear computational complexity of algorithms for automated building of objects and rules of the MIVAR nets is theoretically proved. On the basis of the MIVAR nets the UDAV software complex is developed, handling more than 1.17 million objects and more than 3.5 million rules on ordinary computers. The results of experiments that confirm a linear computational complexity of the MIVAR method of information processing are given. Keywords: MIVAR, MIVAR net, logical inference, computational complexity, artificial intelligence, intelligent systems, expert systems, General Problem Solver. Language evolution might have preferred certain prior social configurations over others. Experiments conducted with models of different social structures (varying subgroup interactions and the role of a dominant interlocutor) suggest that having isolated agent groups rather than an interconnected agent is more advantageous for the emergence of a social communication system. Distinctive groups that are closely connected by communication yield systems less like natural language than fully isolated groups inhabiting the same world. Furthermore, the addition of a dominant male who is asymmetrically favoured as a hearer, and equally likely to be a speaker has no positive influence on the disjoint groups. User preferences for automated assistance often vary widely, depending on the situation, and quality or presentation of help. Developing effectivemodels to learn individual preferences online requires domain models that associate observations of user behavior with their utility functions, which in turn can be constructed using utility elicitation techniques. However, most elicitation methods ask for users' predicted utilities based on hypothetical scenarios rather than more realistic experienced utilities. This is especially true in interface customization, where users are asked to assess novel interface designs. We propose experiential utility elicitation methods for customization and compare these to predictivemethods. As experienced utilities have been argued to better reflect true preferences in behavioral decision making, the purpose here is to investigate accurate and efficient procedures that are suitable for software domains. Unlike conventional elicitation, our results indicate that an experiential approach helps people understand stochastic outcomes, as well as better appreciate the sequential utility of intelligent assistance. Intelligent systems in an open world must reason about many interacting entities related to each other in diverse ways and having uncertain features and relationships. Traditional probabilistic languages lack the expressive power to handle relational domains. Classical first-order logic is sufficiently expressive, but lacks a coherent plausible reasoning capability. Recent years have seen the emergence of a variety of approaches to integrating first-order logic, probability, and machine learning. This paper presents Multi-entity Bayesian networks (MEBN), a formal system that integrates First Order Logic (FOL) with Bayesian probability theory. MEBN extends ordinary Bayesian networks to allow representation of graphical models with repeated sub-structures, and can express a probability distribution over models of any consistent, finitely axiomatizable first-order theory. We present the logic using an example inspired by the Paramount Series StarTrek. A key prerequisite to optimal reasoning under uncertainty in intelligent systems is to start with good class probability estimates. This paper improves on the current best probability estimation trees (Bagged-PETs) and also presents a new ensemble-based algorithm (MOB-ESP). Comparisons are made using several benchmark datasets and multiple metrics. These experiments show that MOB-ESP outputs significantly more accurate class probabilities than either the baseline BPETs algorithm or the enhanced version presented here (EB-PETs). These results are based on metrics closely associated with the average accuracy of the predictions. MOB-ESP also provides much better probability rankings than B-PETs. The paper further suggests how these estimation techniques can be applied in concert with a broader category of classifiers. For the past few decades, man has been trying to create an intelligent computer which can talk and respond like he can. The task of creating a system that can talk like a human being is the primary objective of Automatic Speech Recognition. Various Speech Recognition techniques have been developed in theory and have been applied in practice. This paper discusses the problems that have been encountered in developing Speech Recognition, the techniques that have been applied to automate the task, and a representation of the core problems of present day Speech Recognition by using Fuzzy Mathematics. Empty taxi cruising represents a wastage of resources in the context of urban taxi services. In this work, we seek to minimize such wastage. An analysis of a large trace of taxi operations reveals that the services' inefficiency is caused by drivers' greedy cruising behavior. We model the existing system as a continuous time Markov chain. To address the problem, we propose that each taxi be equipped with an intelligent agent that will guide the driver when cruising for passengers. Then, drawing from AI literature on multiagent planning, we explore two possible ways to compute such guidance. The first formulation assumes fully cooperative drivers. This allows us, in principle, to compute systemwide optimal cruising policy. This is modeled as a Markov decision process. The second formulation assumes rational drivers, seeking to maximize their own profit. This is modeled as a stochastic congestion game, a specialization of stochastic games. Nash equilibrium policy is proposed as the solution to the game, where no driver has the incentive to singly deviate from it. Empirical result shows that both formulations improve the efficiency of the service significantly. This paper considers the problem of knowledge-based model construction in the presence of uncertainty about the association of domain entities to random variables. Multi-entity Bayesian networks (MEBNs) are defined as a representation for knowledge in domains characterized by uncertainty in the number of relevant entities, their interrelationships, and their association with observables. An MEBN implicitly specifies a probability distribution in terms of a hierarchically structured collection of Bayesian network fragments that together encode a joint probability distribution over arbitrarily many interrelated hypotheses. Although a finite query-complete model can always be constructed, association uncertainty typically makes exact model construction and evaluation intractable. The objective of hypothesis management is to balance tractability against accuracy. We describe an application to the problem of using intelligence reports to infer the organization and activities of groups of military vehicles. Our approach is compared to related work in the tracking and fusion literature. Many intelligent user interfaces employ application and user models to determine the user's preferences, goals and likely future actions. Such models require application analysis, adaptation and expansion. Building and maintaining such models adds a substantial amount of time and labour to the application development cycle. We present a system that observes the interface of an unmodified application and records users' interactions with the application. From a history of such observations we build a coarse state space of observed interface states and actions between them. To refine the space, we hypothesize sub-states based upon the histories that led users to a given state. We evaluate the information gain of possible state splits, varying the length of the histories considered in such splits. In this way, we automatically produce a stochastic dynamic model of the application and of how it is used. To evaluate our approach, we present models derived from real-world application usage data. Many applications of intelligent systems require reasoning about the mental states of agents in the domain. We may want to reason about an agent's beliefs, including beliefs about other agents; we may also want to reason about an agent's preferences, and how his beliefs and preferences relate to his behavior. We define a probabilistic epistemic logic (PEL) in which belief statements are given a formal semantics, and provide an algorithm for asserting and querying PEL formulas in Bayesian networks. We then show how to reason about an agent's behavior by modeling his decision process as an influence diagram and assuming that he behaves rationally. PEL can then be used for reasoning from an agent's observed actions to conclusions about other aspects of the domain, including unobserved domain variables and the agent's mental states. The application of Bayesian networks (BNs) to cognitive assessment and intelligent tutoring systems poses new challenges for model construction. When cognitive task analyses suggest constructing a BN with several latent variables, empirical model criticism of the latent structure becomes both critical and complex. This paper introduces a methodology for criticizing models both globally (a BN in its entirety) and locally (observable nodes), and explores its value in identifying several kinds of misfit: node errors, edge errors, state errors, and prior probability errors in the latent structure. The results suggest the indices have potential for detecting model misfit and assisting in locating problematic components of the model. We present a new abductive, probabilistic theory of plan recognition. This model differs from previous plan recognition theories in being centered around a model of plan execution: most previous methods have been based on plans as formal objects or on rules describing the recognition process. We show that our new model accounts for phenomena omitted from most previous plan recognition theories: notably the cumulative effect of a sequence of observations of partially-ordered, interleaved plans and the effect of context on plan adoption. The model also supports inferences about the evolution of plan execution in situations where another agent intervenes in plan execution. This facility provides support for using plan recognition to build systems that will intelligently assist a user. The Lumiere Project centers on harnessing probability and utility to provide assistance to computer software users. We review work on Bayesian user models that can be employed to infer a users needs by considering a user's background, actions, and queries. Several problems were tackled in Lumiere research, including (1) the construction of Bayesian models for reasoning about the time-varying goals of computer users from their observed actions and queries, (2) gaining access to a stream of events from software applications, (3) developing a language for transforming system events into observational variables represented in Bayesian user models, (4) developing persistent profiles to capture changes in a user expertise, and (5) the development of an overall architecture for an intelligent user interface. Lumiere prototypes served as the basis for the Office Assistant in the Microsoft Office '97 suite of productivity applications. This paper presents two new approaches to decomposing and solving large Markov decision problems (MDPs), a partial decoupling method and a complete decoupling method. In these approaches, a large, stochastic decision problem is divided into smaller pieces. The first approach builds a cache of policies for each part of the problem independently, and then combines the pieces in a separate, light-weight step. A second approach also divides the problem into smaller pieces, but information is communicated between the different problem pieces, allowing intelligent decisions to be made about which piece requires the most attention. Both approaches can be used to find optimal policies or approximately optimal policies with provable bounds. These algorithms also provide a framework for the efficient transfer of knowledge across problems that share similar structure. Ever growing number of image documents available on the Internet continuously motivates research in better annotation models and more efficient retrieval methods. Formal knowledge representation of objects and events in pictures, their interaction as well as context complexity becomes no longer an option for a quality image repository, but a necessity. We present an ontology-based online image annotation tool WNtags and demonstrate its usefulness in several typical multimedia retrieval tasks using International Affective Picture System emotionally annotated image database. WNtags is built around WordNet lexical ontology but considers Suggested Upper Merged Ontology as the preferred labeling formalism. WNtags uses sets of weighted WordNet synsets as high-level image semantic descriptors and query matching is performed with word stemming and node distance metrics. We also elaborate our near future plans to expand image content description with induced affect as in stimuli for research of human emotion and attention. Approximate models of world state transitions are necessary when building plans for complex systems operating in dynamic environments. External event probabilities can depend on state feature values as well as time spent in that particular state. We assign temporally -dependent probability functions to state transitions. These functions are used to locally compute state probabilities, which are then used to select highly probable goal paths and eliminate improbable states. This probabilistic model has been implemented in the Cooperative Intelligent Real-time Control Architecture (CIRCA), which combines an AI planner with a separate real-time system such that plans are developed, scheduled, and executed with real-time guarantees. We present flight simulation tests that demonstrate how our probabilistic model may improve CIRCA performance. Most traditional models of uncertainty have focused on the associational relationship among variables as captured by conditional dependence. In order to successfully manage intelligent systems for decision making, however, we must be able to predict the effects of actions. In this paper, we attempt to unite two branches of research that address such predictions: causal modeling and decision analysis. First, we provide a definition of causal dependence in decision-analytic terms, which we derive from consequences of causal dependence cited in the literature. Using this definition, we show how causal dependence can be represented within an influence diagram. In particular, we identify two inadequacies of an ordinary influence diagram as a representation for cause. We introduce a special class of influence diagrams, called causal influence diagrams, which corrects one of these problems, and identify situations where the other inadequacy can be eliminated. In addition, we describe the relationships between Howard Canonical Form and existing graphical representations of cause. This paper addresses learning stochastic rules especially on an inter-attribute relation based on a Minimum Description Length (MDL) principle with a finite number of examples, assuming an application to the design of intelligent relational database systems. The stochastic rule in this paper consists of a model giving the structure like the dependencies of a Bayesian Belief Network (BBN) and some stochastic parameters each indicating a conditional probability of an attribute value given the state determined by the other attributes' values in the same record. Especially, we propose the extended version of the algorithm of Chow and Liu in that our learning algorithm selects the model in the range where the dependencies among the attributes are represented by some general plural number of trees. This paper describes the architecture of R&D Analyst, a commercial intelligent decision system for evaluating corporate research and development projects and portfolios. In analyzing projects, R&D Analyst interactively guides a user in constructing an influence diagram model for an individual research project. The system's interactive approach can be clearly explained from a blackboard system perspective. The opportunistic reasoning emphasis of blackboard systems satisfies the flexibility requirements of model construction, thereby suggesting that a similar architecture would be valuable for developing normative decision systems in other domains. Current research is aimed at extending the system architecture to explicitly consider of sequential decisions involving limited temporal, financial, and physical resources. In this paper the theory of semi-bounded rationality is proposed as an extension of the theory of bounded rationality. In particular, it is proposed that a decision making process involves two components and these are the correlation machine, which estimates missing values, and the causal machine, which relates the cause to the effect. Rational decision making involves using information which is almost always imperfect and incomplete as well as some intelligent machine which if it is a human being is inconsistent to make decisions. In the theory of bounded rationality this decision is made irrespective of the fact that the information to be used is incomplete and imperfect and the human brain is inconsistent and thus this decision that is to be made is taken within the bounds of these limitations. In the theory of semi-bounded rationality, signal processing is used to filter noise and outliers in the information and the correlation machine is applied to complete the missing information and artificial intelligence is used to make more consistent decisions. In this paper the theory of flexibly-bounded rationality which is an extension to the theory of bounded rationality is revisited. Rational decision making involves using information which is almost always imperfect and incomplete together with some intelligent machine which if it is a human being is inconsistent to make decisions. In bounded rationality, this decision is made irrespective of the fact that the information to be used is incomplete and imperfect and that the human brain is inconsistent and thus this decision that is to be made is taken within the bounds of these limitations. In the theory of flexibly-bounded rationality, advanced information analysis is used, the correlation machine is applied to complete missing information and artificial intelligence is used to make more consistent decisions. Therefore flexibly-bounded rationality expands the bounds within which rationality is exercised. Because human decision making is essentially irrational, this paper proposes the theory of marginalization of irrationality in decision making to deal with the problem of satisficing in the presence of irrationality. The current paper offers a perspective on what we term conceptive intelligence - the capacity of an agent to continuously think of new object definitions (tasks, problems, physical systems, etc.) and to look for methods to realize them. The framework, called a Brouwer machine, is inspired by previous research in design theory and modeling, with its roots in the constructivist mathematics of intuitionism. The dual constructivist perspective we describe offers the possibility to create novelty both in terms of the types of objects and the methods for constructing objects. More generally, the theoretical work on which Brouwer machines are based is called imaginative constructivism. Based on the framework and the theory, we discuss many paradigms and techniques omnipresent in AI research and their merits and shortcomings for modeling aspects of design, as described by imaginative constructivism. To demonstrate and explain the type of creative process expressed by the notion of a Brouwer machine, we compare this concept with a system using genetic algorithms for scientific law discovery. Over the past decade, AI has made a remarkable progress due to recently revived Deep Learning technology. Deep Learning enables to process large amounts of data using simplified neuron networks that simulate the way in which the brain works. At the same time, there is another point of view that posits that brain is processing information, not data. This duality hampered AI progress for years. To provide a remedy for this situation, I propose a new definition of information that considers it as a coupling between two separate entities - physical information (that implies data processing) and semantic information (that provides physical information interpretation). In such a case, intelligence arises as a result of information processing. The paper points on the consequences of this turn for the AI design philosophy. We show in this paper how managed multi-context systems (mMCSs) can be turned into a reactive formalism suitable for continuous reasoning in dynamic environments. We extend mMCSs with (abstract) sensors and define the notion of a run of the extended systems. We then show how typical problems arising in online reasoning can be addressed: handling potentially inconsistent sensor input, modeling intelligent forms of forgetting, selective integration of knowledge, and controlling the reasoning effort spent by contexts, like setting contexts to an idle mode. We also investigate the complexity of some important related decision problems and discuss different design choices which are given to the knowledge engineer. There is considerable uncertainty about what properties, capabilities and motivations future AGIs will have. In some plausible scenarios, AGIs may pose security risks arising from accidents and defects. In order to mitigate these risks, prudent early AGI research teams will perform significant testing on their creations before use. Unfortunately, if an AGI has human-level or greater intelligence, testing itself may not be safe; some natural AGI goal systems create emergent incentives for AGIs to tamper with their test environments, make copies of themselves on the internet, or convince developers and operators to do dangerous things. In this paper, we survey the AGI containment problem - the question of how to build a container in which tests can be conducted safely and reliably, even on AGIs with unknown motivations and capabilities that could be dangerous. We identify requirements for AGI containers, available mechanisms, and weaknesses that need to be addressed. To operate intelligently in the world, an agent must reason about its actions. The consequences of an action are a function of both the state of the world and the action itself. Many aspects of the world are inherently stochastic, so a representation for reasoning about actions must be able to express chances of world states as well as indeterminacy in the effects of actions and other events. This paper presents a propositional temporal probability logic for representing and reasoning about actions. The logic can represent the probability that facts hold and events occur at various times. It can represent the probability that actions and other events affect the future. It can represent concurrent actions and conditions that hold or change during execution of an action. The model of probability relates probabilities over time. The logical language integrates both modal and probabilistic constructs and can thus represent and distinguish between possibility, probability, and truth. Several examples illustrating the use of the logic are given. As the technology for building knowledge based systems has matured, important lessons have been learned about the relationship between the architecture of a system and the nature of the problems it is intended to solve. We are implementing a knowledge engineering tool called BART that is designed with these lessons in mind. BART is a Bayesian reasoning tool that makes belief networks and other probabilistic techniques available to knowledge engineers building classificatory problem solvers. BART has already been used to develop a decision aid for classifying ship images, and it is currently being used to manage uncertainty in systems concerned with analyzing intelligence reports. This paper discusses how state-of-the-art probabilistic methods fit naturally into a knowledge based approach to classificatory problem solving, and describes the current capabilities of BART. Scheduling in the factory setting is compounded by computational complexity and temporal uncertainty. Together, these two factors guarantee that the process of constructing an optimal schedule will be costly and the chances of executing that schedule will be slight. Temporal uncertainty in the task execution time can be offset by several methods: eliminate uncertainty by careful engineering, restore certainty whenever it is lost, reduce the uncertainty by using more accurate sensors, and quantify and circumscribe the remaining uncertainty. Unfortunately, these methods focus exclusively on the sources of uncertainty and fail to apply knowledge of the tasks which are to be scheduled. A complete solution must adapt the schedule of activities to be performed according to the evolving state of the production world. The example of vision-directed assembly is presented to illustrate that the principle of least commitment, in the creation of a plan, in the representation of a schedule, and in the execution of a schedule, enables a robot to operate intelligently and efficiently, even in the presence of considerable uncertainty in the sequence of future events. The control and integration of distributed, multi-sensor perceptual systems is a complex and challenging problem. The observations or opinions of different sensors are often disparate incomparable and are usually only partial views. Sensor information is inherently uncertain and in addition the individual sensors may themselves be in error with respect to the system as a whole. The successful operation of a multi-sensor system must account for this uncertainty and provide for the aggregation of disparate information in an intelligent and robust manner. We consider the sensors of a multi-sensor system to be members or agents of a team, able to offer opinions and bargain in group decisions. We will analyze the coordination and control of this structure using a theory of team decision-making. We present some new analytic results on multi-sensor aggregation and detail a simulation which we use to investigate our ideas. This simulation provides a basis for the analysis of complex agent structures cooperating in the presence of uncertainty. The results of this study are discussed with reference to multi-sensor robot systems, distributed Al and decision making under uncertainty. In designing an intelligent system that must be able to explain its reasoning to a human user, or to provide generalizations that the human user finds reasonable, it may be useful to take into consideration psychological data on what types of concepts and categories people naturally use. The psychological literature on concept learning and categorization provides strong evidence that certain categories are more easily learned, recalled, and recognized than others. We show here how a measure of the informational value of a category predicts the results of several important categorization experiments better than standard alternative explanations. This suggests that information-based approaches to machine generalization may prove particularly useful and natural for human users of the systems. A key challenge in non-cooperative multi-agent systems is that of developing efficient planning algorithms for intelligent agents to interact and perform effectively among boundedly rational, self-interested agents (e.g., humans). The practicality of existing works addressing this challenge is being undermined due to either the restrictive assumptions of the other agents' behavior, the failure in accounting for their rationality, or the prohibitively expensive cost of modeling and predicting their intentions. To boost the practicality of research in this field, we investigate how intention prediction can be efficiently exploited and made practical in planning, thereby leading to efficient intention-aware planning frameworks capable of predicting the intentions of other agents and acting optimally with respect to their predicted intentions. We show that the performance losses incurred by the resulting planning policies are linearly bounded by the error of intention prediction. Empirical evaluations through a series of stochastic games demonstrate that our policies can achieve better and more robust performance than the state-of-the-art algorithms. Rational agents are usually built to maximize rewards. However, AGI agents can find undesirable ways of maximizing any prior reward function. Therefore value learning is crucial for safe AGI. We assume that generalized states of the world are valuable - not rewards themselves, and propose an extension of AIXI, in which rewards are used only to bootstrap hierarchical value learning. The modified AIXI agent is considered in the multi-agent environment, where other agents can be either humans or other "mature" agents, which values should be revealed and adopted by the "infant" AGI agent. General framework for designing such empathic agent with ethical bias is proposed also as an extension of the universal intelligence model. Moreover, we perform experiments in the simple Markov environment, which demonstrate feasibility of our approach to value learning in safe AGI. This note revisits the concepts of task and difficulty. The notion of cognitive task and its use for the evaluation of intelligent systems is still replete with issues. The view of tasks as MDP in the context of reinforcement learning has been especially useful for the formalisation of learning tasks. However, this alternate interaction does not accommodate well for some other tasks that are usual in artificial intelligence and, most especially, in animal and human evaluation. In particular, we want to have a more general account of episodes, rewards and responses, and, most especially, the computational complexity of the algorithm behind an agent solving a task. This is crucial for the determination of the difficulty of a task as the (logarithm of the) number of computational steps required to acquire an acceptable policy for the task, which includes the exploration of policies and their verification. We introduce a notion of asynchronous-time stochastic tasks. Based on this interpretation, we can see what task difficulty is, what instance difficulty is (relative to a task) and also what task compositions and decompositions are. The ability to generalize is an important feature of any intelligent agent. Not only because it may allow the agent to cope with large amounts of data, but also because in some environments, an agent with no generalization capabilities cannot learn. In this work we outline several criteria for generalization, and present a dynamic and autonomous machinery that enables projective simulation agents to meaningfully generalize. Projective simulation, a novel, physical approach to artificial intelligence, was recently shown to perform well in standard reinforcement learning problems, with applications in advanced robotics as well as quantum experiments. Both the basic projective simulation model and the presented generalization machinery are based on very simple principles. This allows us to provide a full analytical analysis of the agent's performance and to illustrate the benefit the agent gains by generalizing. Specifically, we show that already in basic (but extreme) environments, learning without generalization may be impossible, and demonstrate how the presented generalization machinery enables the projective simulation agent to learn. In this paper, a method is proposed to detect the emotion of a song based on its lyrical and audio features. Lyrical features are generated by segmentation of lyrics during the process of data extraction. ANEW and WordNet knowledge is then incorporated to compute Valence and Arousal values. In addition to this, linguistic association rules are applied to ensure that the issue of ambiguity is properly addressed. Audio features are used to supplement the lyrical ones and include attributes like energy, tempo, and danceability. These features are extracted from The Echo Nest, a widely used music intelligence platform. Construction of training and test sets is done on the basis of social tags extracted from the last.fm website. The classification is done by applying feature weighting and stepwise threshold reduction on the k-Nearest Neighbors algorithm to provide fuzziness in the classification. We administered the Verbal IQ (VIQ) part of the Wechsler Preschool and Primary Scale of Intelligence (WPPSI-III) to the ConceptNet 4 AI system. The test questions (e.g., "Why do we shake hands?") were translated into ConceptNet 4 inputs using a combination of the simple natural language processing tools that come with ConceptNet together with short Python programs that we wrote. The question answering used a version of ConceptNet based on spectral methods. The ConceptNet system scored a WPPSI-III VIQ that is average for a four-year-old child, but below average for 5 to 7 year-olds. Large variations among subtests indicate potential areas of improvement. In particular, results were strongest for the Vocabulary and Similarities subtests, intermediate for the Information subtest, and lowest for the Comprehension and Word Reasoning subtests. Comprehension is the subtest most strongly associated with common sense. The large variations among subtests and ordinary common sense strongly suggest that the WPPSI-III VIQ results do not show that "ConceptNet has the verbal abilities a four-year-old." Rather, children's IQ tests offer one objective metric for the evaluation and comparison of AI systems. Also, this work continues previous research on Psychometric AI. Agents of general intelligence deployed in real-world scenarios must adapt to ever-changing environmental conditions. While such adaptive agents may leverage engineered knowledge, they will require the capacity to construct and evaluate knowledge themselves from their own experience in a bottom-up, constructivist fashion. This position paper builds on the idea of encoding knowledge as temporally extended predictions through the use of general value functions. Prior work has focused on learning predictions about externally derived signals about a task or environment (e.g. battery level, joint position). Here we advocate that the agent should also predict internally generated signals regarding its own learning process - for example, an agent's confidence in its learned predictions. Finally, we suggest how such information would be beneficial in creating an introspective agent that is able to learn to make good decisions in a complex, changing world. After an earthquake, disaster sites pose a multitude of health and safety concerns. A rescue operation of people trapped in the ruins after an earthquake disaster requires a series of intelligent behavior, including planning. For a successful rescue operation, given a limited number of available actions and regulations, the role of planning in rescue operations is crucial. Fortunately, recent developments in automated planning by artificial intelligence community can help different organization in this crucial task. Due to the number of rules and regulations, we believe that a rule based system for planning can be helpful for this specific planning problem. In this research work, we use logic rules to represent rescue and related regular regulations, together with a logic based planner to solve this complicated problem. Although this research is still in the prototyping and modeling stage, it clearly shows that rule based languages can be a good infrastructure for this computational task. The results of this research can be used by different organizations, such as Iranian Red Crescent Society and International Institute of Seismology and Earthquake Engineering (IISEE). Handwriting is a skill learned by humans from a very early age. The ability to develop one's own unique handwriting as well as mimic another person's handwriting is a task learned by the brain with practice. This paper deals with this very problem where an intelligent system tries to learn the handwriting of an entity using Generative Adversarial Networks (GANs). We propose a modified architecture of DCGAN (Radford, Metz, and Chintala 2015) to achieve this. We also discuss about applying reinforcement learning techniques to achieve faster learning. Our algorithm hopes to give new insights in this area and its uses include identification of forged documents, signature verification, computer generated art, digitization of documents among others. Our early implementation of the algorithm illustrates a good performance with MNIST datasets. Reasoning about objects, relations, and physics is central to human intelligence, and a key goal of artificial intelligence. Here we introduce the interaction network, a model which can reason about how objects in complex systems interact, supporting dynamical predictions, as well as inferences about the abstract properties of the system. Our model takes graphs as input, performs object- and relation-centric reasoning in a way that is analogous to a simulation, and is implemented using deep neural networks. We evaluate its ability to reason about several challenging physical domains: n-body problems, rigid-body collision, and non-rigid dynamics. Our results show it can be trained to accurately simulate the physical trajectories of dozens of objects over thousands of time steps, estimate abstract quantities such as energy, and generalize automatically to systems with different numbers and configurations of objects and relations. Our interaction network implementation is the first general-purpose, learnable physics engine, and a powerful general framework for reasoning about object and relations in a wide variety of complex real-world domains. Communicating and sharing intelligence among agents is an important facet of achieving Artificial General Intelligence. As a first step towards this challenge, we introduce a novel framework for image generation: Message Passing Multi-Agent Generative Adversarial Networks (MPM GANs). While GANs have recently been shown to be very effective for image generation and other tasks, these networks have been limited to mostly single generator-discriminator networks. We show that we can obtain multi-agent GANs that communicate through message passing to achieve better image generation. The objectives of the individual agents in this framework are two fold: a co-operation objective and a competing objective. The co-operation objective ensures that the message sharing mechanism guides the other generator to generate better than itself while the competing objective encourages each generator to generate better than its counterpart. We analyze and visualize the messages that these GANs share among themselves in various scenarios. We quantitatively show that the message sharing formulation serves as a regularizer for the adversarial training. Qualitatively, we show that the different generators capture different traits of the underlying data distribution. Intelligent systems capable of automatically understanding natural language text are important for many artificial intelligence applications including mobile phone voice assistants, computer vision, and robotics. Understanding language often constitutes fitting new information into a previously acquired view of the world. However, many machine reading systems rely on the text alone to infer its meaning. In this paper, we pursue a different approach; machine reading methods that make use of background knowledge to facilitate language understanding. To this end, we have developed two methods: The first method addresses prepositional phrase attachment ambiguity. It uses background knowledge within a semi-supervised machine learning algorithm that learns from both labeled and unlabeled data. This approach yields state-of-the-art results on two datasets against strong baselines; The second method extracts relationships from compound nouns. Our knowledge-aware method for compound noun analysis accurately extracts relationships and significantly outperforms a baseline that does not make use of background knowledge. A recurring topic in interstellar exploration and the search for extraterrestrial intelligence (SETI) is the role of artificial intelligence. More precisely, these are programs or devices that are capable of performing cognitive tasks that have been previously associated with humans such as image recognition, reasoning, decision-making etc. Such systems are likely to play an important role in future deep space missions, notably interstellar exploration, where the spacecraft needs to act autonomously. This article explores the drivers for an interstellar mission with a computation-heavy payload and provides an outline of a spacecraft and mission architecture that supports such a payload. Based on existing technologies and extrapolations of current trends, it is shown that AI spacecraft development and operation will be constrained and driven by three aspects: power requirements for the payload, power generation capabilities, and heat rejection capabilities. A likely mission architecture for such a probe is to get into an orbit close to the star in order to generate maximum power for computational activities, and then to prepare for further exploration activities. Given current levels of increase in computational power, such a payload with a similar computational power as the human brain would have a mass of hundreds to dozens of tons in a 2050 - 2060 timeframe. The explosive increase in number of smart devices hosting sophisticated applications is rapidly affecting the landscape of information communication technology industry. Mobile subscriptions, expected to reach 8.9 billion by 2022, would drastically increase the demand of extra capacity with aggregate throughput anticipated to be enhanced by a factor of 1000. In an already crowded radio spectrum, it becomes increasingly difficult to meet ever growing application demands of wireless bandwidth. It has been shown that the allocated spectrum is seldom utilized by the primary users and hence contains spectrum holes that may be exploited by the unlicensed users for their communication. As we enter the Internet Of Things (IoT) era in which appliances of common use will become smart digital devices with rigid performance requirements (such as low latency, energy efficiency, etc.), current networks face the vexing problem of how to create sufficient capacity for such applications. The fifth generation of cellular networks (5G) envisioned to address these challenges are thus required to incorporate cognition and intelligence to resolve the aforementioned issues. Evolution has resulted in highly developed abilities in many natural intelligences to quickly and accurately predict mechanical phenomena. Humans have successfully developed laws of physics to abstract and model such mechanical phenomena. In the context of artificial intelligence, a recent line of work has focused on estimating physical parameters based on sensory data and use them in physical simulators to make long-term predictions. In contrast, we investigate the effectiveness of a single neural network for end-to-end long-term prediction of mechanical phenomena. Based on extensive evaluation, we demonstrate that such networks can outperform alternate approaches having even access to ground-truth physical simulators, especially when some physical parameters are unobserved or not known a-priori. Further, our network outputs a distribution of outcomes to capture the inherent uncertainty in the data. Our approach demonstrates for the first time the possibility of making actionable long-term predictions from sensor data without requiring to explicitly model the underlying physical laws. The real-time strategy game StarCraft has proven to be a challenging environment for artificial intelligence techniques, and as a result, current state-of-the-art solutions consist of numerous hand-crafted modules. In this paper, we show how macromanagement decisions in StarCraft can be learned directly from game replays using deep learning. Neural networks are trained on 789,571 state-action pairs extracted from 2,005 replays of highly skilled players, achieving top-1 and top-3 error rates of 54.6% and 22.9% in predicting the next build action. By integrating the trained network into UAlbertaBot, an open source StarCraft bot, the system can significantly outperform the game's built-in Terran bot, and play competitively against UAlbertaBot with a fixed rush strategy. To our knowledge, this is the first time macromanagement tasks are learned directly from replays in StarCraft. While the best hand-crafted strategies are still the state-of-the-art, the deep network approach is able to express a wide range of different strategies and thus improving the network's performance further with deep reinforcement learning is an immediately promising avenue for future research. Ultimately this approach could lead to strong StarCraft bots that are less reliant on hard-coded strategies. Artificial General Intelligence (AGI) or Strong AI aims to create machines with human-like or human-level intelligence, which is still a very ambitious goal when compared to the existing computing and AI systems. After many hype cycles and lessons from AI history, it is clear that a big conceptual leap is needed for crossing the starting line to kick-start mainstream AGI research. This position paper aims to make a small conceptual contribution toward reaching that starting line. After a broad analysis of the AGI problem from different perspectives, a system-theoretic and engineering-based research approach is introduced, which builds upon the existing mainstream AI and systems foundations. Several promising cross-fertilization opportunities between systems disciplines and AI research are identified. Specific potential research directions are discussed. The General AI Challenge is an initiative to encourage the wider artificial intelligence community to focus on important problems in building intelligent machines with more general scope than is currently possible. The challenge comprises of multiple rounds, with the first round focusing on gradual learning, i.e. the ability to re-use already learned knowledge for efficiently learning to solve subsequent problems. In this article, we will present details of the first round of the challenge, its inspiration and aims. We also outline a more formal description of the challenge and present a preliminary analysis of its curriculum, based on ideas from computational mechanics. We believe, that such formalism will allow for a more principled approach towards investigating tasks in the challenge, building new curricula and for potentially improving consequent challenge rounds. Data science and machine learning are the key technologies when it comes to the processes and products with automatic learning and optimization to be used in the automotive industry of the future. This article defines the terms "data science" (also referred to as "data analytics") and "machine learning" and how they are related. In addition, it defines the term "optimizing analytics" and illustrates the role of automatic optimization as a key technology in combination with data analytics. It also uses examples to explain the way that these technologies are currently being used in the automotive industry on the basis of the major subprocesses in the automotive value chain (development, procurement; logistics, production, marketing, sales and after-sales, connected customer). Since the industry is just starting to explore the broad range of potential uses for these technologies, visionary application examples are used to illustrate the revolutionary possibilities that they offer. Finally, the article demonstrates how these technologies can make the automotive industry more efficient and enhance its customer focus throughout all its operations and activities, extending from the product and its development process to the customers and their connection to the product. Novel user interfaces based on artificial intelligence, such as natural-language agents, present new categories of engineering challenges. These systems need to cope with uncertainty and ambiguity, interface with machine learning algorithms, and compose information from multiple users to make decisions. We propose to treat these challenges as language-design problems. We describe three programming language abstractions for three core problems in intelligent system design. First, hypothetical worlds support nondeterministic search over spaces of alternative actions. Second, a feature type system abstracts the interaction between applications and learning algorithms. Finally, constructs for collaborative execution extend hypothetical worlds across multiple machines while controlling access to private data. We envision these features as first steps toward a complete language for implementing AI-based interfaces and applications. Deep reinforcement learning is revolutionizing the artificial intelligence field. Currently, it serves as a good starting point for constructing intelligent autonomous systems which offer a better knowledge of the visual world. It is possible to scale deep reinforcement learning with the use of deep learning and do amazing tasks such as use of pixels in playing video games. In this paper, key concepts of deep reinforcement learning including reward function, differences between reinforcement learning and supervised learning and models for implementation of reinforcement are discussed. Key challenges related to the implementation of reinforcement learning in conversational AI domain are identified as well as discussed in detail. Various conversational models which are based on deep reinforcement learning (as well as deep learning) are also discussed. In summary, this paper discusses key aspects of deep reinforcement learning which are crucial for designing an efficient conversational AI. An Oracle is a design for potentially high power artificial intelligences (AIs), where the AI is made safe by restricting it to only answer questions. Unfortunately most designs cause the Oracle to be motivated to manipulate humans with the contents of their answers, and Oracles of potentially high intelligence might be very successful at this. Solving that problem, without compromising the accuracy of the answer, is tricky. This paper reduces the issue to a cryptographic-style problem of Alice ensuring that her Oracle answers her questions while not providing key information to an eavesdropping Eve. Two Oracle designs solve this problem, one counterfactual (the Oracle answers as if it expected its answer to never be read) and one on-policy, but limited by the quantity of information it can transmit. Cooperative multi-agent planning (MAP) is a relatively recent research field that combines technologies, algorithms and techniques developed by the Artificial Intelligence Planning and Multi-Agent Systems communities. While planning has been generally treated as a single-agent task, MAP generalizes this concept by considering multiple intelligent agents that work cooperatively to develop a course of action that satisfies the goals of the group. This paper reviews the most relevant approaches to MAP, putting the focus on the solvers that took part in the 2015 Competition of Distributed and Multi-Agent Planning, and classifies them according to their key features and relative performance. We introduce MAgent, a platform to support research and development of many-agent reinforcement learning. Unlike previous research platforms on single or multi-agent reinforcement learning, MAgent focuses on supporting the tasks and the applications that require hundreds to millions of agents. Within the interactions among a population of agents, it enables not only the study of learning algorithms for agents' optimal polices, but more importantly, the observation and understanding of individual agent's behaviors and social phenomena emerging from the AI society, including communication languages, leaderships, altruism. MAgent is highly scalable and can host up to one million agents on a single GPU server. MAgent also provides flexible configurations for AI researchers to design their customized environments and agents. In this demo, we present three environments designed on MAgent and show emerged collective intelligence by learning from scratch. In recent years, deep learning has made tremendous progress in a number of fields that were previously out of reach for artificial intelligence. The successes in these problems has led researchers to consider the possibilities for intelligent systems to tackle a problem that humans have only recently themselves considered: program synthesis. This challenge is unlike others such as object recognition and speech translation, since its abstract nature and demand for rigor make it difficult even for human minds to attempt. While it is still far from being solved or even competitive with most existing methods, neural program synthesis is a rapidly growing discipline which holds great promise if completely realized. In this paper, we start with exploring the problem statement and challenges of program synthesis. Then, we examine the fascinating evolution of program induction models, along with how they have succeeded, failed and been reimagined since. Finally, we conclude with a contrastive look at program synthesis and future research recommendations for the field. Ethics and safety research in artificial intelligence is increasingly framed in terms of "alignment" with human values and interests. I argue that Turing's call for "fair play for machines" is an early and often overlooked contribution to the alignment literature. Turing's appeal to fair play suggests a need to correct human behavior to accommodate our machines, a surprising inversion of how value alignment is treated today. Reflections on "fair play" motivate a novel interpretation of Turing's notorious "imitation game" as a condition not of intelligence but instead of value alignment: a machine demonstrates a minimal degree of alignment (with the norms of conversation, for instance) when it can go undetected when interrogated by a human. I carefully distinguish this interpretation from the Moral Turing Test, which is not motivated by a principle of fair play, but instead depends on imitation of human moral behavior. Finally, I consider how the framework of fair play can be used to situate the debate over robot rights within the alignment literature. I argue that extending rights to service robots operating in public spaces is "fair" in precisely the sense that it encourages an alignment of interests between humans and machines. This paper presents an overview of the sixth AIBIRDS competition, held at the 26th International Joint Conference on Artificial Intelligence. This competition tasked participants with developing an intelligent agent which can play the physics-based puzzle game Angry Birds. This game uses a sophisticated physics engine that requires agents to reason and predict the outcome of actions with only limited environmental information. Agents entered into this competition were required to solve a wide assortment of previously unseen levels within a set time limit. The physical reasoning and planning required to solve these levels are very similar to those of many real-world problems. This year's competition featured some of the best agents developed so far and even included several new AI techniques such as deep reinforcement learning. Within this paper we describe the framework, rules, submitted agents and results for this competition. We also provide some background information on related work and other video game AI competitions, as well as discussing some potential ideas for future AIBIRDS competitions and agent improvements. The ability to predict the intentions of people based solely on their visual actions is a skill only performed by humans and animals. The intelligence of current computer algorithms has not reached this level of complexity, but there are several research efforts that are working towards it. With the number of classification algorithms available, it is hard to determine which algorithm works best for a particular situation. In classification of visual human intent data, Hidden Markov Models (HMM), and their variants, are leading candidates. The inability of HMMs to provide a probability in the observation to observation linkages is a big downfall in this classification technique. If a person is visually identifying an action of another person, they monitor patterns in the observations. By estimating the next observation, people have the ability to summarize the actions, and thus determine, with pretty good accuracy, the intention of the person performing the action. These visual cues and linkages are important in creating intelligent algorithms for determining human actions based on visual observations. The Evidence Feed Forward Hidden Markov Model is a newly developed algorithm which provides observation to observation linkages. The following research addresses the theory behind Evidence Feed Forward HMMs, provides mathematical proofs of their learning of these parameters to optimize the likelihood of observations with a Evidence Feed Forwards HMM, which is important in all computational intelligence algorithm, and gives comparative examples with standard HMMs in classification of both visual action data and measurement data; thus providing a strong base for Evidence Feed Forward HMMs in classification of many types of problems. Social Robot Lumen is an Artificial Intelligence development project that aims to create an Artificial Intelligence (AI) which allows a humanoid robot to communicate with human being naturally. In this study, Lumen will be developed to be a tour guide in Electrical Engineering Days 2015 exhibition. In developing an AI, there are a lot of modules that need to be developed separately. To make the development easier, we need a computational platform which becomes basis for all developers to give easiness in developing the modules in parallel way. That computational platform that developed by the writer is called Lumen Server. Lumen Server has two main function, which are to be a bridge between all Lumen intelligence modules with NAO robot, and to be the communication bridge between those Lumen intelligence modules. For the second function, Lumen Server implements the AMQP protocol using RabbitMQ. Besides that, writer also developed a control system for robot movement called Lumen Motion. Lumen motion is implemented by modelling the movement of NAO robot and also by creating a control system using fuzzy logic controller. Writer also developed a program that connects all Lumen intelligence modules so that Lumen can act like a tour guide. The implementation of this program uses FSM and event-driven program. From implementation result, all the features which were designed are successfully implemented. By the developing of this computational platform, it can ease the development of Lumen in the future. For next development, it must be focused on creating integration system so that Lumen can be more responsive to the environment. ----- Sosial Robot Lumen adalah proyek pengembangan kecerdasan buatan yang bertujuan untuk menciptakan kecerdasan buatan atau artificial intelligence (AI) yang memungkinkan robot untuk dapat berkomunikasi dengan manusia secara alami. Market price systems constitute a well-understood class of mechanisms that under certain conditions provide effective decentralization of decision making with minimal communication overhead. In a market-oriented programming approach to distributed problem solving, we derive the activities and resource allocations for a set of computational agents by computing the competitive equilibrium of an artificial economy. WALRAS provides basic constructs for defining computational market structures, and protocols for deriving their corresponding price equilibria. In a particular realization of this approach for a form of multicommodity flow problem, we see that careful construction of the decision process according to economic principles can lead to efficient distributed resource allocation, and that the behavior of the system can be meaningfully analyzed in economic terms. Learning the past tense of English verbs - a seemingly minor aspect of language acquisition - has generated heated debates since 1986, and has become a landmark task for testing the adequacy of cognitive modeling. Several artificial neural networks (ANNs) have been implemented, and a challenge for better symbolic models has been posed. In this paper, we present a general-purpose Symbolic Pattern Associator (SPA) based upon the decision-tree learning algorithm ID3. We conduct extensive head-to-head comparisons on the generalization ability between ANN models and the SPA under different representations. We conclude that the SPA generalizes the past tense of unseen verbs better than ANN models by a wide margin, and we offer insights as to why this should be the case. We also discuss a new default strategy for decision-tree learning algorithms. We report on a series of experiments in which all decision trees consistent with the training data are constructed. These experiments were run to gain an understanding of the properties of the set of consistent decision trees and the factors that affect the accuracy of individual trees. In particular, we investigated the relationship between the size of a decision tree consistent with some training data and the accuracy of the tree on test data. The experiments were performed on a massively parallel Maspar computer. The results of the experiments on several artificial and two real world problems indicate that, for many of the problems investigated, smaller consistent decision trees are on average less accurate than the average accuracy of slightly larger trees. This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees. In this paper we re-investigate windowing for rule learning algorithms. We show that, contrary to previous results for decision tree learning, windowing can in fact achieve significant run-time gains in noise-free domains and explain the different behavior of rule learning algorithms by the fact that they learn each rule independently. The main contribution of this paper is integrative windowing, a new type of algorithm that further exploits this property by integrating good rules into the final theory right after they have been discovered. Thus it avoids re-learning these rules in subsequent iterations of the windowing process. Experimental evidence in a variety of noise-free domains shows that integrative windowing can in fact achieve substantial run-time gains. Furthermore, we discuss the problem of noise in windowing and present an algorithm that is able to achieve run-time gains in a set of experiments in a simple domain with artificial noise. Evolutionary artificial neural networks (EANNs) refer to a special class of artificial neural networks (ANNs) in which evolution is another fundamental form of adaptation in addition to learning. Evolutionary algorithms are used to adapt the connection weights, network architecture and learning algorithms according to the problem environment. Even though evolutionary algorithms are well known as efficient global search algorithms, very often they miss the best local solutions in the complex solution space. In this paper, we propose a hybrid meta-heuristic learning approach combining evolutionary learning and local search methods (using 1st and 2nd order error information) to improve the learning and faster convergence obtained using a direct evolutionary approach. The proposed technique is tested on three different chaotic time series and the test results are compared with some popular neuro-fuzzy systems and a recently developed cutting angle method of global optimization. Empirical results reveal that the proposed technique is efficient in spite of the computational complexity. Imagine a "machine" where there is no pre-commitment to any particular representational scheme: the desired behaviour is distributed and roughly specified simultaneously among many parts, but there is minimal specification of the mechanism required to generate that behaviour, i.e. the global behaviour evolves from the many relations of multiple simple behaviours. A machine that lives to and from/with Synergy. An artificial super-organism that avoids specific constraints and emerges within multiple low-level implicit bio-inspired mechanisms. KEYWORDS: Complex Science, ArtSBots Project, Swarm Intelligence, Stigmergy, UnManned Art, Symbiotic Art, Swarm Paintings, Robot Paintings, Non-Human Art, Painting Emergence and Cooperation, Art and Complexity, ArtBots: The Robot Talent Show. Over the last few years, more and more heuristic decision making techniques have been inspired by nature, e.g. evolutionary algorithms, ant colony optimisation and simulated annealing. More recently, a novel computational intelligence technique inspired by immunology has emerged, called Artificial Immune Systems (AIS). This immune system inspired technique has already been useful in solving some computational problems. In this keynote, we will very briefly describe the immune system metaphors that are relevant to AIS. We will then give some illustrative real-world problems suitable for AIS use and show a step-by-step algorithm walkthrough. A comparison of AIS to other well-known algorithms and areas for future work will round this keynote off. It should be noted that as AIS is still a young and evolving field, there is not yet a fixed algorithm template and hence actual implementations might differ somewhat from the examples given here. This paper investigates a new method for improving the learning algorithm of Mixture of Experts (ME) model using a hybrid of Modified Cuckoo Search (MCS) and Conjugate Gradient (CG) as a second order optimization technique. The CG technique is combined with Back-Propagation (BP) algorithm to yield a much more efficient learning algorithm for ME structure. In addition, the experts and gating networks in enhanced model are replaced by CG based Multi-Layer Perceptrons (MLPs) to provide faster and more accurate learning. The CG is considerably depends on initial weights of connections of Artificial Neural Network (ANN), so, a metaheuristic algorithm, the so-called Modified Cuckoo Search is applied in order to select the optimal weights. The performance of proposed method is compared with Gradient Decent Based ME (GDME) and Conjugate Gradient Based ME (CGME) in classification and regression problems. The experimental results show that hybrid MSC and CG based ME (MCS-CGME) has faster convergence and better performance in utilized benchmark data sets. Essentially, motive behind using control system is to generate suitable control signal for yielding desired response of a physical process. Control of synchronous generator has always remained very critical in power system operation and control. For certain well known reasons power generators are normally operated well below their steady state stability limit. This raises demand for efficient and fast controllers. Artificial intelligence has been reported to give revolutionary outcomes in the field of control engineering. Artificial Neural Network (ANN), a branch of artificial intelligence has been used for nonlinear and adaptive control, utilizing its inherent observability. The overall performance of neurocontroller is dependent upon input features too. Selecting optimum features to train a neurocontroller optimally is very critical. Both quality and size of data are of equal importance for better performance. In this work filter technique is employed to select independent factors for ANN training. A central task of artificial intelligence is the design of artificial agents that act towards specified goals in partially observed environments. Since such environments frequently include interaction over time with other agents with their own goals, reasoning about such interaction relies on sequential game-theoretic models such as extensive-form games or some of their succinct representations such as multi-agent influence diagrams. The current algorithms for calculating equilibria either work with inefficient representations, possibly doubly exponential inthe number of time steps, or place strong assumptions on the game structure. In this paper,we propose a sampling-based approach, which calculates extensive-form correlated equilibria with small representations without placing such strong assumptions. Thus, it is practical in situations where the previous approaches would fail. In addition, our algorithm allows control over characteristics of the target equilibrium, e.g., we can ask for an equilibrium with high social welfare. Our approach is based on a multiplicativeweight update algorithm analogous to AdaBoost, and Markov chain Monte Carlo sampling. We prove convergence guarantees and explore the utility of our approach on several moderately sized multi-player games. Artificial bee colony (ABC), an optimization algorithm is a recent addition to the family of population based search algorithm. ABC has taken its inspiration from the collective intelligent foraging behavior of honey bees. In this study we have incorporated golden section search mechanism in the structure of basic ABC to improve the global convergence and prevent to stick on a local solution. The proposed variant is termed as ILS-ABC. Comparative numerical results with the state-of-art algorithms show the performance of the proposal when applied to the set of unconstrained engineering design problems. The simulated results show that the proposed variant can be successfully applied to solve real life problems. Renewable and sustainable energy is one of the most important challenges currently facing mankind. Wind has made an increasing contribution to the world's energy supply mix, but still remains a long way from reaching its full potential. In this paper, we investigate the use of artificial evolution to design vertical-axis wind turbine prototypes that are physically instantiated and evaluated under approximated wind tunnel conditions. An artificial neural network is used as a surrogate model to assist learning and found to reduce the number of fabrications required to reach a higher aerodynamic efficiency, resulting in an important cost reduction. Unlike in other approaches, such as computational fluid dynamics simulations, no mathematical formulations are used and no model assumptions are made. A novel method for estimating Bayesian network (BN) parameters from data is presented which provides improved performance on test data. Previous research has shown the value of representing conditional probability distributions (CPDs) via neural networks(Neal 1992), noisy-OR gates (Neal 1992, Diez 1993)and decision trees (Friedman and Goldszmidt 1996).The Bernoulli mixture network (BMN) explicitly represents the CPDs of discrete BN nodes as mixtures of local distributions,each having a different set of parents.This increases the space of possible structures which can be considered,enabling the CPDs to have finer-grained dependencies.The resulting estimation procedure induces a modelthat is better able to emulate the underlying interactions occurring in the data than conventional conditional Bernoulli network models.The results for artificially generated data indicate that overfitting is best reduced by restricting the complexity of candidate mixture substructures local to each node. Furthermore, mixtures of very simple substructures can perform almost as well as more complex ones.The BMN is also applied to data collected from an online adventure game with an application to keyhole plan recognition. The results show that the BMN-based model brings a dramatic improvement in performance over a conventional BN model. Recent developments show that Multiply Sectioned Bayesian Networks (MSBNs) can be used for diagnosis of natural systems as well as for model-based diagnosis of artificial systems. They can be applied to single-agent oriented reasoning systems as well as multi-agent distributed probabilistic reasoning systems. Belief propagation between a pair of subnets plays a central role in maintenance of global consistency in a MSBN. This paper studies the operation UpdateBelief, presented originally with MSBNs, for inter-subnet propagation. We analyze how the operation achieves its intended functionality, which provides hints as for how its efficiency can be improved. We then define two new versions of UpdateBelief that reduce the computation time for inter-subnet propagation. One of them is optimal in the sense that the minimum amount of computation for coordinating multi-linkage belief propagation is required. The optimization problem is solved through the solution of a graph-theoretic problem: the minimum weight open tour in a tree. In this paper, we present structured message passing (SMP), a unifying framework for approximate inference algorithms that take advantage of structured representations such as algebraic decision diagrams and sparse hash tables. These representations can yield significant time and space savings over the conventional tabular representation when the message has several identical values (context-specific independence) or zeros (determinism) or both in its range. Therefore, in order to fully exploit the power of structured representations, we propose to artificially introduce context-specific independence and determinism in the messages. This yields a new class of powerful approximate inference algorithms which includes popular algorithms such as cluster-graph Belief propagation (BP), expectation propagation and particle BP as special cases. We show that our new algorithms introduce several interesting bias-variance trade-offs. We evaluate these trade-offs empirically and demonstrate that our new algorithms are more accurate and scalable than state-of-the-art techniques. Consider a collection of weighted subsets of a ground set N. Given a query subset Q of N, how fast can one (1) find the weighted sum over all subsets of Q, and (2) sample a subset of Q proportionally to the weights? We present a tree-based greedy heuristic, Treedy, that for a given positive tolerance d answers such counting and sampling queries to within a guaranteed relative error d and total variation distance d, respectively. Experimental results on artificial instances and in application to Bayesian structure discovery in Bayesian networks show that approximations yield dramatic savings in running time compared to exact computation, and that Treedy typically outperforms a previously proposed sorting-based heuristic. The problem of similarity search is one of the main problems in computer science. This problem has many applications in text-retrieval, web search, computational biology, bioinformatics and others. Similarity between two data objects can be depicted using a similarity measure or a distance metric. There are numerous distance metrics in the literature, some are used for a particular data type, and others are more general. In this paper we present a new distance metric for sequential data which is based on the sum of n-grams. The novelty of our distance is that these n-grams are weighted using artificial bee colony; a recent optimization algorithm based on the collective intelligence of a swarm of bees on their search for nectar. This algorithm has been used in optimizing a large number of numerical problems. We validate the new distance experimentally. Artificial bee colony (ABC) algorithm has proved its importance in solving a number of problems including engineering optimization problems. ABC algorithm is one of the most popular and youngest member of the family of population based nature inspired meta-heuristic swarm intelligence method. ABC has been proved its superiority over some other Nature Inspired Algorithms (NIA) when applied for both benchmark functions and real world problems. The performance of search process of ABC depends on a random value which tries to balance exploration and exploitation phase. In order to increase the performance it is required to balance the exploration of search space and exploitation of optimal solution of the ABC. This paper outlines a new hybrid of ABC algorithm with Genetic Algorithm. The proposed method integrates crossover operation from Genetic Algorithm (GA) with original ABC algorithm. The proposed method is named as Crossover based ABC (CbABC). The CbABC strengthens the exploitation phase of ABC as crossover enhances exploration of search space. The CbABC tested over four standard benchmark functions and a popular continuous optimization problem. There is a consensus that human and non-human subjects experience temporal distortions in many stages of their perceptual and decision-making systems. Similarly, intertemporal choice research has shown that decision-makers undervalue future outcomes relative to immediate ones. Here we combine techniques from information theory and artificial intelligence to show how both temporal distortions and intertemporal choice preferences can be explained as a consequence of the coding efficiency of sensorimotor representation. In particular, the model implies that interactions that constrain future behavior are perceived as being both longer in duration and more valuable. Furthermore, using simulations of artificial agents, we investigate how memory constraints enforce a renormalization of the perceived timescales. Our results show that qualitatively different discount functions, such as exponential and hyperbolic discounting, arise as a consequence of an agent's probabilistic model of the world. Recently, the long short-term memory neural network (LSTM) has attracted wide interest due to its success in many tasks. LSTM architecture consists of a memory cell and three gates, which looks similar to the neuronal networks in the brain. However, there still lacks the evidence of the cognitive plausibility of LSTM architecture as well as its working mechanism. In this paper, we study the cognitive plausibility of LSTM by aligning its internal architecture with the brain activity observed via fMRI when the subjects read a story. Experiment results show that the artificial memory vector in LSTM can accurately predict the observed sequential brain activities, indicating the correlation between LSTM architecture and the cognitive process of story reading. Using an improved backtrack algorithm with sophisticated pruning techniques, we revise previous observations correlating a high frequency of hard to solve Hamiltonian Cycle instances with the Gn,m phase transition between Hamiltonicity and non-Hamiltonicity. Instead all tested graphs of 100 to 1500 vertices are easily solved. When we artificially restrict the degree sequence with a bounded maximum degree, although there is some increase in difficulty, the frequency of hard graphs is still low. When we consider more regular graphs based on a generalization of knight's tours, we observe frequent instances of really hard graphs, but on these the average degree is bounded by a constant. We design a set of graphs with a feature our algorithm is unable to detect and so are very hard for our algorithm, but in these we can vary the average degree from O(1) to O(n). We have so far found no class of graphs correlated with the Gn,m phase transition which asymptotically produces a high frequency of hard instances. This paper introduces AntNet, a novel approach to the adaptive learning of routing tables in communications networks. AntNet is a distributed, mobile agents based Monte Carlo system that was inspired by recent work on the ant colony metaphor for solving optimization problems. AntNet's agents concurrently explore the network and exchange collected information. The communication among the agents is indirect and asynchronous, mediated by the network itself. This form of communication is typical of social insects and is called stigmergy. We compare our algorithm with six state-of-the-art routing algorithms coming from the telecommunications and machine learning fields. The algorithms' performance is evaluated over a set of realistic testbeds. We run many experiments over real and artificial IP datagram networks with increasing number of nodes and under several paradigmatic spatial and temporal traffic distributions. Results are very encouraging. AntNet showed superior performance under all the experimental conditions with respect to its competitors. We analyze the main characteristics of the algorithm and try to explain the reasons for its superiority. Nowadays, computer scientists have shown the interest in the study of social insect's behaviour in neural networks area for solving different combinatorial and statistical problems. Chief among these is the Artificial Bee Colony (ABC) algorithm. This paper investigates the use of ABC algorithm that simulates the intelligent foraging behaviour of a honey bee swarm. Multilayer Perceptron (MLP) trained with the standard back propagation algorithm normally utilises computationally intensive training algorithms. One of the crucial problems with the backpropagation (BP) algorithm is that it can sometimes yield the networks with suboptimal weights because of the presence of many local optima in the solution space. To overcome ABC algorithm used in this work to train MLP learning the complex behaviour of earthquake time series data trained by BP, the performance of MLP-ABC is benchmarked against MLP training with the standard BP. The experimental result shows that MLP-ABC performance is better than MLP-BP for time series data. Games have always been a popular test bed for artificial intelligence techniques. Game developers are always in constant search for techniques that can automatically create computer games minimizing the developer's task. In this work we present an evolutionary strategy based solution towards the automatic generation of two player board games. To guide the evolutionary process towards games, which are entertaining, we propose a set of metrics. These metrics are based upon different theories of entertainment in computer games. This work also compares the entertainment value of the evolved games with the existing popular board based games. Further to verify the entertainment value of the evolved games with the entertainment value of the human user a human user survey is conducted. In addition to the user survey we check the learnability of the evolved games using an artificial neural network based controller. The proposed metrics and the evolutionary process can be employed for generating new and entertaining board games, provided an initial search space is given to the evolutionary algorithm. DeepMind Lab is a first-person 3D game platform designed for research and development of general artificial intelligence and machine learning systems. DeepMind Lab can be used to study how autonomous artificial agents may learn complex tasks in large, partially observed, and visually diverse worlds. DeepMind Lab has a simple and flexible API enabling creative task-designs and novel AI-designs to be explored and quickly iterated upon. It is powered by a fast and widely recognised game engine, and tailored for effective use by the research community. Question Answering System (QAS) is used for information retrieval and natural language processing (NLP) to reduce human effort. There are numerous QAS based on the user documents present today, but they all are limited to providing objective answers and process simple questions only. Complex questions cannot be answered by the existing QAS, as they require interpretation of the current and old data as well as the question asked by the user. The above limitations can be overcome by using deep cases and neural network. Hence we propose a modified QAS in which we create a deep artificial neural network with associative memory from text documents. The modified QAS processes the contents of the text document provided to it and find the answer to even complex questions in the documents. Experience replay is one of the most commonly used approaches to improve the sample efficiency of reinforcement learning algorithms. In this work, we propose an approach to select and replay sequences of transitions in order to accelerate the learning of a reinforcement learning agent in an off-policy setting. In addition to selecting appropriate sequences, we also artificially construct transition sequences using information gathered from previous agent-environment interactions. These sequences, when replayed, allow value function information to trickle down to larger sections of the state/state-action space, thereby making the most of the agent's experience. We demonstrate our approach on modified versions of standard reinforcement learning tasks such as the mountain car and puddle world problems and empirically show that it enables better learning of value functions as compared to other forms of experience replay. Further, we briefly discuss some of the possible extensions to this work, as well as applications and situations where this approach could be particularly useful. Several types of sensors have been available in off-the-shelf mobile devices, including motion, magnetic, vision, acoustic, and location sensors. This paper focuses on the fusion of the data acquired from motion and magnetic sensors, i.e., accelerometer, gyroscope and magnetometer sensors, for the recognition of Activities of Daily Living (ADL) using pattern recognition techniques. The system developed in this study includes data acquisition, data processing, data fusion, and artificial intelligence methods. Artificial Neural Networks (ANN) are included in artificial intelligence methods, which are used in this study for the recognition of ADL. The purpose of this study is the creation of a new method using ANN for the identification of ADL, comparing three types of ANN, in order to achieve results with a reliable accuracy. The best accuracy was obtained with Deep Learning, which, after the application of the L2 regularization and normalization techniques on the sensors data, reports an accuracy of 89.51%. There currently exists a wide range of techniques to model and evolve artificial players for games. Existing techniques range from black box neural networks to entirely hand-designed solutions. In this paper, we demonstrate the feasibility of a genetic programming framework using human controller input to derive meaningful artificial players which can, later on, be optimised by hand. The current state of the art in game character design relies heavily on human designers to manually create and edit scripts and rules for game characters. To address this manual editing bottleneck, current computational intelligence techniques approach the issue with fully autonomous character generators, replacing most of the design process using black box solutions such as neural networks or the like. Our GP approach to this problem creates character controllers which can be further authored and developed by a designer it also offers designers to included their play style without the need to use a programming language. This keeps the designer in the loop while reducing repetitive manual labour. Our system also provides insights into how players express themselves in games and into deriving appropriate models for representing those insights. We present our framework, supporting findings and open challenges. A vexing problem in artificial intelligence is reasoning about events that occur in complex, changing visual stimuli such as in video analysis or game play. Inspired by a rich tradition of visual reasoning and memory in cognitive psychology and neuroscience, we developed an artificial, configurable visual question and answer dataset (COG) to parallel experiments in humans and animals. COG is much simpler than the general problem of video analysis, yet it addresses many of the problems relating to visual and logical reasoning and memory -- problems that remain challenging for modern deep learning architectures. We additionally propose a deep learning architecture that performs competitively on other diagnostic VQA datasets (i.e. CLEVR) as well as easy settings of the COG dataset. However, several settings of COG result in datasets that are progressively more challenging to learn. After training, the network can zero-shot generalize to many new tasks. Preliminary analyses of the network architectures trained on COG demonstrate that the network accomplishes the task in a manner interpretable to humans. Disease Intelligence (DI) is based on the acquisition and aggregation of fragmented knowledge of diseases at multiple sources all over the world to provide valuable information to doctors, researchers and information seeking community. Some diseases have their own characteristics changed rapidly at different places of the world and are reported on documents as unrelated and heterogeneous information which may be going unnoticed and may not be quickly available. This research presents an Ontology based theoretical framework in the context of medical intelligence and country/region. Ontology is designed for storing information about rapidly spreading and changing diseases with incorporating existing disease taxonomies to genetic information of both humans and infectious organisms. It further maps disease symptoms to diseases and drug effects to disease symptoms. The machine understandable disease ontology represented as a website thus allows the drug effects to be evaluated on disease symptoms and exposes genetic involvements in the human diseases. Infectious agents which have no known place in an existing classification but have data on genetics would still be identified as organisms through the intelligence of this system. It will further facilitate researchers on the subject to try out different solutions for curing diseases. In this paper we define Clinical Data Intelligence as the analysis of data generated in the clinical routine with the goal of improving patient care. We define a science of a Clinical Data Intelligence as a data analysis that permits the derivation of scientific, i.e., generalizable and reliable results. We argue that a science of a Clinical Data Intelligence is sensible in the context of a Big Data analysis, i.e., with data from many patients and with complete patient information. We discuss that Clinical Data Intelligence requires the joint efforts of knowledge engineering, information extraction (from textual and other unstructured data), and statistics and statistical machine learning. We describe some of our main results as conjectures and relate them to a recently funded research project involving two major German university hospitals. Business Intelligence and Analytics (BI&A) is the process of extracting and predicting business-critical insights from data. Traditional BI focused on data collection, extraction, and organization to enable efficient query processing for deriving insights from historical data. With the rise of big data and cloud computing, there are many challenges and opportunities for the BI. Especially with the growing number of data sources, traditional BI\&A are evolving to provide intelligence at different scales and perspectives - operational BI, situational BI, self-service BI. In this survey, we review the evolution of business intelligence systems in full scale from back-end architecture to and front-end applications. We focus on the changes in the back-end architecture that deals with the collection and organization of the data. We also review the changes in the front-end applications, where analytic services and visualization are the core components. Using a uses case from BI in Healthcare, which is one of the most complex enterprises, we show how BI\&A will play an important role beyond the traditional usage. The survey provides a holistic view of Business Intelligence and Analytics for anyone interested in getting a complete picture of the different pieces in the emerging next generation BI\&A solutions. In this paper, we demonstrate the application of Fuzzy Markup Language (FML) to construct an FML-based Dynamic Assessment Agent (FDAA), and we present an FML-based Human-Machine Cooperative System (FHMCS) for the game of Go. The proposed FDAA comprises an intelligent decision-making and learning mechanism, an intelligent game bot, a proximal development agent, and an intelligent agent. The intelligent game bot is based on the open-source code of Facebook Darkforest, and it features a representational state transfer application programming interface mechanism. The proximal development agent contains a dynamic assessment mechanism, a GoSocket mechanism, and an FML engine with a fuzzy knowledge base and rule base. The intelligent agent contains a GoSocket engine and a summarization agent that is based on the estimated win rate, real-time simulation number, and matching degree of predicted moves. Additionally, the FML for player performance evaluation and linguistic descriptions for game results commentary are presented. We experimentally verify and validate the performance of the FDAA and variants of the FHMCS by testing five games in 2016 and 60 games of Google Master Go, a new version of the AlphaGo program, in January 2017. The experimental results demonstrate that the proposed FDAA can work effectively for Go applications. The combination of Artificial Intelligence (AI) and Internet-of-Things (IoT), which is denoted as AI-powered Internet-of-Things (AIoT), is capable of processing huge amount of data generated from a large number of devices and handling complex problems in social infrastructures. As AI and IoT technologies are becoming mature, in this paper, we propose to apply AIoT technologies for traffic light control, which is an essential component for intelligent transportation system, to improve the efficiency of smart city's road system. Specifically, various sensors such as surveillance cameras provide real-time information for intelligent traffic light control system to observe the states of both motorized traffic and non-motorized traffic. In this paper, we propose an intelligent traffic light control solution by using distributed multi-agent Q learning, considering the traffic information at the neighboring intersections as well as local motorized and non-motorized traffic, to improve the overall performance of the entire control system. By using the proposed multi-agent Q learning algorithm, our solution is targeting to optimize both the motorized and non-motorized traffic. In addition, we considered many constraints/rules for traffic light control in the real world, and integrate these constraints in the learning algorithm, which can facilitate the proposed solution to be deployed in real operational scenarios. We conducted numerical simulations for a real-world map with real-world traffic data. The simulation results show that our proposed solution outperforms existing solutions in terms of vehicle and pedestrian queue lengths, waiting time at intersections, and many other key performance metrics. This report describes an initial reference architecture for intelligent software agents performing active, largely autonomous cyber defense actions on military networks of computing and communicating devices. The report is produced by the North Atlantic Treaty Organization (NATO) Research Task Group (RTG) IST-152 "Intelligent Autonomous Agents for Cyber Defense and Resilience". In a conflict with a technically sophisticated adversary, NATO military tactical networks will operate in a heavily contested battlefield. Enemy software cyber agents - malware - will infiltrate friendly networks and attack friendly command, control, communications, computers, intelligence, surveillance, and reconnaissance and computerized weapon systems. To fight them, NATO needs artificial cyber hunters - intelligent, autonomous, mobile agents specialized in active cyber defense. With this in mind, in 2016, NATO initiated RTG IST-152. Its objective is to help accelerate development and transition to practice of such software agents by producing a reference architecture and technical roadmap. This report presents the concept and architecture of an Autonomous Intelligent Cyber Defense Agent (AICA). We describe the rationale of the AICA concept, explain the methodology and purpose that drive the definition of the AICA Reference Architecture, and review some of the main features and challenges of the AICA. Intelligent agents are often faced with the problem of trying to merge possibly conflicting pieces of information obtained from different sources into a consistent view of the world. We propose a framework for the modelling of such merging operations with roots in the work of Spohn (1988, 1991). Unlike most approaches we focus on the merging of epistemic states, not knowledge bases. We construct a number of plausible merging operations and measure them against various properties that merging operations ought to satisfy. Finally, we discuss the connection between merging and the use of infobases Meyer (1999) and Meyer et al. (2000). The goal of the LP+ project at the K.U.Leuven is to design an expressive logic, suitable for declarative knowledge representation, and to develop intelligent systems based on Logic Programming technology for solving computational problems using the declarative specifications. The ID-logic is an integration of typed classical logic and a definition logic. Different abductive solvers for this language are being developed. This paper is a report of the integration of high order aggregates into ID-logic and the consequences on the solver SLDNFA. The aim of this work is to explore new methodologies on Semantic Parsing for unrestricted texts. Our approach follows the current trends in Information Extraction (IE) and is based on the application of a verbal subcategorization lexicon (LEXPIR) by means of complex pattern recognition techniques. LEXPIR is framed on the theoretical model of the verbal subcategorization developed in the Pirapides project. When simultaneously reasoning with evidences about several different events it is necessary to separate the evidence according to event. These events should then be handled independently. However, when propositions of evidences are weakly specified in the sense that it may not be certain to which event they are referring, this may not be directly possible. In this paper a criterion for partitioning evidences into subsets representing events is established. This criterion, derived from the conflict within each subset, involves minimising a criterion function for the overall conflict of the partition. An algorithm based on characteristics of the criterion function and an iterative optimisation among partitionings of evidences is proposed. Within the scope of the three-year ANTI-SUBMARINE WARFARE project of the National Defence Research Establishment, the INFORMATION SYSTEMS subproject has developed the demonstration prototype Dezzy for handling and analysis of intelligence reports concerning foreign underwater activities. ----- Inom ramen f\"or FOA:s tre{\aa}riga huvudprojekt UB{\AA}TSSKYDD har delprojekt INFORMATIONSSYSTEM utvecklat demonstrationsprototypen Dezzy till ett beslutsst\"odsystem f\"or hantering och analys av underr\"attelser om fr\"ammande undervattensverksamhet. We present a flexible rule compiler developed for a text-to-speech (TTS) system. The compiler converts a set of rules into a finite-state transducer (FST). The input and output of the FST are subject to parameterization, so that the system can be applied to strings and sequences of feature-structures. The resulting transducer is guaranteed to realize a function (as opposed to a relation), and therefore can be implemented as a deterministic device (either a deterministic FST or a bimachine). Research concerning organization and coordination within multi-agent systems continues to draw from a variety of architectures and methodologies. The work presented in this paper combines techniques from game theory and multi-agent systems to produce self-organizing, polymorphic, lightweight, embedded agents for systems scheduling within a large-scale real-time systems environment. Results show how this approach is used to experimentally produce optimum real-time scheduling through the emergent behavior of thousands of agents. These results are obtained using a SWARM simulation of systems scheduling within a High Energy Physics experiment consisting of 2500 digital signal processors. In Dempster-Shafer belief theory, general beliefs are expressed as belief mass distribution functions over frames of discernment. In Subjective Logic beliefs are expressed as belief mass distribution functions over binary frames of discernment. Belief representations in Subjective Logic, which are called opinions, also contain a base rate parameter which express the a priori belief in the absence of evidence. Philosophically, beliefs are quantitative representations of evidence as perceived by humans or by other intelligent agents. The basic operators of classical probability calculus, such as addition and multiplication, can be applied to opinions, thereby making belief calculus practical. Through the equivalence between opinions and Beta probability density functions, this also provides a calculus for Beta probability density functions. This article explains the basic elements of belief calculus. Solomonoff's inductive learning model is a powerful, universal and highly elegant theory of sequence prediction. Its critical flaw is that it is incomputable and thus cannot be used in practice. It is sometimes suggested that it may still be useful to help guide the development of very general and powerful theories of prediction which are computable. In this paper it is shown that although powerful algorithms exist, they are necessarily highly complex. This alone makes their theoretical analysis problematic, however it is further shown that beyond a moderate level of complexity the analysis runs into the deeper problem of Goedel incompleteness. This limits the power of mathematics to analyse and study prediction algorithms, and indeed intelligent systems in general. In this report, a novel approach to intelligence and learning is introduced, this approach is based on what we call 'perception logic'. Based on this logic, a computing mechanism and automata are introduced. Multi-resolution analysis of perceptual information is given, in which learning is accomplished in at most O(log(N))epochs, where N is the number of samples, and the convergence is guarnteed. This approach combines the favors of computational modeles in the sense that they are structured and mathematically well-defined, and the adaptivity of soft computing approaches, in addition to the continuity and real-time response of dynamical systems. With the great success in simulating many intelligent behaviors using computing devices, there has been an ongoing debate whether all conscious activities are computational processes. In this paper, the answer to this question is shown to be no. A certain phenomenon of consciousness is demonstrated to be fully represented as a computational process using a quantum computer. Based on the computability criterion discussed with Turing machines, the model constructed is shown to necessarily involve a non-computable element. The concept that this is solely a quantum effect and does not work for a classical case is also discussed. Plasmodium of Physarym polycephalum is an ideal biological substrate for implementing concurrent and parallel computation, including combinatorial geometry and optimization on graphs. We report results of scoping experiments on Physarum computing in conditions of minimal friction, on the water surface. We show that plasmodium of Physarum is capable for computing a basic spanning trees and manipulating of light-weight objects. We speculate that our results pave the pathways towards design and implementation of amorphous biological robots. Purpose: To present an algorithm for spatially sorting objects into an annular structure. Design/Methodology/Approach: A swarm-based model that requires only stochastic agent behaviour coupled with a pheromone-inspired "attraction-repulsion" mechanism. Findings: The algorithm consistently generates high-quality annular structures, and is particularly powerful in situations where the initial configuration of objects is similar to those observed in nature. Research limitations/implications: Experimental evidence supports previous theoretical arguments about the nature and mechanism of spatial sorting by insects. Practical implications: The algorithm may find applications in distributed robotics. Originality/value: The model offers a powerful minimal algorithmic framework, and also sheds further light on the nature of attraction-repulsion algorithms and underlying natural processes. The article presents an approach to interactively solve multi-objective optimization problems. While the identification of efficient solutions is supported by computational intelligence techniques on the basis of local search, the search is directed by partial preference information obtained from the decision maker. An application of the approach to biobjective portfolio optimization, modeled as the well-known knapsack problem, is reported, and experimental results are reported for benchmark instances taken from the literature. In brief, we obtain encouraging results that show the applicability of the approach to the described problem. This document describes the communication language used in one multiagent system environment for ecological simulations, based on EcoDynamo simulator application linked with several intelligent agents and visualisation applications, and extends the initial definition of the language. The agents actions and perceptions are translated into messages exchanged with the simulator application and other agents. The concepts and definitions used follow the BNF notation (Backus et al. 1960) and is inspired in the Coach Unilang language (Reis and Lau 2002). The logic which describes quantum robots is not orthodox quantum logic, but a deductive calculus which reproduces the quantum tasks (computational processes, and actions) taking into account quantum superposition and quantum entanglement. A way toward the realization of intelligent quantum robots is to adopt a quantum metalanguage to control quantum robots. A physical implementation of a quantum metalanguage might be the use of coherent states in brain signals. The paper proposes an analysis on some existent ontologies, in order to point out ways to resolve semantic heterogeneity in information systems. Authors are highlighting the tasks in a Knowledge Acquisiton System and identifying aspects related to the addition of new information to an intelligent system. A solution is proposed, as a combination of ontology reasoning services and natural languages generation. A multi-agent system will be conceived with an extractor agent, a reasoner agent and a competence management agent. This paper describes application of information granulation theory, on the back analysis of Jeffrey mine southeast wall Quebec. In this manner, using a combining of Self Organizing Map (SOM) and rough set theory (RST), crisp and rough granules are obtained. Balancing of crisp granules and sub rough granules is rendered in close-open iteration. Combining of hard and soft computing, namely finite difference method (FDM) and computational intelligence and taking in to account missing information are two main benefits of the proposed method. As a practical example, reverse analysis on the failure of the southeast wall Jeffrey mine is accomplished. Mamdani Fuzzy Model is an important technique in Computational Intelligence (CI) study. This paper presents an implementation of a supervised learning method based on membership function training in the context of Mamdani fuzzy models. Specifically, auto zoom function of a digital camera is modelled using Mamdani technique. The performance of control method is verified through a series of simulation and numerical results are provided as illustrations. A lot of mathematical knowledge has been formalized and stored in repositories by now: different mathematical theorems and theories have been taken into consideration and included in mathematical repositories. Applications more distant from pure mathematics, however --- though based on these theories --- often need more detailed knowledge about the underlying theories. In this paper we present an example Mizar formalization from the area of electrical engineering focusing on stability theory which is based on complex analysis. We discuss what kind of special knowledge is necessary here and which amount of this knowledge is included in existing repositories. Brain-Like Stochastic Search (BLiSS) refers to this task: given a family of utility functions U(u,A), where u is a vector of parameters or task descriptors, maximize or minimize U with respect to u, using networks (Option Nets) which input A and learn to generate good options u stochastically. This paper discusses why this is crucial to brain-like intelligence (an area funded by NSF) and to many applications, and discusses various possibilities for network design and training. The appendix discusses recent research, relations to work on stochastic optimization in operations research, and relations to engineering-based approaches to understanding neocortex. We present in this paper, a modelling of an expertise in pragmatics. We follow knowledge engineering techniques and observe the expert when he analyses a social discussion forum. Then a number of models are defined. These models emphasises the process followed by the expert and a number of criteria used in his analysis. Results can be used as guides that help to understand and annotate discussion forum. We aim at modelling other pragmatics analysis in order to complete the base of guides; criteria, process, etc. of discussion analysis Serious games have recently emerged as an avenue for curriculum delivery. Serious games incorporate motivation and entertainment while providing pointed curriculum for the user. This paper presents a serious game, called MiBoard, currently being developed from the iSTART Intelligent Tutoring System. MiBoard incorporates a multiplayer interaction that iSTART was previously unable to provide. This multiplayer interaction produces a wide variation across game trials, while also increasing the repeat playability for users. This paper presents a demonstration of the MiBoard system and the expectations for its application. In this work, we develop an intelligent user interface that allows users to enter biomedical queries in a natural language, and that presents the answers (possibly with explanations if requested) in a natural language. We develop a rule layer over biomedical ontologies and databases, and use automated reasoners to answer queries considering relevant parts of the rule layer. Intersections constitute one of the most dangerous elements in road systems. Traffic signals remain the most common way to control traffic at high-volume intersections and offer many opportunities to apply intelligent transportation systems to make traffic more efficient and safe. This paper describes an automated method to estimate the temporal exposure of road users crossing the conflict zone to lateral collision with road users originating from a different approach. This component is part of a larger system relying on video sensors to provide queue lengths and spatial occupancy that are used for real time traffic control and monitoring. The method is evaluated on data collected during a real world experiment. Statistics is a uniquely difficult field to convey to the uninitiated. It sits astride the abstract and the concrete, the theoretical and the applied. It has a mathematical flavor and yet it is not simply a branch of mathematics. Its core problems blend into those of the disciplines that probe into the nature of intelligence and thought, in particular philosophy, psychology and artificial intelligence. Debates over foundational issues have waxed and waned, but the field has not yet arrived at a single foundational perspective. One of the first step in the realization of an automatic system of check recognition is the extraction of the handwritten area. We propose in this paper an hybrid method to extract these areas. This method is based on digit recognition by Fourier descriptors and different steps of colored image processing . It requires the bank recognition of its code which is located in the check marking band as well as the handwritten color recognition by the method of difference of histograms. The areas extraction is then carried out by the use of some mathematical morphology tools. This paper introduces an elemental building block which combines Dictionary Learning and Dimension Reduction (DRDL). We show how this foundational element can be used to iteratively construct a Hierarchical Sparse Representation (HSR) of a sensory stream. We compare our approach to existing models showing the generality of our simple prescription. We then perform preliminary experiments using this framework, illustrating with the example of an object recognition task using standard datasets. This work introduces the very first steps towards an integrated framework for designing and analyzing various computational tasks from learning to attention to action. The ultimate goal is building a mathematically rigorous, integrated theory of intelligence. Harmonic theory provides a mathematical framework to describe the structure, behavior, evolution and emergence of harmonic systems. A harmonic system is context aware, contains elements that manifest characteristics either collaboratively or independently according to system's expression and can interact with its environment. This theory provides a fresh way to analyze emergence and collaboration of "ad-hoc" and complex systems. The bioinformatical methods to detect lateral gene transfer events are mainly based on functional coding DNA characteristics. In this paper, we propose the use of DNA traits not depending on protein coding requirements. We introduce several semilocal variables that depend on DNA primary sequence and that reflect thermodynamic as well as physico-chemical magnitudes that are able to tell apart the genome of different organisms. After combining these variables in a neural classificator, we obtain results whose power of resolution go as far as to detect the exchange of genomic material between bacteria that are phylogenetically close. Quality assurance remains a key topic in human computation research. Prior work indicates that majority voting is effective for low difficulty tasks, but has limitations for harder tasks. This paper explores two methods of addressing this problem: tournament selection and elimination selection, which exploit 2-, 3- and 4-way comparisons between different answers to human computation tasks. Our experimental results and statistical analyses show that both methods produce the correct answer in noisy human computation environment more often than majority voting. Furthermore, we find that the use of 4-way comparisons can significantly reduce the cost of quality assurance relative to the use of 2-way comparisons. Partially observable Markov decision processes have been widely used to provide models for real-world decision making problems. In this paper, we will provide a method in which a slightly different version of them called Mixed observability Markov decision process, MOMDP, is going to join with our problem. Basically, we aim at offering a behavioural model for interaction of intelligent agents with musical pitch environment and we will show that how MOMDP can shed some light on building up a decision making model for musical pitch conveniently. Using the recently proposed model of combinatorial landscapes: local optima networks, we study the distribution of local optima in two classes of instances of the quadratic assignment problem. Our results indicate that the two problem instance classes give rise to very different configuration spaces. For the so-called real-like class, the optima networks possess a clear modular structure, while the networks belonging to the class of random uniform instances are less well partitionable into clusters. We briefly discuss the consequences of the findings for heuristically searching the corresponding problem spaces. One of the remarkable feats of intelligent life is that it restructures the world it lives in for its own benefit. This extended abstract outlines how the information-theoretic principle of empowerment, as an intrinsic motivation, can be used to restructure the environment an agent lives in. We present a first qualitative evaluation of how an agent in a 3d-gridworld builds a staircase-like structure, which reflects the agent's embodiment. Reasoning, the most important human brain operation, is charactrized by a degree fuzziness. In the present paper we construct a fuzzy model for the reasoning process giving through the calculation of the possibilities of all possible individuals' profiles a quantitative/qualitative view of their behaviour during the above process and we use the centroid defuzzification technique for measuring the reasoning skills. We also present a number of classroom experiments illustrating our results in practice. For a computational system to be intelligent, it should be able to perform, at least, basic deductions. Nonetheless, since deductions are, in some sense, equivalent to tautologies, it seems that they do not provide new information. The present article proposes a measure the degree of semantic informativity of valid deductions in a dynamic setting. Concepts of coherency and relevancy, displayed in terms of insertions and deletions on databases, are used to define semantic informativity. In this way, the article shows that a solution to the problem about the informativity of deductions provides a heuristic principle to improve the deductive power of computational systems. We present an implementation of E-anti-unification as defined in Heinz (1995), where tree-grammar descriptions of equivalence classes of terms are used to compute generalizations modulo equational theories. We discuss several improvements, including an efficient implementation of variable-restricted E-anti-unification from Heinz (1995), and give some runtime figures about them. We present applications in various areas, including lemma generation in equational inductive proofs, intelligence tests, diverging Knuth-Bendix completion, strengthening of induction hypotheses, and theory formation about finite algebras. The study of collective decision making system has become the central part of the Swarm- Intelligence Related research in recent years. The most challenging task of modelling a collec- tive decision making system is to develop the macroscopic stochastic equation from its microscopic model. In this report we have investigated the behaviour of a collective decision making system with specified microscopic rules that resemble the chemical reaction and used different group size. Then we ventured to derive a generalized analytical model of a collective-decision system using hyper-geometric distribution. Index Terms-swarm; collective decision making; noise; group size; hyper-geometric distribution This chapter discusses the existing and future use of robotics and intelligent sensing technology in mental health care. While the use of this technology is nascent in mental health care, it represents a potentially useful tool in the practitioner's toolbox. The goal of this chapter is to provide a brief overview of the field, discuss the recent use of robotics technology in mental health care practice, explore some of the design issues and ethical issues of using robots in this space, and finally to explore the potential of emerging technology. We propose a system for automated essay grading using ontologies and textual entailment. The process of textual entailment is guided by hypotheses, which are extracted from a domain ontology. Textual entailment checks if the truth of the hypothesis follows from a given text. We enact textual entailment to compare students answer to a model answer obtained from ontology. We validated the solution against various essays written by students in the chemistry domain. Our aim is to extract information about literary characters in unstructured texts. We employ natural language processing and reasoning on domain ontologies. The first task is to identify the main characters and the parts of the story where these characters are described or act. We illustrate the system in a scenario in the folktale domain. The system relies on a folktale ontology that we have developed based on Propp's model for folktales morphology. In this paper we address planning problems in high-dimensional hybrid configuration spaces, with a particular focus on manipulation planning problems involving many objects. We present the hybrid backward-forward (HBF) planning algorithm that uses a backward identification of constraints to direct the sampling of the infinite action space in a forward search from the initial state towards a goal configuration. The resulting planner is probabilistically complete and can effectively construct long manipulation plans requiring both prehensile and nonprehensile actions in cluttered environments. Given recent successes in AI (e.g., AlphaGo's victory against Lee Sedol in the game of GO), it's become increasingly important to assess: how close are AI systems to human-level intelligence? This paper describes the Allen AI Science Challenge---an approach towards that goal which led to a unique Kaggle Competition, its results, the lessons learned, and our next steps. Advances in ICT are bringing into reality the vision of a large number of uniquely identifiable, interconnected objects and things that gather information from diverse physical environments and deliver the information to a variety of innovative applications and services. These sensing objects and things form the Internet of Things (IoT) that can improve energy and cost efficiency and automation in many different industry fields such as transportation and logistics, health care and manufacturing, and facilitate our everyday lives as well. IoT applications rely on real-time context data and allow sending information for driving the behaviors of users in intelligent environments. A visual type theory is a cognitive tool that has much in common with language, and may be regarded as an exceptional form of spatial text adjunct. A mathematical visual type theory, called NPM, has been under development that can be viewed as an early-stage project in mathematical knowledge management and mathematical user interface development. We discuss in greater detail the notion of a visual type theory, report on progress towards a usable mathematical visual type theory, and discuss the outlook for future work on this project. Explanations have been introduced in the previous century. Their interest in reducing the search space is no longer questioned. Yet, their efficient implementation into CSP solver is still a challenge. In this paper, we introduce ESeR, an Event Selection Rules algorithm that filters events generated during propagation. This dynamic selection enables an efficient computation of explanations for intelligent backtracking al- gorithms. We show the effectiveness of our approach on the instances of the last three MiniZinc challenges This paper describes an approach for user (e.g. SW architect) assisting in software processes. The approach observes the user's action and tries to predict his next step. For this we use approaches in the area of machine learning (sequence learning) and adopt these for the use in software processes. Keywords: Software engineering, Software process description languages, Software processes, Machine learning, Sequence prediction This manual describes the competition software for the Simulated Car Racing Championship, an international competition held at major conferences in the field of Evolutionary Computation and in the field of Computational Intelligence and Games. It provides an overview of the architecture, the instructions to install the software and to run the simple drivers provided in the package, the description of the sensors and the actuators. The measure of distance between two fuzzy sets is a fundamental tool within fuzzy set theory. However, current distance measures within the literature do not account for the direction of change between fuzzy sets; a useful concept in a variety of applications, such as Computing With Words. In this paper, we highlight this utility and introduce a distance measure which takes the direction between sets into account. We provide details of its application for normal and non-normal, as well as convex and non-convex fuzzy sets. We demonstrate the new distance measure using real data from the MovieLens dataset and establish the benefits of measuring the direction between fuzzy sets. The measure of distance between two fuzzy sets is a fundamental tool within fuzzy set theory, however, distance measures currently within the literature use a crisp value to represent the distance between fuzzy sets. A real valued distance measure is developed into a fuzzy distance measure which better reflects the uncertainty inherent in fuzzy sets and a fuzzy directional distance measure is presented, which accounts for the direction of change between fuzzy sets. A multiplicative version is explored as a full maximal assignment is computationally intractable so an intermediate solution is offered. Serial pattern mining consists in extracting the frequent sequential patterns from a unique sequence of itemsets. This paper explores the ability of a declarative language, such as Answer Set Programming (ASP), to solve this issue efficiently. We propose several ASP implementations of the frequent sequential pattern mining task: a non-incremental and an incremental resolution. The results show that the incremental resolution is more efficient than the non-incremental one, but both ASP programs are less efficient than dedicated algorithms. Nonetheless, this approach can be seen as a first step toward a generic framework for sequential pattern mining with constraints. We characterize different types of conflicts that may occur in complex distributed multi-agent scenarios, such as in Ambient Intelligence (AmI) environments, and we argue that these conflicts should be resolved in a suitable order and with the appropriate strategies for each individual conflict type. We call for further research with the goal of turning conflict resolution in AmI environments and similar multi-agent domains into a more coordinated and agreed upon process. In a Human-Computer Interaction context, we aim to elaborate an adaptive and generic interaction model in two different use cases: Embodied Conversational Agents and Creative Musical Agents for musical improvisation. To reach this goal, we'll try to use the concepts of adaptation and synchronization to enhance the interactive abilities of our agents and guide the development of our interaction model, and will try to make synchrony emerge from non-verbal dimensions of interaction. We propose a way of extracting and aggregating per-move evaluations from sets of Go game records. The evaluations capture different aspects of the games such as played patterns or statistic of sente/gote sequences. Using machine learning algorithms, the evaluations can be utilized to predict different relevant target variables. We apply this methodology to predict the strength and playing style of the player (e.g. territoriality or aggressivity) with good accuracy. We propose a number of possible applications including aiding in Go study, seeding real-work ranks of internet players or tuning of Go-playing programs. The paper presents an application of non-linear stacking ensembles for prediction of Go player attributes. An evolutionary algorithm is used to form a diverse ensemble of base learners, which are then aggregated by a stacking ensemble. This methodology allows for an efficient prediction of different attributes of Go players from sets of their games. These attributes can be fairly general, in this work, we used the strength and style of the players. This paper presents 'SimpleDS', a simple and publicly available dialogue system trained with deep reinforcement learning. In contrast to previous reinforcement learning dialogue systems, this system avoids manual feature engineering by performing action selection directly from raw text of the last system and (noisy) user responses. Our initial results, in the restaurant domain, show that it is indeed possible to induce reasonable dialogue behaviour with an approach that aims for high levels of automation in dialogue control for intelligent interactive agents. There exists a theory of a single general-purpose learning algorithm which could explain the principles of its operation. This theory assumes that the brain has some initial rough architecture, a small library of simple innate circuits which are prewired at birth and proposes that all significant mental algorithms can be learned. Given current understanding and observations, this paper reviews and lists the ingredients of such an algorithm from both architectural and functional perspectives. We describe CITlab's recognition system for the HTRtS competition attached to the 13. International Conference on Document Analysis and Recognition, ICDAR 2015. The task comprises the recognition of historical handwritten documents. The core algorithms of our system are based on multi-dimensional recurrent neural networks (MDRNN) and connectionist temporal classification (CTC). The software modules behind that as well as the basic utility technologies are essentially powered by PLANET's ARGUS framework for intelligent text recognition and image processing. This paper introduces the revival of the popular Ms. Pac-Man Versus Ghost Team competition. We present an updated game engine with Partial Observability constraints, a new Multi-Agent Systems approach to developing Ghost agents and several sample controllers to ease the development of entries. A restricted communication protocol is provided for the Ghosts, providing a more challenging environment than before. The competition will debut at the IEEE Computational Intelligence and Games Conference 2016. Some preliminary results showing the effects of Partial Observability and the benefits of simple communication are also presented. This paper presents a systematic survey on existing literature and seminal works relevant to the application of ontologies in different aspects of Cloud computing. Our hypothesis is that ontologies along with their reasoning capabilities can have significant impact on improving various aspects of the Cloud computing phenomena. Ontologies can promote intelligent decision support mechanisms for various Cloud based services. They can also provide effective interoperability among the Cloud based systems and resources. This survey can promote a comprehensive understanding on the roles and significance of ontologies within the overall domain of Cloud Computing. Also, this project can potentially form the basis of new research area and possibilities for both ontology and Cloud computing communities. Neo-fuzzy elements are used as nodes for an evolving cascade system. The proposed system can tune both its parameters and architecture in an online mode. It can be used for solving a wide range of Data Mining tasks (namely time series forecasting). The evolving cascade system with neo-fuzzy nodes can process rather large data sets with high speed and effectiveness. Despite outstanding success in vision amongst other domains, many of the recent deep learning approaches have evident drawbacks for robots. This manuscript surveys recent work in the literature that pertain to applying deep learning systems to the robotics domain, either as means of estimation or as a tool to resolve motor commands directly from raw percepts. These recent advances are only a piece to the puzzle. We suggest that deep learning as a tool alone is insufficient in building a unified framework to acquire general intelligence. For this reason, we complement our survey with insights from cognitive development and refer to ideas from classical control theory, producing an integrated direction for a lifelong learning architecture. Defining various dishonest notions in a formal way is a key step to enable intelligent agents to act in untrustworthy environments. This review evaluates the literature for this topic by looking at formal definitions based on modal logic as well as other formal approaches. Criteria from philosophical groundwork is used to assess the definitions for correctness and completeness. The key contribution of this review is to show that only a few definitions fully comply with this gold standard and to point out the missing steps towards a successful application of these definitions in an actual agent environment. In this paper we introduce a fuzzy constraint linear discriminant analysis (FC-LDA). The FC-LDA tries to minimize misclassification error based on modified perceptron criterion that benefits handling the uncertainty near the decision boundary by means of a fuzzy linear programming approach with fuzzy resources. The method proposed has low computational complexity because of its linear characteristics and the ability to deal with noisy data with different degrees of tolerance. Obtained results verify the success of the algorithm when dealing with different problems. Comparing FC-LDA and LDA shows superiority in classification task. The ability to accurately predict and simulate human driving behavior is critical for the development of intelligent transportation systems. Traditional modeling methods have employed simple parametric models and behavioral cloning. This paper adopts a method for overcoming the problem of cascading errors inherent in prior approaches, resulting in realistic behavior that is robust to trajectory perturbations. We extend Generative Adversarial Imitation Learning to the training of recurrent policies, and we demonstrate that our model outperforms rule-based controllers and maximum likelihood models in realistic highway simulations. Our model both reproduces emergent behavior of human drivers, such as lane change rate, while maintaining realistic control over long time horizons. Retention of residual skills for persons who partially lose their cognitive or physical ability is of utmost importance. Research is focused on developing systems that provide need-based assistance for retention of such residual skills. This paper describes a novel cognitive collaborative control architecture C3A, designed to address the challenges of developing need- based assistance for wheelchair navigation. Organization of C3A is detailed and results from simulation of the proposed architecture is presented. For simulation of our proposed architecture, we have used ROS (Robot Operating System) as a control framework and a 3D robotic simulator called USARSim (Unified System for Automation and Robot Simulation). Classical higher-order logic, when utilized as a meta-logic in which various other (classical and non-classical) logics can be shallowly embedded, is well suited for realising a universal logic reasoning approach. Universal logic reasoning in turn, as envisioned already by Leibniz, may support the rigorous formalisation and deep logical analysis of rational arguments within machines. A respective universal logic reasoning framework is described and a range of exemplary applications are discussed. In the future, universal logic reasoning in combination with appropriate, controlled forms of rational argumentation may serve as a communication layer between humans and intelligent machines. In this paper we describe an automatic generator to support the data scientist to construct, in a user-friendly way, dashboards from data represented as networks. The generator called SBINet (Semantic for Business Intelligence from Networks) has a semantic layer that, through ontologies, describes the data that represents a network as well as the possible metrics to be calculated in the network. Thus, with SBINet, the stages of the dashboard constructing process that uses complex network metrics are facilitated and can be done by users who do not necessarily know about complex networks. In this report, an automated bartender system was developed for making orders in a bar using hand gestures. The gesture recognition of the system was developed using Machine Learning techniques, where the model was trained to classify gestures using collected data. The final model used in the system reached an average accuracy of 95%. The system raised ethical concerns both in terms of user interaction and having such a system in a real world scenario, but it could initially work as a complement to a real bartender. Symbolic has been long considered as a language of human intelligence while neural networks have advantages of robust computation and dealing with noisy data. The integration of neural-symbolic can offer better learning and reasoning while providing a means for interpretability through the representation of symbolic knowledge. Although previous works focus intensively on supervised feedforward neural networks, little has been done for the unsupervised counterparts. In this paper we show how to integrate symbolic knowledge into unsupervised neural networks. We exemplify our approach with knowledge in different forms, including propositional logic for DNA promoter prediction and first-order logic for understanding family relationship. We propose a novel method for automatic program synthesis. P-Tree Programming represents the program search space through a single probabilistic prototype tree. From this prototype tree we form program instances which we evaluate on a given problem. The error values from the evaluations are propagated through the prototype tree. We use them to update the probability distributions that determine the symbol choices of further instances. The iterative method is applied to several symbolic regression benchmarks from the literature. It outperforms standard Genetic Programming to a large extend. Furthermore, it relies on a concise set of parameters which are held constant for all problems. The algorithm can be employed for most of the typical computational intelligence tasks such as classification, automatic program induction, and symbolic regression. Neuroevolution has proven effective at many reinforcement learning tasks, but does not seem to scale well to high-dimensional controller representations, which are needed for tasks where the input is raw pixel data. We propose a novel method where we train an autoencoder to create a comparatively low-dimensional representation of the environment observation, and then use CMA-ES to train neural network controllers acting on this input data. As the behavior of the agent changes the nature of the input data, the autoencoder training progresses throughout evolution. We test this method in the VizDoom environment built on the classic FPS Doom, where it performs well on a health-pack gathering task. The paper considers the problem of planning a set of non-conflict trajectories for the coalition of intelligent agents (mobile robots). Two divergent approaches, e.g. centralized and decentralized, are surveyed and analyzed. Decentralized planner - MAPP is described and applied to the task of finding trajectories for dozens UAVs performing nap-of-the-earth flight in urban environments. Results of the experimental studies provide an opportunity to claim that MAPP is a highly efficient planner for solving considered types of tasks. Sustainable and economical generation of electrical power is an essential and mandatory component of infrastructure in today's world. Optimal generation (generator subset selection) of power requires a careful evaluation of various factors like type of source, generation, transmission & storage capacities, congestion among others which makes this a difficult task. We created a grid to simulate various conditions including stimuli like generator supply, weather and load demand using Siemens PSS/E software and this data is trained using deep learning methods and subsequently tested. The results are highly encouraging. As per our knowledge, this is the first paper to propose a working and scalable deep learning model for this problem. Ordered weighted averaging (OWA) operators have been widely used in decision making these past few years. An important issue facing the OWA operators' users is the determination of the OWA weights. This paper introduces an OWA determination method based on truncated distributions that enables intuitive generation of OWA weights according to a certain level of risk and trade-off. These two dimensions are represented by the two first moments of the truncated distribution. We illustrate our approach with the well-know normal distribution and the definition of a continuous parabolic decision-strategy space. We finally study the impact of the number of criteria on the results. In this work we propose an ontology to support automated negotiation in multiagent systems. The ontology can be connected with some domain-specific ontologies to facilitate the negotiation in different domains, such as Intelligent Transportation Systems (ITS), e-commerce, etc. The specific negotiation rules for each type of negotiation strategy can also be defined as part of the ontology, reducing the amount of knowledge hardcoded in the agents and ensuring the interoperability. The expressiveness of the ontology was proved in a multiagent architecture for the automatic traffic light setting application on ITS. Telecom companies are severely damaged by bypass fraud or SIM boxing. However, there is a shortage of published research to tackle this problem. The traditional method of Test Call Generating is easily overcome by fraudsters and the need for more sophisticated ways is inevitable. In this work, we are developing intelligent algorithms that mine a huge amount of mobile operator's data and detect the SIMs that are used to bypass international calls. This method will make it hard for fraudsters to generate revenue and hinder their work. Also by reducing fraudulent activities, quality of service can be increased as well as customer satisfaction. Our technique has been evaluated and tested on real world mobile operator data, and proved to be very efficient. We introduce The House Of inteRactions (THOR), a framework for visual AI research, available at http://ai2thor.allenai.org. AI2-THOR consists of near photo-realistic 3D indoor scenes, where AI agents can navigate in the scenes and interact with objects to perform tasks. AI2-THOR enables research in many different domains including but not limited to deep reinforcement learning, imitation learning, learning by interaction, planning, visual question answering, unsupervised representation learning, object detection and segmentation, and learning models of cognition. The goal of AI2-THOR is to facilitate building visually intelligent models and push the research forward in this domain. The Winograd Schema Challenge (WSC) is a test of machine intelligence, designed to be an improvement on the Turing test. A Winograd Schema consists of a sentence and a corresponding question. To successfully answer these questions, one requires the use of commonsense knowledge and reasoning. This work focuses on extracting common sense knowledge which can be used to generate answers for the Winograd schema challenge. Common sense knowledge is extracted based on events (or actions) and their participants; called Event-Based Conditional Commonsense (ECC). I propose an approach using Narrative Event Chains [Chambers et al., 2008] to extract ECC knowledge. These are stored in templates, to be later used for answering the WSC questions. This approach works well with respect to a subset of WSC tasks. Current machine learning systems operate, almost exclusively, in a statistical, or model-free mode, which entails severe theoretical limits on their power and performance. Such systems cannot reason about interventions and retrospection and, therefore, cannot serve as the basis for strong AI. To achieve human level intelligence, learning machines need the guidance of a model of reality, similar to the ones used in causal inference tasks. To demonstrate the essential role of such models, I will present a summary of seven tasks which are beyond reach of current machine learning systems and which have been accomplished using the tools of causal modeling. The emerging paradigm of Human-Machine Inference Networks (HuMaINs) combines complementary cognitive strengths of humans and machines in an intelligent manner to tackle various inference tasks and achieves higher performance than either humans or machines by themselves. While inference performance optimization techniques for human-only or sensor-only networks are quite mature, HuMaINs require novel signal processing and machine learning solutions. In this paper, we present an overview of the HuMaINs architecture with a focus on three main issues that include architecture design, inference algorithms including security/privacy challenges, and application areas/use cases. While great advances are made in pattern recognition and machine learning, the successes of such fields remain restricted to narrow applications and seem to break down when training data is scarce, a shift in domain occurs, or when intelligent reasoning is required for rapid adaptation to new environments. In this work, we list several of the shortcomings of modern machine-learning solutions, specifically in the contexts of computer vision and in reinforcement learning and suggest directions to explore in order to try to ameliorate these weaknesses. The paper describes the architecture of the intelligence system for automated design of ontological knowledge bases of domain areas and the software model of the management GUI (Graphical User Interface) subsystem We present MIRIAM (Multimodal Intelligent inteRactIon for Autonomous systeMs), a multimodal interface to support situation awareness of autonomous vehicles through chat-based interaction. The user is able to chat about the vehicle's plan, objectives, previous activities and mission progress. The system is mixed initiative in that it pro-actively sends messages about key events, such as fault warnings. We will demonstrate MIRIAM using SeeByte's SeeTrack command and control interface and Neptune autonomy simulator. Conversation interfaces (CIs), or chatbots, are a popular form of intelligent agents that engage humans in task-oriented or informal conversation. In this position paper and demonstration, we argue that chatbots working in dynamic environments, like with sensor data, can not only serve as a promising platform to research issues at the intersection of learning, reasoning, representation and execution for goal-directed autonomy; but also handle non-trivial business applications. We explore the underlying issues in the context of Water Advisor, a preliminary multi-modal conversation system that can access and explain water quality data. We discuss and predict the evolution of Simultaneous Localisation and Mapping (SLAM) into a general geometric and semantic `Spatial AI' perception capability for intelligent embodied devices. A big gap remains between the visual perception performance that devices such as augmented reality eyewear or comsumer robots will require and what is possible within the constraints imposed by real products. Co-design of algorithms, processors and sensors will be needed. We explore the computational structure of current and future Spatial AI algorithms and consider this within the landscape of ongoing hardware developments. We developed an Android Smartophone application software for tourist information system. Especially, the agent system recommends the sightseeing spot and local hospitality corresponding to the current feelings. The system such as concierge can estimate user's emotion and mood by Emotion Generating Calculations and Mental State Transition Network. In this paper, the system decides the next candidates for spots and foods by the reasoning of fuzzy Petri Net in order to make more smooth communication between human and smartphone. The system was developed for Hiroshima Tourist Information and described some hospitality about the concierge system. WWW has a scale-free structure where novel information is often difficult to locate. Moreover, Intelligent agents easily get trapped in this structure. Here a novel method is put forth, which turns these traps into information repositories, supplies: We populated an Internet environment with intelligent news foragers. Foraging has its associated cost whereas foragers are rewarded if they detect not yet discovered novel information. The intelligent news foragers crawl by using the estimated long-term cumulated reward, and also have a finite sized memory: the list of most promising supplies. Foragers form an artificial life community: the most successful ones are allowed to multiply, while unsuccessful ones die out. The specific property of this community is that there is no direct communication amongst foragers but the centralized rewarding system. Still, fast division of work is achieved. The most advanced implementation of adaptive constraint processing with Constraint Handling Rules (CHR) allows the application of intelligent search strategies to solve Constraint Satisfaction Problems (CSP). This presentation compares an improved version of conflict-directed backjumping and two variants of dynamic backtracking with respect to chronological backtracking on some of the AIM instances which are a benchmark set of random 3-SAT problems. A CHR implementation of a Boolean constraint solver combined with these different search strategies in Java is thus being compared with a CHR implementation of the same Boolean constraint solver combined with chronological backtracking in SICStus Prolog. This comparison shows that the addition of ``intelligence'' to the search process may reduce the number of search steps dramatically. Furthermore, the runtime of their Java implementations is in most cases faster than the implementations of chronological backtracking. More specifically, conflict-directed backjumping is even faster than the SICStus Prolog implementation of chronological backtracking, although our Java implementation of CHR lacks the optimisations made in the SICStus Prolog system. To appear in Theory and Practice of Logic Programming (TPLP). We discuss which properties common-use artifacts should have to collaborate without human intervention. We conceive how devices, such as mobile phones, PDAs, and home appliances, could be seamlessly integrated to provide an "ambient intelligence" that responds to the user's desires without requiring explicit programming or commands. While the hardware and software technology to build such systems already exists, as yet there is no standard protocol that can learn new meanings. We propose the first steps in the development of such a protocol, which would need to be adaptive, extensible, and open to the community, while promoting self-organization. We argue that devices, interacting through "game-like" moves, can learn to agree about how to communicate, with whom to cooperate, and how to delegate and coordinate specialized tasks. Thus, they may evolve a distributed cognition or collective intelligence capable of tackling complex tasks. The act of bluffing confounds game designers to this day. The very nature of bluffing is even open for debate, adding further complication to the process of creating intelligent virtual players that can bluff, and hence play, realistically. Through the use of intelligent, learning agents, and carefully designed agent outlooks, an agent can in fact learn to predict its opponents reactions based not only on its own cards, but on the actions of those around it. With this wider scope of understanding, an agent can in learn to bluff its opponents, with the action representing not an illogical action, as bluffing is often viewed, but rather as an act of maximising returns through an effective statistical optimisation. By using a tee dee lambda learning algorithm to continuously adapt neural network agent intelligence, agents have been shown to be able to learn to bluff without outside prompting, and even to learn to call each others bluffs in free, competitive play. In this study, we introduce general frame of MAny Connected Intelligent Particles Systems (MACIPS). Connections and interconnections between particles get a complex behavior of such merely simple system (system in system).Contribution of natural computing, under information granulation theory, are the main topics of this spacious skeleton. Upon this clue, we organize two algorithms involved a few prominent intelligent computing and approximate reasoning methods: self organizing feature map (SOM), Neuro- Fuzzy Inference System and Rough Set Theory (RST). Over this, we show how our algorithms can be taken as a linkage of government-society interaction, where government catches various fashions of behavior: solid (absolute) or flexible. So, transition of such society, by changing of connectivity parameters (noise) from order to disorder is inferred. Add to this, one may find an indirect mapping among financial systems and eventual market fluctuations with MACIPS. Keywords: phase transition, SONFIS, SORST, many connected intelligent particles system, society-government interaction The investigation of the terrorist attack is a time-critical task. The investigators have a limited time window to diagnose the organizational background of the terrorists, to run down and arrest the wire-pullers, and to take an action to prevent or eradicate the terrorist attack. The intuitive interface to visualize the intelligence data set stimulates the investigators' experience and knowledge, and aids them in decision-making for an immediately effective action. This paper presents a computational method to analyze the intelligence data set on the collective actions of the perpetrators of the attack, and to visualize it into the form of a social network diagram which predicts the positions where the wire-pullers conceals themselves. In this study, we introduce general frame of MAny Connected Intelligent Particles Systems (MACIPS). Connections and interconnections between particles get a complex behavior of such merely simple system (system in system).Contribution of natural computing, under information granulation theory, are the main topic of this spacious skeleton. Upon this clue, we organize different algorithms involved a few prominent intelligent computing and approximate reasoning methods such as self organizing feature map (SOM)[9], Neuro- Fuzzy Inference System[10], Rough Set Theory (RST)[11], collaborative clustering, Genetic Algorithm and Ant Colony System. Upon this, we have employed our algorithms on the several engineering systems, especially emerged systems in Civil and Mineral processing. In other process, we investigated how our algorithms can be taken as a linkage of government-society interaction, where government catches various fashions of behavior: solid (absolute) or flexible. So, transition of such society, by changing of connectivity parameters (noise) from order to disorder is inferred. Add to this, one may find an indirect mapping among finical systems and eventual market fluctuations with MACIPS. In the following sections, we will mention the main topics of the suggested proposal, briefly Details of the proposed algorithms can be found in the references. Computer-based information technologies have been extensively used to help many organizations, private companies, and academic and education institutions manage their processes and information systems hereby become their nervous centre. The explosion of massive data sets created by businesses, science and governments necessitates intelligent and more powerful computing paradigms so that users can benefit from this data. Therefore most new-generation database applications demand intelligent information management to enhance efficient interactions between database and the users. Database systems support only a Boolean query model. A selection query on SQL database returns all those tuples that satisfy the conditions in the query. Informledge System (ILS) is a knowledge network with autonomous nodes and intelligent links that integrate and structure the pieces of knowledge. In this paper, we put forward the strategies for knowledge embedding and retrieval in an ILS. ILS is a powerful knowledge network system dealing with logical storage and connectivity of information units to form knowledge using autonomous nodes and multi-lateral links. In ILS, the autonomous nodes known as Knowledge Network Nodes (KNN)s play vital roles which are not only used in storage, parsing and in forming the multi-lateral linkages between knowledge points but also in helping the realization of intelligent retrieval of linked information units in the form of knowledge. Knowledge built in to the ILS forms the shape of sphere. The intelligence incorporated into the links of a KNN helps in retrieving various knowledge threads from a specific set of KNNs. A developed entity of information realized through KNN forms in to the shape of a knowledge cone Holding commercial negotiations and selecting the best supplier in supply chain management systems are among weaknesses of producers in production process. Therefore, applying intelligent systems may have an effective role in increased speed and improved quality in the selections .This paper introduces a system which tries to trade using multi-agents systems and holding negotiations between any agents. In this system, an intelligent agent is considered for each segment of chains which it tries to send order and receive the response with attendance in negotiation medium and communication with other agents .This paper introduces how to communicate between agents, characteristics of multi-agent and standard registration medium of each agent in the environment. JADE (Java Application Development Environment) was used for implementation and simulation of agents cooperation. We describe challenges in the risk management of smart card based electronic cash industry and describe a method to evaluate the effectiveness of distributed intelligence on the smart card. More specifically, we discuss the evaluation of distributed intelligence function called "on-chip risk management" of the smart card for the global electronic cash payment application using micro dynamic simulation. Handling of uncertainty related to future economic environment, various potential counterfeit attack scenarios, requires simulation of such environment to evaluate on-chip performance. Creation of realistic simulation of electronic cash economy, transaction environment, consumers, merchants, banks are challenge themselves. In addition, we shows examples of detection capability of off-chip, host based counterfeit detection systems based on the micro dynamic simulation model generated data set. The paper considers the class of information systems capable of solving heuristic problems on basis of formal theory that was termed modal and vector theory of formal intelligent systems (FIS). The paper justifies the construction of FIS resolution algorithm, defines the main features of these systems and proves theorems that underlie the theory. The principle of representation diversity of FIS construction is formulated. The paper deals with the main principles of constructing and functioning formal intelligent system (FIS) on basis of FIS modal and vector theory. The following phenomena are considered: modular architecture of FIS presentation sub-system, algorithms of data processing at every step of the stage of creating presentations. Besides the paper suggests the structure of neural elements, i.e. zone detectors and processors that are the basis for FIS construction. This paper introduces a novel idea for representation of individuals using quaternions in swarm intelligence and evolutionary algorithms. Quaternions are a number system, which extends complex numbers. They are successfully applied to problems of theoretical physics and to those areas needing fast rotation calculations. We propose the application of quaternions in optimization, more precisely, we have been using quaternions for representation of individuals in Bat algorithm. The preliminary results of our experiments when optimizing a test-suite consisting of ten standard functions showed that this new algorithm significantly improved the results of the original Bat algorithm. Moreover, the obtained results are comparable with other swarm intelligence and evolutionary algorithms, like the artificial bees colony, and differential evolution. We believe that this representation could also be successfully applied to other swarm intelligence and evolutionary algorithms. Path planning of Robot is one of the challenging fields in the area of Robotics research. In this paper, we proposed a novel algorithm to find path between starting and ending position for an intelligent system. An intelligent system is considered to be a device/robot having an antenna connected with sensor-detector system. The proposed algorithm is based on Neural Network training concept. The considered neural network is Adapti ve to the knowledge bases. However, implementation of this algorithm is slightly expensive due to hardware it requires. From detailed analysis, it can be proved that the resulted path of this algorithm is efficient. Intelligent Transportation System in case of cities is controlling traffic congestion and regulating the traffic flow. This paper presents three modules that will help in managing city traffic issues and ultimately gives advanced development in transportation system. First module, Congestion Detection and Management will provide user real time information about congestion on the road towards his destination, Second module, Intelligent Public Transport System will provide user real time public transport information,i.e, local buses, and the third module, Signal Synchronization will help in controlling congestion at signals, with real time adjustments of signal timers according to the congestion. All the information that user is getting about the traffic or public transportation will be provided on users day to day device that is mobile through Android application or SMS. Moreover, communication can also be done via Website for Clients having internet access. And all these modules will be fully automated without any human intervention at server side. Attributing a cyber-operation through the use of multiple pieces of technical evidence (i.e., malware reverse-engineering and source tracking) and conventional intelligence sources (i.e., human or signals intelligence) is a difficult problem not only due to the effort required to obtain evidence, but the ease with which an adversary can plant false evidence. In this paper, we introduce a formal reasoning system called the InCA (Intelligent Cyber Attribution) framework that is designed to aid an analyst in the attribution of a cyber-operation even when the available information is conflicting and/or uncertain. Our approach combines argumentation-based reasoning, logic programming, and probabilistic models to not only attribute an operation but also explain to the analyst why the system reaches its conclusions. While perception tasks such as visual object recognition and text understanding play an important role in human intelligence, the subsequent tasks that involve inference, reasoning and planning require an even higher level of intelligence. The past few years have seen major advances in many perception tasks using deep learning models. For higher-level inference, however, probabilistic graphical models with their Bayesian nature are still more powerful and flexible. To achieve integrated intelligence that involves both perception and inference, it is naturally desirable to tightly integrate deep learning and Bayesian models within a principled probabilistic framework, which we call Bayesian deep learning. In this unified framework, the perception of text or images using deep learning can boost the performance of higher-level inference and in return, the feedback from the inference process is able to enhance the perception of text or images. This survey provides a general introduction to Bayesian deep learning and reviews its recent applications on recommender systems, topic models, and control. In this survey, we also discuss the relationship and differences between Bayesian deep learning and other related topics like Bayesian treatment of neural networks. In [1,2] a new class of intelligent controllers that can semantically embed an agent in a spatial context constraining its behavior in a goal-oriented manner was suggested. A controller of such a class can guide an agent in a stationary unknown environment to a fixed target zone along an obstacle-free trajectory. Here, an extension is suggested that would enable the interception of an intelligent target that is maneuvering to evade capture amidst stationary clutter (i.e. the target zone is moving). This is achieved by forcing the differential properties of the potential field used to induce the control action to satisfy the wave equation. Background of the problem, theoretical developments, as well as, proofs of the ability of the modified control to intercept the target along an obstacle-free trajectory are supplied. Simulation results are also provided. Software cost estimation based on multivariate data from completed projects requires the building of efficient models. These models essentially describe relations in the data, either on the basis of correlations between variables or of similarities between the projects. The continuous growth of the amount of data gathered and the need to perform preliminary analysis in order to discover patterns able to drive the building of reasonable models, leads the researchers towards intelligent and time-saving tools which can effectively describe data and their relationships. The goal of this paper is to suggest an innovative visualization tool, widely used in bioinformatics, which represents relations in data in an aesthetic and intelligent way. In order to illustrate the capabilities of the tool, we use a well known dataset from software engineering projects. Recent advances in communications, mobile computing, and artificial intelligence have greatly expanded the application space of intelligent distributed sensor networks. This in turn motivates the development of generalized Bayesian decentralized data fusion (DDF) algorithms for robust and efficient information sharing among autonomous agents using probabilistic belief models. However, DDF is significantly challenging to implement for general real-world applications requiring the use of dynamic/ad hoc network topologies and complex belief models, such as Gaussian mixtures or hybrid Bayesian networks. To tackle these issues, we first discuss some new key mathematical insights about exact DDF and conservative approximations to DDF. These insights are then used to develop novel generalized DDF algorithms for complex beliefs based on mixture pdfs and conditional factors. Numerical examples motivated by multi-robot target search demonstrate that our methods lead to significantly better fusion results, and thus have great potential to enhance distributed intelligent reasoning in sensor networks. In this paper we discuss and analyze some of the intelligent classifiers which allows for automatic detection and classification of networks attacks for any intrusion detection system. We will proceed initially with their analysis using the WEKA software to work with the classifiers on a well-known IDS (Intrusion Detection Systems) dataset like NSL-KDD dataset. The NSL-KDD dataset of network attacks was created in a military network by MIT Lincoln Labs. Then we will discuss and experiment some of the hybrid AI (Artificial Intelligence) classifiers that can be used for IDS, and finally we developed a Java software with three most efficient classifiers and compared it with other options. The outputs would show the detection accuracy and efficiency of the single and combined classifiers used. It has been quite a long time since AI researchers in the field of computer science stop talking about simulating human intelligence or trying to explain how brain works. Recently, represented by deep learning techniques, the field of machine learning is experiencing unprecedented prosperity and some applications with near human-level performance bring researchers confidence to imply that their approaches are the promising candidate for understanding the mechanism of human brain. However apart from several ancient philological criteria and some imaginary black box tests (Turing test, Chinese room) there is no computational level explanation, definition or criteria about intelligence or any of its components. Base on the common sense that learning ability is one critical component of intelligence and inspect from the viewpoint of mapping relations, this paper presents two laws which explains what is the "learning ability" as we familiar with and under what conditions a mapping relation can be acknowledged as "Learning Model". In this paper, we examine the problem of missing data in high-dimensional datasets by taking into consideration the Missing Completely at Random and Missing at Random mechanisms, as well as theArbitrary missing pattern. Additionally, this paper employs a methodology based on Deep Learning and Swarm Intelligence algorithms in order to provide reliable estimates for missing data. The deep learning technique is used to extract features from the input data via an unsupervised learning approach by modeling the data distribution based on the input. This deep learning technique is then used as part of the objective function for the swarm intelligence technique in order to estimate the missing data after a supervised fine-tuning phase by minimizing an error function based on the interrelationship and correlation between features in the dataset. The investigated methodology in this paper therefore has longer running times, however, the promising potential outcomes justify the trade-off. Also, basic knowledge of statistics is presumed. Efforts at understanding the computational processes in the brain have met with limited success, despite their importance and potential uses in building intelligent machines. We propose a simple new model which draws on recent findings in Neuroscience and the Applied Mathematics of interacting Dynamical Systems. The Feynman Machine is a Universal Computer for Dynamical Systems, analogous to the Turing Machine for symbolic computing, but with several important differences. We demonstrate that networks and hierarchies of simple interacting Dynamical Systems, each adaptively learning to forecast its evolution, are capable of automatically building sensorimotor models of the external and internal world. We identify such networks in mammalian neocortex, and show how existing theories of cortical computation combine with our model to explain the power and flexibility of mammalian intelligence. These findings lead directly to new architectures for machine intelligence. A suite of software implementations has been built based on these principles, and applied to a number of spatiotemporal learning tasks. The sampling based motion planning algorithm known as Rapidly-exploring Random Trees (RRT) has gained the attention of many researchers due to their computational efficiency and effectiveness. Recently, a variant of RRT called RRT* has been proposed that ensures asymptotic optimality. Subsequently its bidirectional version has also been introduced in the literature known as Bidirectional-RRT* (B-RRT*). We introduce a new variant called Intelligent Bidirectional-RRT* (IB-RRT*) which is an improved variant of the optimal RRT* and bidirectional version of RRT* (B-RRT*) algorithms and is specially designed for complex cluttered environments. IB-RRT* utilizes the bidirectional trees approach and introduces intelligent sample insertion heuristic for fast convergence to the optimal path solution using uniform sampling heuristics. The proposed algorithm is evaluated theoretically and experimental results are presented that compares IB-RRT* with RRT* and B-RRT*. Moreover, experimental results demonstrate the superior efficiency of IB-RRT* in comparison with RRT* and B-RRT in complex cluttered environments. Geometry theorem proving forms a major and challenging component in the K-12 mathematics curriculum. A particular difficult task is to add auxiliary constructions (i.e, additional lines or points) to aid proof discovery. Although there exist many intelligent tutoring systems proposed for geometry proofs, few teach students how to find auxiliary constructions. And the few exceptions are all limited by their underlying reasoning processes for supporting auxiliary constructions. This paper tackles these weaknesses of prior systems by introducing an interactive geometry tutor, the Advanced Geometry Proof Tutor (AGPT). It leverages a recent automated geometry prover to provide combined benefits that any geometry theorem prover or intelligent tutoring system alone cannot accomplish. In particular, AGPT not only can automatically process images of geometry problems directly, but also can interactively train and guide students toward discovering auxiliary constructions on their own. We have evaluated AGPT via a pilot study with 78 high school students. The study results show that, on training students how to find auxiliary constructions, there is no significant perceived difference between AGPT and human tutors, and AGPT is significantly more effective than the state-of-the-art geometry solver that produces human-readable proofs. Particle Swarm Optimization (PSO) is an Evolutionary Algorithm (EA) that utilizes a swarm of particles to solve an optimization problem. Slow Intelligence System (SIS) is a learning framework which slowly learns the solution to a problem performing a series of operations. Moreover, Learning Automata (LA) are minuscule but effective decision making entities which are best suited to act as a controller component. In this paper, we combine two isolate populations of PSO to forge the Adaptive Intelligence Optimizer (AIO) which harnesses the advantages of a bi-population PSO to escape from the local minimum and avoid premature convergence. Furthermore, using the rich framework of SIS and the nifty control theory that LA derived from, we find the perfect matching between SIS and LA where acting slowly is the pillar of both of them. Both SIS and LA need time to converge to the optimal decision where this enables AIO to outperform standard PSO having an incomparable performance on evolutionary optimization benchmark functions. Given a knowledge base KB containing first-order and statistical facts, we consider a principled method, called the random-worlds method, for computing a degree of belief that some formula Phi holds given KB. If we are reasoning about a world or system consisting of N individuals, then we can consider all possible worlds, or first-order models, with domain {1,...,N} that satisfy KB, and compute the fraction of them in which Phi is true. We define the degree of belief to be the asymptotic value of this fraction as N grows large. We show that when the vocabulary underlying Phi and KB uses constants and unary predicates only, we can naturally associate an entropy with each world. As N grows larger, there are many more worlds with higher entropy. Therefore, we can use a maximum-entropy computation to compute the degree of belief. This result is in a similar spirit to previous work in physics and artificial intelligence, but is far more general. Of equal interest to the result itself are the limitations on its scope. Most importantly, the restriction to unary predicates seems necessary. Although the random-worlds method makes sense in general, the connection to maximum entropy seems to disappear in the non-unary case. These observations suggest unexpected limitations to the applicability of maximum-entropy methods. Most modern formalisms used in Databases and Artificial Intelligence for describing an application domain are based on the notions of class (or concept) and relationship among classes. One interesting feature of such formalisms is the possibility of defining a class, i.e., providing a set of properties that precisely characterize the instances of the class. Many recent articles point out that there are several ways of assigning a meaning to a class definition containing some sort of recursion. In this paper, we argue that, instead of choosing a single style of semantics, we achieve better results by adopting a formalism that allows for different semantics to coexist. We demonstrate the feasibility of our argument, by presenting a knowledge representation formalism, the description logic muALCQ, with the above characteristics. In addition to the constructs for conjunction, disjunction, negation, quantifiers, and qualified number restrictions, muALCQ includes special fixpoint constructs to express (suitably interpreted) recursive definitions. These constructs enable the usual frame-based descriptions to be combined with definitions of recursive data structures such as directed acyclic graphs, lists, streams, etc. We establish several properties of muALCQ, including the decidability and the computational complexity of reasoning, by formulating a correspondence with a particular modal logic of programs called the modal mu-calculus. Existing plan synthesis approaches in artificial intelligence fall into two categories -- domain independent and domain dependent. The domain independent approaches are applicable across a variety of domains, but may not be very efficient in any one given domain. The domain dependent approaches need to be (re)designed for each domain separately, but can be very efficient in the domain for which they are designed. One enticing alternative to these approaches is to automatically synthesize domain independent planners given the knowledge about the domain and the theory of planning. In this paper, we investigate the feasibility of using existing automated software synthesis tools to support such synthesis. Specifically, we describe an architecture called CLAY in which the Kestrel Interactive Development System (KIDS) is used to derive a domain-customized planner through a semi-automatic combination of a declarative theory of planning, and the declarative control knowledge specific to a given domain, to semi-automatically combine them to derive domain-customized planners. We discuss what it means to write a declarative theory of planning and control knowledge for KIDS, and illustrate our approach by generating a class of domain-specific planners using state space refinements. Our experiments show that the synthesized planners can outperform classical refinement planners (implemented as instantiations of UCP, Kambhampati & Srivastava, 1995), using the same control knowledge. We will contrast the costs and benefits of the synthesis approach with conventional methods for customizing domain independent planners. In this paper we propose a random CSP model, called Model GB, which is a natural generalization of standard Model B. It is proved that Model GB in which each constraint is easy to satisfy exhibits non-trivial behaviour (not trivially satisfiable or unsatisfiable) as the number of variables approaches infinity. A detailed analysis to obtain an asymptotic estimate (good to 1+o(1)) of the average number of nodes in a search tree used by the backtracking algorithm on Model GB is also presented. It is shown that the average number of nodes required for finding all solutions or proving that no solution exists grows exponentially with the number of variables. So this model might be an interesting distribution for studying the nature of hard instances and evaluating the performance of CSP algorithms. In addition, we further investigate the behaviour of the average number of nodes as r (the ratio of constraints to variables) varies. The results indicate that as r increases, random CSP instances get easier and easier to solve, and the base for the average number of nodes that is exponential in r tends to 1 as r approaches infinity. Therefore, although the average number of nodes used by the backtracking algorithm on random CSP is exponential, many CSP instances will be very easy to solve when r is sufficiently large. The system presented here shows the feasibility of modeling the knowledge involved in a complex musical activity by integrating sub-symbolic and symbolic processes. This research focuses on the question of whether there is any advantage in integrating a neural network together with a distributed artificial intelligence approach within the music domain. The primary purpose of our work is to design a model that describes the different aspects a user might be interested in considering when involved in a musical activity. The approach we suggest in this work enables the musician to encode his knowledge, intuitions, and aesthetic taste into different modules. The system captures these aspects by computing and applying three distinct functions: rules, fuzzy concepts, and learning. As a case study, we began experimenting with first species two-part counterpoint melodies. We have developed a hybrid system composed of a connectionist module and an agent-based module to combine the sub-symbolic and symbolic levels to achieve this task. The technique presented here to represent musical knowledge constitutes a new approach for composing polyphonic music. In this paper, we propose the MIML (Multi-Instance Multi-Label learning) framework where an example is described by multiple instances and associated with multiple class labels. Compared to traditional learning frameworks, the MIML framework is more convenient and natural for representing complicated objects which have multiple semantic meanings. To learn from MIML examples, we propose the MimlBoost and MimlSvm algorithms based on a simple degeneration strategy, and experiments show that solving problems involving complicated objects with multiple semantic meanings in the MIML framework can lead to good performance. Considering that the degeneration process may lose information, we propose the D-MimlSvm algorithm which tackles MIML problems directly in a regularization framework. Moreover, we show that even when we do not have access to the real objects and thus cannot capture more information from real objects by using the MIML representation, MIML is still useful. We propose the InsDif and SubCod algorithms. InsDif works by transforming single-instances into the MIML representation for learning, while SubCod works by transforming single-label examples into the MIML representation for learning. Experiments show that in some tasks they are able to achieve better performance than learning the single-instances or single-label examples directly. The two standard branching schemes for CSPs are d-way and 2-way branching. Although it has been shown that in theory the latter can be exponentially more effective than the former, there is a lack of empirical evidence showing such differences. To investigate this, we initially make an experimental comparison of the two branching schemes over a wide range of benchmarks. Experimental results verify the theoretical gap between d-way and 2-way branching as we move from a simple variable ordering heuristic like dom to more sophisticated ones like dom/ddeg. However, perhaps surprisingly, experiments also show that when state-of-the-art variable ordering heuristics like dom/wdeg are used then d-way can be clearly more efficient than 2-way branching in many cases. Motivated by this observation, we develop two generic heuristics that can be applied at certain points during search to decide whether 2-way branching or a restricted version of 2-way branching, which is close to d-way branching, will be followed. The application of these heuristics results in an adaptive branching scheme. Experiments with instantiations of the two generic heuristics confirm that search with adaptive branching outperforms search with a fixed branching scheme on a wide range of problems. The constraint satisfaction problem (CSP) is a general problem central to computer science and artificial intelligence. Although the CSP is NP-hard in general, considerable effort has been spent on identifying tractable subclasses. The main two approaches consider structural properties (restrictions on the hypergraph of constraint scopes) and relational properties (restrictions on the language of constraint relations). Recently, some authors have considered hybrid properties that restrict the constraint hypergraph and the relations simultaneously. Our key contribution is the novel concept of a CSP pattern and classes of problems defined by forbidden patterns (which can be viewed as forbidding generic subproblems). We describe the theoretical framework which can be used to reason about classes of problems defined by forbidden patterns. We show that this framework generalises relational properties and allows us to capture known hybrid tractable classes. Although we are not close to obtaining a dichotomy concerning the tractability of general forbidden patterns, we are able to make some progress in a special case: classes of problems that arise when we can only forbid binary negative patterns (generic subproblems in which only inconsistent tuples are specified). In this case we are able to characterise very large classes of tractable and NP-hard forbidden patterns. This leaves the complexity of just one case unresolved and we conjecture that this last case is tractable. Human intuition has been simulated by several research projects using artificial intelligence techniques. Most of these algorithms or models lack the ability to handle complications or diversions. Moreover, they also do not explain the factors influencing intuition and the accuracy of the results from this process. In this paper, we present a simple series based model for implementation of human-like intuition using the principles of connectivity and unknown entities. By using Poker hand datasets and Car evaluation datasets, we compare the performance of some well-known models with our intuition model. The aim of the experiment was to predict the maximum accurate answers using intuition based models. We found that the presence of unknown entities, diversion from the current problem scenario, and identifying weakness without the normal logic based execution, greatly affects the reliability of the answers. Generally, the intuition based models cannot be a substitute for the logic based mechanisms in handling such problems. The intuition can only act as a support for an ongoing logic based model that processes all the steps in a sequential manner. However, when time and computational cost are very strict constraints, this intuition based model becomes extremely important and useful, because it can give a reasonably good performance. Factors affecting intuition are analyzed and interpreted through our model. As computational agents are developed for increasingly complicated e-commerce applications, the complexity of the decisions they face demands advances in artificial intelligence techniques. For example, an agent representing a seller in an auction should try to maximize the seller's profit by reasoning about a variety of possibly uncertain pieces of information, such as the maximum prices various buyers might be willing to pay, the possible prices being offered by competing sellers, the rules by which the auction operates, the dynamic arrival and matching of offers to buy and sell, and so on. A naive application of multiagent reasoning techniques would require the seller's agent to explicitly model all of the other agents through an extended time horizon, rendering the problem intractable for many realistically-sized problems. We have instead devised a new strategy that an agent can use to determine its bid price based on a more tractable Markov chain model of the auction process. We have experimentally identified the conditions under which our new strategy works well, as well as how well it works in comparison to the optimal performance the agent could have achieved had it known the future. Our results show that our new strategy in general performs well, outperforming other tractable heuristic strategies in a majority of experiments, and is particularly effective in a 'seller?s market', where many buy offers are available. Belief merging is an important but difficult problem in Artificial Intelligence, especially when sources of information are pervaded with uncertainty. Many merging operators have been proposed to deal with this problem in possibilistic logic, a weighted logic which is powerful for handling inconsistency and deal- ing with uncertainty. They often result in a possibilistic knowledge base which is a set of weighted formulas. Although possibilistic logic is inconsistency tolerant, it suers from the well-known "drowning effect". Therefore, we may still want to obtain a consistent possi- bilistic knowledge base as the result of merg- ing. In such a case, we argue that it is not always necessary to keep weighted informa- tion after merging. In this paper, we define a merging operator that maps a set of pos- sibilistic knowledge bases and a formula rep- resenting the integrity constraints to a clas- sical knowledge base by using lexicographic ordering. We show that it satisfies nine pos- tulates that generalize basic postulates for propositional merging given in [11]. These postulates capture the principle of minimal change in some sense. We then provide an algorithm for generating the resulting knowl- edge base of our merging operator. Finally, we discuss the compatibility of our merging operator with propositional merging and es- tablish the advantage of our merging opera- tor over existing semantic merging operators in the propositional case. Manipulation, bribery, and control are well-studied ways of changing the outcome of an election. Many voting rules are, in the general case, computationally resistant to some of these manipulative actions. However when restricted to single-peaked electorates, these rules suddenly become easy to manipulate. Recently, Faliszewski, Hemaspaandra, and Hemaspaandra studied the computational complexity of strategic behavior in nearly single-peaked electorates. These are electorates that are not single-peaked but close to it according to some distance measure. In this paper we introduce several new distance measures regarding single-peakedness. We prove that determining whether a given profile is nearly single-peaked is NP-complete in many cases. For one case we present a polynomial-time algorithm. In case the single-peaked axis is given, we show that determining the distance is always possible in polynomial time. Furthermore, we explore the relations between the new notions introduced in this paper and existing notions from the literature. The problems associated with scaling involve active and challenging research topics in the area of artificial intelligence. The purpose is to solve real world problems by means of AI technologies, in cases where the complexity of representation of the real world problem is potentially combinatorial. In this paper, we present a novel approach to cope with the scaling issues in Bayesian belief networks for ship classification. The proposed approach divides the conceptual model of a complex ship classification problem into a set of small modules that work together to solve the classification problem while preserving the functionality of the original model. The possible ways of explaining sensor returns (e.g., the evidence) for some features, such as portholes along the length of a ship, are sometimes combinatorial. Thus, using an exhaustive approach, which entails the enumeration of all possible explanations, is impractical for larger problems. We present a network structure (referred to as Sequential Decomposition, SD) in which each observation is associated with a set of legitimate outcomes which are consistent with the explanation of each observed piece of evidence. The results show that the SD approach allows one to represent feature-observation relations in a manageable way and achieve the same explanatory power as an exhaustive approach. Shafer's theory of belief and the Bayesian theory of probability are two alternative and mutually inconsistent approaches toward modelling uncertainty in artificial intelligence. To help reduce the conflict between these two approaches, this paper reexamines expected utility theory-from which Bayesian probability theory is derived. Expected utility theory requires the decision maker to assign a utility to each decision conditioned on every possible event that might occur. But frequently the decision maker cannot foresee all the events that might occur, i.e., one of the possible events is the occurrence of an unforeseen event. So once we acknowledge the existence of unforeseen events, we need to develop some way of assigning utilities to decisions conditioned on unforeseen events. The commonsensical solution to this problem is to assign similar utilities to events which are similar. Implementing this commonsensical solution is equivalent to replacing Bayesian subjective probabilities over the space of foreseen and unforeseen events by random set theory probabilities over the space of foreseen events. This leads to an expected utility principle in which normalized variants of Shafer's commonalities play the role of subjective probabilities. Hence allowing for unforeseen events in decision analysis causes Bayesian probability theory to become much more similar to Shaferian theory. The modelling, analysis, and visualisation of dynamic geospatial phenomena has been identified as a key developmental challenge for next-generation Geographic Information Systems (GIS). In this context, the envisaged paradigmatic extensions to contemporary foundational GIS technology raises fundamental questions concerning the ontological, formal representational, and (analytical) computational methods that would underlie their spatial information theoretic underpinnings. We present the conceptual overview and architecture for the development of high-level semantic and qualitative analytical capabilities for dynamic geospatial domains. Building on formal methods in the areas of commonsense reasoning, qualitative reasoning, spatial and temporal representation and reasoning, reasoning about actions and change, and computational models of narrative, we identify concrete theoretical and practical challenges that accrue in the context of formal reasoning about `space, events, actions, and change'. With this as a basis, and within the backdrop of an illustrated scenario involving the spatio-temporal dynamics of urban narratives, we address specific problems and solutions techniques chiefly involving `qualitative abstraction', `data integration and spatial consistency', and `practical geospatial abduction'. From a broad topical viewpoint, we propose that next-generation dynamic GIS technology demands a transdisciplinary scientific perspective that brings together Geography, Artificial Intelligence, and Cognitive Science. Keywords: artificial intelligence; cognitive systems; human-computer interaction; geographic information systems; spatio-temporal dynamics; computational models of narrative; geospatial analysis; geospatial modelling; ontology; qualitative spatial modelling and reasoning; spatial assistance systems We introduce GOTCHAs (Generating panOptic Turing Tests to Tell Computers and Humans Apart) as a way of preventing automated offline dictionary attacks against user selected passwords. A GOTCHA is a randomized puzzle generation protocol, which involves interaction between a computer and a human. Informally, a GOTCHA should satisfy two key properties: (1) The puzzles are easy for the human to solve. (2) The puzzles are hard for a computer to solve even if it has the random bits used by the computer to generate the final puzzle --- unlike a CAPTCHA. Our main theorem demonstrates that GOTCHAs can be used to mitigate the threat of offline dictionary attacks against passwords by ensuring that a password cracker must receive constant feedback from a human being while mounting an attack. Finally, we provide a candidate construction of GOTCHAs based on Inkblot images. Our construction relies on the usability assumption that users can recognize the phrases that they originally used to describe each Inkblot image --- a much weaker usability assumption than previous password systems based on Inkblots which required users to recall their phrase exactly. We conduct a user study to evaluate the usability of our GOTCHA construction. We also generate a GOTCHA challenge where we encourage artificial intelligence and security researchers to try to crack several passwords protected with our scheme. Prime implicates and prime implicants have proven relevant to a number of areas of artificial intelligence, most notably abductive reasoning and knowledge compilation. The purpose of this paper is to examine how these notions might be appropriately extended from propositional logic to the modal logic K. We begin the paper by considering a number of potential definitions of clauses and terms for K. The different definitions are evaluated with respect to a set of syntactic, semantic, and complexity-theoretic properties characteristic of the propositional definition. We then compare the definitions with respect to the properties of the notions of prime implicates and prime implicants that they induce. While there is no definition that perfectly generalizes the propositional notions, we show that there does exist one definition which satisfies many of the desirable properties of the propositional case. In the second half of the paper, we consider the computational properties of the selected definition. To this end, we provide sound and complete algorithms for generating and recognizing prime implicates, and we show the prime implicate recognition task to be PSPACE-complete. We also prove upper and lower bounds on the size and number of prime implicates. While the paper focuses on the logic K, all of our results hold equally well for multi-modal K and for concept expressions in the description logic ALC. Real-time heuristic search algorithms satisfy a constant bound on the amount of planning per action, independent of problem size. As a result, they scale up well as problems become larger. This property would make them well suited for video games where Artificial Intelligence controlled agents must react quickly to user commands and to other agents actions. On the downside, real-time search algorithms employ learning methods that frequently lead to poor solution quality and cause the agent to appear irrational by re-visiting the same problem states repeatedly. The situation changed recently with a new algorithm, D LRTA*, which attempted to eliminate learning by automatically selecting subgoals. D LRTA* is well poised for video games, except it has a complex and memory-demanding pre-computation phase during which it builds a database of subgoals. In this paper, we propose a simpler and more memory-efficient way of pre-computing subgoals thereby eliminating the main obstacle to applying state-of-the-art real-time search methods in video games. The new algorithm solves a number of randomly chosen problems off-line, compresses the solutions into a series of subgoals and stores them in a database. When presented with a novel problem on-line, it queries the database for the most similar previously solved case and uses its subgoals to solve the problem. In the domain of pathfinding on four large video game maps, the new algorithm delivers solutions eight times better while using 57 times less memory and requiring 14% less pre-computation time. Domain-independent planning is one of the foundational areas in the field of Artificial Intelligence. A description of a planning task consists of an initial world state, a goal, and a set of actions for modifying the world state. The objective is to find a sequence of actions, that is, a plan, that transforms the initial world state into a goal state. In optimal planning, we are interested in finding not just a plan, but one of the cheapest plans. A prominent approach to optimal planning these days is heuristic state-space search, guided by admissible heuristic functions. Numerous admissible heuristics have been developed, each with its own strengths and weaknesses, and it is well known that there is no single "best heuristic for optimal planning in general. Thus, which heuristic to choose for a given planning task is a difficult question. This difficulty can be avoided by combining several heuristics, but that requires computing numerous heuristic estimates at each state, and the tradeoff between the time spent doing so and the time saved by the combined advantages of the different heuristics might be high. We present a novel method that reduces the cost of combining admissible heuristics for optimal planning, while maintaining its benefits. Using an idealized search space model, we formulate a decision rule for choosing the best heuristic to compute at each state. We then present an active online learning approach for learning a classifier with that decision rule as the target concept, and employ the learned classifier to decide which heuristic to compute at each state. We evaluate this technique empirically, and show that it substantially outperforms the standard method for combining several heuristics via their pointwise maximum. This book-length article combines several peer reviewed papers and new material to analyze the issues of ethical artificial intelligence (AI). The behavior of future AI systems can be described by mathematical equations, which are adapted to analyze possible unintended AI behaviors and ways that AI designs can avoid them. This article makes the case for utility-maximizing agents and for avoiding infinite sets in agent definitions. It shows how to avoid agent self-delusion using model-based utility functions and how to avoid agents that corrupt their reward generators (sometimes called "perverse instantiation") using utility functions that evaluate outcomes at one point in time from the perspective of humans at a different point in time. It argues that agents can avoid unintended instrumental actions (sometimes called "basic AI drives" or "instrumental goals") by accurately learning human values. This article defines a self-modeling agent framework and shows how it can avoid problems of resource limits, being predicted by other agents, and inconsistency between the agent's utility function and its definition (one version of this problem is sometimes called "motivated value selection"). This article also discusses how future AI will differ from current AI, the politics of AI, and the ultimate use of AI to help understand the nature of the universe and our place in it. In this position paper, I argue that standardized tests for elementary science such as SAT or Regents tests are not very good benchmarks for measuring the progress of artificial intelligence systems in understanding basic science. The primary problem is that these tests are designed to test aspects of knowledge and ability that are challenging for people; the aspects that are challenging for AI systems are very different. In particular, standardized tests do not test knowledge that is obvious for people; none of this knowledge can be assumed in AI systems. Individual standardized tests also have specific features that are not necessarily appropriate for an AI benchmark. I analyze the Physics subject SAT in some detail and the New York State Regents Science test more briefly. I also argue that the apparent advantages offered by using standardized tests are mostly either minor or illusory. The one major real advantage is that the significance is easily explained to the public; but I argue that even this is a somewhat mixed blessing. I conclude by arguing that, first, more appropriate collections of exam style problems could be assembled, and second, that there are better kinds of benchmarks than exam-style problems. In an appendix I present a collection of sample exam-style problems that test kinds of knowledge missing from the standardized tests. Undergraduate students of artificial intelligence often struggle with representing knowledge as logical sentences. This is a skill that seems to require extensive practice to obtain, suggesting a teaching strategy that involves the assignment of numerous exercises involving the formulation of some bit of knowledge, communicated using a natural language such as English, as a sentence in some logic. The number of such exercises needed to master this skill is far too large to allow typical artificial intelligence course teaching teams to provide prompt feedback on student efforts. Thus, an automated assessment system for such exercises is needed to ensure that students receive an adequate amount of practice, with the rapid delivery of feedback allowing students to identify errors in their understanding and correct them. This paper describes an automated grading system for knowledge representation exercises using first-order logic. A resolution theorem prover, \textit{Prover9}, is used to check if a student-submitted formula is logically equivalent to a solution provided by the instructor. This system has been used by students enrolled in undergraduate artificial intelligence classes for several years. Use of this teaching tool resulted in a statistically significant improvement on first-order logic knowledge representation questions appearing on the course final examination. This article explains how this system works, provides an analysis of changes in student learning outcomes, and explores potential enhancements of this system, including the possibility of providing rich formative feedback by replacing the resolution theorem prover with a tableaux-based method. This article addresses an open problem in the area of cognitive systems and architectures: namely the problem of handling (in terms of processing and reasoning capabilities) complex knowledge structures that can be at least plausibly comparable, both in terms of size and of typology of the encoded information, to the knowledge that humans process daily for executing everyday activities. Handling a huge amount of knowledge, and selectively retrieve it ac- cording to the needs emerging in different situational scenarios, is an important aspect of human intelligence. For this task, in fact, humans adopt a wide range of heuristics (Gigerenzer and Todd) due to their bounded rationality (Simon, 1957). In this perspective, one of the re- quirements that should be considered for the design, the realization and the evaluation of intelligent cognitively inspired systems should be represented by their ability of heuristically identify and retrieve, from the general knowledge stored in their artificial Long Term Memory (LTM), that one which is synthetically and contextually relevant. This require- ment, however, is often neglected. Currently, artificial cognitive systems and architectures are not able, de facto, to deal with complex knowledge structures that can be even slightly comparable to the knowledge heuris- tically managed by humans. In this paper I will argue that this is not only a technological problem but also an epistemological one and I will briefly sketch a proposal for a possible solution. In recent years, researchers in decision analysis and artificial intelligence (Al) have used Bayesian belief networks to build models of expert opinion. Using standard methods drawn from the theory of computational complexity, workers in the field have shown that the problem of probabilistic inference in belief networks is difficult and almost certainly intractable. K N ET, a software environment for constructing knowledge-based systems within the axiomatic framework of decision theory, contains a randomized approximation scheme for probabilistic inference. The algorithm can, in many circumstances, perform efficient approximate inference in large and richly interconnected models of medical diagnosis. Unlike previously described stochastic algorithms for probabilistic inference, the randomized approximation scheme computes a priori bounds on running time by analyzing the structure and contents of the belief network. In this article, we describe a randomized algorithm for probabilistic inference and analyze its performance mathematically. Then, we devote the major portion of the paper to a discussion of the algorithm's empirical behavior. The results indicate that the generation of good trials (that is, trials whose distribution closely matches the true distribution), rather than the computation of numerous mediocre trials, dominates the performance of stochastic simulation. Key words: probabilistic inference, belief networks, stochastic simulation, computational complexity theory, randomized algorithms. Many Artificial Intelligence systems depend on the agent's updating its beliefs about the world on the basis of experience. Experiments constitute one type of experience, so scientific methodology offers a natural environment for examining the issues attendant to using this class of evidence. This paper presents a framework which structures the process of using scientific data from research reports for the purpose of making decisions, using decision analysis as the basis for the structure and using medical research as the general scientific domain. The structure extends the basic influence diagram for updating belief in an object domain parameter of interest by expanding the parameter into four parts: those of the patient, the population, the study sample, and the effective study sample. The structure uses biases to perform the transformation of one parameter into another, so that, for instance, selection biases, in concert with the population parameter, yield the study sample parameter. The influence diagram structure provides decision theoretic justification for practices of good clinical research such as randomized assignment and blindfolding of care providers. The model covers most research designs used in medicine: case-control studies, cohort studies, and controlled clinical trials, and provides an architecture to separate clearly between statistical knowledge and domain knowledge. The proposed general model can be the basis for clinical epidemiological advisory systems, when coupled with heuristic pruning of irrelevant biases; of statistical workstations, when the computational machinery for calculation of posterior distributions is added; and of meta-analytic reviews, when multiple studies may impact on a single population parameter. This document contains supplementary material for the paper "Multi-objective Reinforcement Learning with Continuous Pareto Frontier Approximation", published at the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15). The paper is about learning a continuous approximation of the Pareto frontier in Multi-Objective Markov Decision Problems (MOMDPs). We propose a policy-based approach that exploits gradient information to generate solutions close to the Pareto ones. Differently from previous policy-gradient multi-objective algorithms, where n optimization routines are use to have n solutions, our approach performs a single gradient-ascent run that at each step generates an improved continuous approximation of the Pareto frontier. The idea is to exploit a gradient-based approach to optimize the parameters of a function that defines a manifold in the policy parameter space so that the corresponding image in the objective space gets as close as possible to the Pareto frontier. Besides deriving how to compute and estimate such gradient, we will also discuss the non-trivial issue of defining a metric to assess the quality of the candidate Pareto frontiers. Finally, the properties of the proposed approach are empirically evaluated on two interesting MOMDPs. Motivated by recent progress on pricing in the AI literature, we study marketplaces that contain multiple vendors offering identical or similar products and unit-demand buyers with different valuations on these vendors. The objective of each vendor is to set the price of its product to a fixed value so that its profit is maximized. The profit depends on the vendor's price itself and the total volume of buyers that find the particular price more attractive than the price of the vendor's competitors. We model the behaviour of buyers and vendors as a two-stage full-information game and study a series of questions related to the existence, efficiency (price of anarchy) and computational complexity of equilibria in this game. To overcome situations where equilibria do not exist or exist but are highly inefficient, we consider the scenario where some of the vendors are subsidized in order to keep prices low and buyers highly satisfied. Learning models of artificial intelligence can nowadays perform very well on a large variety of tasks. However, in practice different task environments are best handled by different learning models, rather than a single, universal, approach. Most non-trivial models thus require the adjustment of several to many learning parameters, which is often done on a case-by-case basis by an external party. Meta-learning refers to the ability of an agent to autonomously and dynamically adjust its own learning parameters, or meta-parameters. In this work we show how projective simulation, a recently developed model of artificial intelligence, can naturally be extended to account for meta-learning in reinforcement learning settings. The projective simulation approach is based on a random walk process over a network of clips. The suggested meta-learning scheme builds upon the same design and employs clip networks to monitor the agent's performance and to adjust its meta-parameters "on the fly". We distinguish between "reflexive adaptation" and "adaptation through learning", and show the utility of both approaches. In addition, a trade-off between flexibility and learning-time is addressed. The extended model is examined on three different kinds of reinforcement learning tasks, in which the agent has different optimal values of the meta-parameters, and is shown to perform well, reaching near-optimal to optimal success rates in all of them, without ever needing to manually adjust any meta-parameter. Linguistic relations in oral conversations present how opinions are constructed and developed in a restricted time. The relations bond ideas, arguments, thoughts, and feelings, re-shape them during a speech, and finally build knowledge out of all information provided in the conversation. Speakers share a common interest to discuss. It is expected that each speaker's reply includes duplicated forms of words from previous speakers. However, linguistic adaptation is observed and evolves in a more complex path than just transferring slightly modified versions of common concepts. A conversation aiming a benefit at the end shows an emergent cooperation inducing the adaptation. Not only cooperation, but also competition drives the adaptation or an opposite scenario and one can capture the dynamic process by tracking how the concepts are linguistically linked. To uncover salient complex dynamic events in verbal communications, we attempt to discover self-organized linguistic relations hidden in a conversation with explicitly stated winners and losers. We examine open access data of the United States Supreme Court. Our understanding is crucial in big data research to guide how transition states in opinion mining and decision-making should be modeled and how this required knowledge to guide the model should be pinpointed, by filtering large amount of data. Given recent proposals to synthesize consciousness, how many forms of conscious machines can one distinguish and on what grounds? Based on current clinical scales of consciousness, that measure cognitive awareness and wakefulness, we take a perspective on how contemporary artificially intelligent machines and synthetically engineered life forms would measure on these scales. To do so, we argue that awareness and wakefulness can be associated to computational and autonomous complexity respectively. Then, building on insights from cognitive robotics, we ask what function consciousness serves, and interpret it as an evolutionary game-theoretic strategy. We make the case for a third type of complexity necessary for describing consciousness, namely, social complexity. Having identified these complexity types, allows us to represent both, biological and synthetic systems in a common morphospace. This suggests an embodiment-based taxonomy of consciousness. In particular, we distinguish four forms of consciousness, based on embodiment: biological, synthetic, group (resulting from group interactions) and simulated consciousness (embodied by virtual agents within a simulated reality). Such a taxonomy is useful for studying comparative signatures of consciousness across domains, in order to highlight design principles necessary to engineer conscious machines. This is particularly relevant in the light of recent developments at the crossroads of neuroscience, biomedical engineering, artificial intelligence and biomimetics. Molecular variants of vitamin B12, siderophores and glycans occur. To take up variant forms, bacteria may express an array of receptors. The gut microbe Bacteroides thetaiotaomicron has three different receptors to take up variants of vitamin B12 and 88 receptors to take up various glycans. The design of receptor arrays reflects key processes that shape cellular evolution. Competition may focus each species on a subset of the available nutrient diversity. Some gut bacteria can take up only a narrow range of carbohydrates, whereas species such as B.~thetaiotaomicron can digest many different complex glycans. Comparison of different nutrients, habitats, and genomes provide opportunity to test hypotheses about the breadth of receptor arrays. Another important process concerns fluctuations in nutrient availability. Such fluctuations enhance the value of cellular sensors, which gain information about environmental availability and adjust receptor deployment. Bacteria often adjust receptor expression in response to fluctuations of particular carbohydrate food sources. Some species may adjust expression of uptake receptors for specific siderophores. How do cells use sensor information to control the response to fluctuations? That question about regulatory wiring relates to problems that arise in control theory and artificial intelligence. Control theory clarifies how to analyze environmental fluctuations in relation to the design of sensors and response systems. Recent advances in deep learning studies of artificial intelligence focus on the architecture of regulatory wiring and the ways in which complex control networks represent and classify environmental states. I emphasize the similar design problems that arise in cellular evolution, control theory, and artificial intelligence. I connect those broad concepts to testable hypotheses for bacterial uptake of B12, siderophores and glycans. Syntactic parsing, the process of obtaining the internal structure of sentences in natural languages, is a crucial task for artificial intelligence applications that need to extract meaning from natural language text or speech. Sentiment analysis is one example of application for which parsing has recently proven useful. In recent years, there have been significant advances in the accuracy of parsing algorithms. In this article, we perform an empirical, task-oriented evaluation to determine how parsing accuracy influences the performance of a state-of-the-art rule-based sentiment analysis system that determines the polarity of sentences from their parse trees. In particular, we evaluate the system using four well-known dependency parsers, including both current models with state-of-the-art accuracy and more innacurate models which, however, require less computational resources. The experiments show that all of the parsers produce similarly good results in the sentiment analysis task, without their accuracy having any relevant influence on the results. Since parsing is currently a task with a relatively high computational cost that varies strongly between algorithms, this suggests that sentiment analysis researchers and users should prioritize speed over accuracy when choosing a parser; and parsing researchers should investigate models that improve speed further, even at some cost to accuracy. The study and understanding of human behaviour is relevant to computer science, artificial intelligence, neural computation, cognitive science, philosophy, psychology, and several other areas. Presupposing cognition as basis of behaviour, among the most prominent tools in the modelling of behaviour are computational-logic systems, connectionist models of cognition, and models of uncertainty. Recent studies in cognitive science, artificial intelligence, and psychology have produced a number of cognitive models of reasoning, learning, and language that are underpinned by computation. In addition, efforts in computer science research have led to the development of cognitive computational systems integrating machine learning and automated reasoning. Such systems have shown promise in a range of applications, including computational biology, fault diagnosis, training and assessment in simulators, and software verification. This joint survey reviews the personal ideas and views of several researchers on neural-symbolic learning and reasoning. The article is organised in three parts: Firstly, we frame the scope and goals of neural-symbolic computation and have a look at the theoretical foundations. We then proceed to describe the realisations of neural-symbolic computation, systems, and applications. Finally we present the challenges facing the area and avenues for further research. There have been numerous breakthroughs with reinforcement learning in the recent years, perhaps most notably on Deep Reinforcement Learning successfully playing and winning relatively advanced computer games. There is undoubtedly an anticipation that Deep Reinforcement Learning will play a major role when the first AI masters the complicated game plays needed to beat a professional Real-Time Strategy game player. For this to be possible, there needs to be a game environment that targets and fosters AI research, and specifically Deep Reinforcement Learning. Some game environments already exist, however, these are either overly simplistic such as Atari 2600 or complex such as Starcraft II from Blizzard Entertainment. We propose a game environment in between Atari 2600 and Starcraft II, particularly targeting Deep Reinforcement Learning algorithm research. The environment is a variant of Tower Line Wars from Warcraft III, Blizzard Entertainment. Further, as a proof of concept that the environment can harbor Deep Reinforcement algorithms, we propose and apply a Deep Q-Reinforcement architecture. The architecture simplifies the state space so that it is applicable to Q-learning, and in turn improves performance compared to current state-of-the-art methods. Our experiments show that the proposed architecture can learn to play the environment well, and score 33% better than standard Deep Q-learning which in turn proves the usefulness of the game environment. The hard problem in artificial intelligence asks how the shuffling of syntactical symbols in a program can lead to systems which experience semantics and qualia. We address this question in three stages. First, we introduce a new class of human semantic symbols which appears when unexpected and drastic environmental change causes humans to become surprised, confused, uncertain, and in extreme cases, unresponsive, passive and dysfunctional. For this class of symbols, pre-learned programs become inoperative so these syntactical programs cannot be the source of experienced qualia. Second, we model the dysfunctional human response to a radically changed environment as being the natural response of any learning machine facing novel inputs from well outside its previous training set. In this situation, learning machines are unable to extract information from their input and will typically enter a dynamical state characterized by null outputs and a lack of response. This state immediately predicts and explains the characteristics of the semantic experiences of humans in similar circumstances. In the third stage, we consider learning machines trained to implement multiple functions in simple sequential programs using environmental data to specify subroutine names, control flow instructions, memory calls, and so on. Drastic change in any of these environmental inputs can again lead to inoperative programs. By examining changes specific to people or locations we can model human cognitive symbols featuring these dependencies, such as attachment and grief. Our approach links known dynamical machines states with human qualia and thus offers new insight into the hard problem of artificial intelligence. All artificial Intelligence (AI) systems make errors. These errors are unexpected, and differ often from the typical human mistakes ("non-human" errors). The AI errors should be corrected without damage of existing skills and, hopefully, avoiding direct human expertise. This paper presents an initial summary report of project taking new and systematic approach to improving the intellectual effectiveness of the individual AI by communities of AIs. We combine some ideas of learning in heterogeneous multiagent systems with new and original mathematical approaches for non-iterative corrections of errors of legacy AI systems. The mathematical foundations of AI non-destructive correction are presented and a series of new stochastic separation theorems is proven. These theorems provide a new instrument for the development, analysis, and assessment of machine learning methods and algorithms in high dimension. They demonstrate that in high dimensions and even for exponentially large samples, linear classifiers in their classical Fisher's form are powerful enough to separate errors from correct responses with high probability and to provide efficient solution to the non-destructive corrector problem. In particular, we prove some hypotheses formulated in our paper `Stochastic Separation Theorems' (Neural Networks, 94, 255--259, 2017), and answer one general problem published by Donoho and Tanner in 2009. Despite significant advances in artificial intelligence (AI) for computer vision, its application in medical imaging has been limited by the burden and limits of expert-generated labels. We used images from optical coherence tomography angiography (OCTA), a relatively new imaging modality that measures perfusion of the retinal vasculature, to train an AI algorithm to generate vasculature maps from standard structural optical coherence tomography (OCT) images of the same retinae, both exceeding the ability and bypassing the need for expert labeling. Deep learning was able to infer perfusion of microvasculature from structural OCT images with similar fidelity to OCTA and significantly better than expert clinicians (P < 0.00001). OCTA suffers from need of specialized hardware, laborious acquisition protocols, and motion artifacts; whereas our model works directly from standard OCT which are ubiquitous and quick to obtain, and allows unlocking of large volumes of previously collected standard OCT data both in existing clinical trials and clinical practice. This finding demonstrates a novel application of AI to medical imaging, whereby subtle regularities between different modalities are used to image the same body part and AI is used to generate detailed and accurate inferences of tissue function from structure imaging. This article presents an extensive literature review of technology based intervention methodologies for individuals facing Autism Spectrum Disorder (ASD). Reviewed methodologies include: contemporary Computer Aided Systems (CAS), Computer Vision Assisted Technologies (CVAT) and Virtual Reality (VR) or Artificial Intelligence-Assisted interventions. The research over the past decade has provided enough demonstrations that individuals of ASD have a strong interest in technology based interventions and can connect with them for longer durations without facing any trouble(s). Theses technology based interventions are useful for individuals facing autism in clinical settings as well as at home and classrooms. Despite showing great promise, research in developing an advanced technology based intervention that is clinically quantitative for ASD is minimal. Moreover, the clinicians are generally not convinced about the potential of the technology based interventions due to non-empirical nature of published results. A major reason behind this non-acceptability is a vast majority of studies on distinct intervention methodologies do not follow any specific standard or research design. Consequently, the data produced by these studies is minimally appealing to the clinical community. This research domain has a vast social impact as per official statistics given by the Autism Society of America, autism is the fastest growing developmental disability in the United States (US). The estimated annual cost in the US for diagnosis and treatment for ASD is 236-262 Billion US Dollars. The cost of up-bringing an ASD individual is estimated to be 1.4 million USD while statistics show 1% of the worlds' total population is suffering from ASD. Self-organizing complex systems typically are comprised of a large number of frequently similar components or events. Through their process, a pattern at the global-level of a system emerges solely from numerous interactions among the lower-level components of the system. Moreover, the rules specifying interactions among the system's components are executed using only local information, without reference to the global pattern, which, as in many real-world problems is not easily accessible or possible to be found. Stigmergy, a kind of indirect communication and learning by the environment found in social insects is a well know example of self-organization, providing not only vital clues in order to understand how the components can interact to produce a complex pattern, as can pinpoint simple biological non-linear rules and methods to achieve improved artificial intelligent adaptive categorization systems, critical for Data-Mining. On the present work it is our intention to show that a new type of Data-Mining can be designed based on Stigmergic paradigms, taking profit of several natural features of this phenomenon. By hybridizing bio-inspired Swarm Intelligence with Evolutionary Computation we seek for an entire distributed, adaptive, collective and cooperative self-organized Data-Mining. As a real-world, real-time test bed for our proposal, World-Wide-Web Mining will be used. Having that purpose in mind, Web usage Data was collected from the Monash University's Web site (Australia), with over 7 million hits every week. Results are compared to other recent systems, showing that the system presented is by far promising. The explosive growth of spatial data and extensive utilization of spatial databases emphasize the necessity for the automated discovery of spatial knowledge. In modern times, spatial data mining has emerged as an area of voluminous research. Forest fires are a chief environmental concern, causing economical and ecological damage while endangering human lives across the world. The fast or early detection of forest fires is a vital element for controlling such phenomenon. The application of remote sensing is at present a significant method for forest fires monitoring, particularly in vast and remote areas. Different methods have been presented by researchers for forest fire detection. The motivation behind this research is to obtain beneficial information from images in the forest spatial data and use the same in the determination of regions at the risk of fires by utilizing Image Processing and Artificial Intelligence techniques. This paper presents an intelligent system to detect the presence of forest fires in the forest spatial data using Artificial Neural Networks. The digital images in the forest spatial data are converted from RGB to XYZ color space and then segmented by employing anisotropic diffusion to identify the fire regions. Subsequently, Radial Basis Function Neural Network is employed in the design of the intelligent system, which is trained with the color space values of the segmented fire regions. Extensive experimental assessments on publicly available spatial data illustrated the efficiency of the proposed system in effectively detecting forest fires. The main purpose of this article is to describe potential benefits and applications of the SP theory, a unique attempt to simplify and integrate ideas across artificial intelligence, mainstream computing and human cognition, with information compression as a unifying theme. The theory, including a concept of multiple alignment, combines conceptual simplicity with descriptive and explanatory power in several areas including representation of knowledge, natural language processing, pattern recognition, several kinds of reasoning, the storage and retrieval of information, planning and problem solving, unsupervised learning, information compression, and human perception and cognition. In the SP machine -- an expression of the SP theory which is currently realised in the form of computer models -- there is potential for an overall simplification of computing systems, including software. As a theory with a broad base of support, the SP theory promises useful insights in many areas and the integration of structures and functions, both within a given area and amongst different areas. There are potential benefits in natural language processing (with potential for the understanding and translation of natural languages), the need for a versatile intelligence in autonomous robots, computer vision, intelligent databases, maintaining multiple versions of documents or web pages, software engineering, criminal investigations, the management of big data and gaining benefits from it, the semantic web, medical diagnosis, the detection of computer viruses, the economical transmission of data, and data fusion. Further development of these ideas would be facilitated by the creation of a high-parallel, web-based, open-source version of the SP machine, with a good user interface. This would provide a means for researchers to explore what can be done with the system and to refine it. In current perception systems applied to the rebuilding of the environment for intelligent vehicles, the part reserved to object association for the tracking is increasingly significant. This allows firstly to follow the objects temporal evolution and secondly to increase the reliability of environment perception. We propose in this communication the development of a multi-objects association algorithm with ambiguity removal entering into the design of such a dynamic perception system for intelligent vehicles. This algorithm uses the belief theory and data modelling with fuzzy mathematics in order to be able to handle inaccurate as well as uncertain information due to imperfect sensors. These theories also allow the fusion of numerical as well as symbolic data. We develop in this article the problem of matching between known and perceived objects. This makes it possible to update a dynamic environment map for a vehicle. The belief theory will enable us to quantify the belief in the association of each perceived object with each known object. Conflicts can appear in the case of object appearance or disappearance, or in the case of a confused situation or bad perception. These conflicts are removed or solved using an assignment algorithm, giving a solution called the " best " and so ensuring the tracking of some objects present in our environment. During the last two decades there has been a growing interest in Particle Filtering (PF). However, PF suffers from two long-standing problems that are referred to as sample degeneracy and impoverishment. We are investigating methods that are particularly efficient at Particle Distribution Optimization (PDO) to fight sample degeneracy and impoverishment, with an emphasis on intelligence choices. These methods benefit from such methods as Markov Chain Monte Carlo methods, Mean-shift algorithms, artificial intelligence algorithms (e.g., Particle Swarm Optimization, Genetic Algorithm and Ant Colony Optimization), machine learning approaches (e.g., clustering, splitting and merging) and their hybrids, forming a coherent standpoint to enhance the particle filter. The working mechanism, interrelationship, pros and cons of these approaches are provided. In addition, Approaches that are effective for dealing with high-dimensionality are reviewed. While improving the filter performance in terms of accuracy, robustness and convergence, it is noted that advanced techniques employed in PF often causes additional computational requirement that will in turn sacrifice improvement obtained in real life filtering. This fact, hidden in pure simulations, deserves the attention of the users and designers of new filters. The majority of big data is unstructured and of this majority the largest chunk is text. While data mining techniques are well developed and standardized for structured, numerical data, the realm of unstructured data is still largely unexplored. The general focus lies on information extraction, which attempts to retrieve known information from text. The Holy Grail, however is knowledge discovery, where machines are expected to unearth entirely new facts and relations that were not previously known by any human expert. Indeed, understanding the meaning of text is often considered as one of the main characteristics of human intelligence. The ultimate goal of semantic artificial intelligence is to devise software that can understand the meaning of free text, at least in the practical sense of providing new, actionable information condensed out of a body of documents. As a stepping stone on the road to this vision I will introduce a totally new approach to drug research, namely that of identifying relevant information by employing a self-organizing semantic engine to text mine large repositories of biomedical research papers, a technique pioneered by Merck with the InfoCodex software. I will describe the methodology and a first successful experiment for the discovery of new biomarkers and phenotypes for diabetes and obesity on the basis of PubMed abstracts, public clinical trials and Merck internal documents. The reported approach shows much promise and has potential to impact fundamentally pharmaceutical research as a way to shorten time-to-market of novel drugs, and for early recognition of dead ends. Remote science operations require automated systems that can both act and react with minimal human intervention. One such vision is that of an intelligent instrument that collects data in an automated fashion, and based on what it learns, decides which new measurements to take. This innovation implements experimental design and unites it with data analysis in such a way that it completes the cycle of learning. This cycle is the basis of the Scientific Method. The three basic steps of this cycle are hypothesis generation, inquiry, and inference. Hypothesis generation is implemented by artificially supplying the instrument with a parameterized set of possible hypotheses that might be used to describe the physical system. The act of inquiry is handled by an inquiry engine that relies on Bayesian adaptive exploration where the optimal experiment is chosen as the one which maximizes the expected information gain. The inference engine is implemented using the nested sampling algorithm, which provides the inquiry engine with a set of posterior samples from which the expected information gain can be estimated. With these computational structures in place, the instrument will refine its hypotheses, and repeat the learning cycle by taking measurements until the system under study is described within a pre-specified tolerance. We will demonstrate our first attempts toward achieving this goal with an intelligent instrument constructed using the LEGO MINDSTORMS NXT robotics platform. A typical modern optimization technique is usually either heuristic or metaheuristic. This technique has managed to solve some optimization problems in the research area of science, engineering, and industry. However, implementation strategy of metaheuristic for accuracy improvement on convolution neural networks (CNN), a famous deep learning method, is still rarely investigated. Deep learning relates to a type of machine learning technique, where its aim is to move closer to the goal of artificial intelligence of creating a machine that could successfully perform any intellectual tasks that can be carried out by a human. In this paper, we propose the implementation strategy of three popular metaheuristic approaches, that is, simulated annealing, differential evolution, and harmony search, to optimize CNN. The performances of these metaheuristic methods in optimizing CNN on classifying MNIST and CIFAR dataset were evaluated and compared. Furthermore, the proposed methods are also compared with the original CNN. Although the proposed methods show an increase in the computation time, their accuracy has also been improved (up to 7.14 percent). The next generation wireless networks (i.e. 5G and beyond), which would be extremely dynamic and complex due to the ultra-dense deployment of heterogeneous networks (HetNets), poses many critical challenges for network planning, operation, management and troubleshooting. At the same time, generation and consumption of wireless data are becoming increasingly distributed with ongoing paradigm shift from people-centric to machine-oriented communications, making the operation of future wireless networks even more complex. In mitigating the complexity of future network operation, new approaches of intelligently utilizing distributed computational resources with improved context-awareness becomes extremely important. In this regard, the emerging fog (edge) computing architecture aiming to distribute computing, storage, control, communication, and networking functions closer to end users, have a great potential for enabling efficient operation of future wireless networks. These promising architectures make the adoption of artificial intelligence (AI) principles which incorporate learning, reasoning and decision-making mechanism, as natural choices for designing a tightly integrated network. Towards this end, this article provides a comprehensive survey on the utilization of AI integrating machine learning, data analytics and natural language processing (NLP) techniques for enhancing the efficiency of wireless network operation. In particular, we provide comprehensive discussion on the utilization of these techniques for efficient data acquisition, knowledge discovery, network planning, operation and management of the next generation wireless networks. A brief case study utilizing the AI techniques for this network has also been provided. We present the UK Robotics and Artificial Intelligence Hub for Offshore Robotics for Certification of Assets (ORCA Hub), a 3.5 year EPSRC funded, multi-site project. The ORCA Hub vision is to use teams of robots and autonomous intelligent systems (AIS) to work on offshore energy platforms to enable cheaper, safer and more efficient working practices. The ORCA Hub will research, integrate, validate and deploy remote AIS solutions that can operate with existing and future offshore energy assets and sensors, interacting safely in autonomous or semi-autonomous modes in complex and cluttered environments, co-operating with remote operators. The goal is that through the use of such robotic systems offshore, the need for personnel will decrease. To enable this to happen, the remote operator will need a high level of situation awareness and key to this is the transparency of what the autonomous systems are doing and why. This increased transparency will facilitate a trusting relationship, which is particularly key in high-stakes, hazardous situations. We show that several constraint propagation algorithms (also called (local) consistency, consistency enforcing, Waltz, filtering or narrowing algorithms) are instances of algorithms that deal with chaotic iteration. To this end we propose a simple abstract framework that allows us to classify and compare these algorithms and to establish in a uniform way their basic properties. The assumptions needed to prove Cox's Theorem are discussed and examined. Various sets of assumptions under which a Cox-style theorem can be proved are provided, although all are rather strong and, arguably, not natural. The goal of this paper is to extend classical logic with a generalized notion of inductive definition supporting positive and negative induction, to investigate the properties of this logic, its relationships to other logics in the area of non-monotonic reasoning, logic programming and deductive databases, and to show its application for knowledge representation by giving a typology of definitional knowledge. In this paper, we outline the prototype of an automated inference tool, called QUIP, which provides a uniform implementation for several nonmonotonic reasoning formalisms. The theoretical basis of QUIP is derived from well-known results about the computational complexity of nonmonotonic logics and exploits a representation of the different reasoning tasks in terms of quantified boolean formulae. We construct a probabilistic coherence measure for information sets which determines a partial coherence ordering. This measure is applied in constructing a criterion for expanding our beliefs in the face of new information. A number of idealizations are being made which can be relaxed by an appeal to Bayesian Networks. The paper reports on first preliminary results and insights gained in a project aiming at implementing the fluent calculus using methods and techniques based on binary decision diagrams. After reporting on an initial experiment showing promising results we discuss our findings concerning various techniques and heuristics used to speed up the reasoning process. The current document contains a brief description of a system for Reasoning about Actions and Change called PAL (Pertinence Action Language) which makes use of several reasoning properties extracted from a Temporal Expert Systems tool called Medtool. In an earlier work, we have presented operations of belief change which only affect the relevant part of a belief base. In this paper, we propose the application of the same strategy to the problem of model-based diangosis. We first isolate the subset of the system description which is relevant for a given observation and then solve the diagnosis problem for this subset. XNMR is a system designed to explore the results of combining the well-founded semantics system XSB with the stable-models evaluator SMODELS. Its main goal is to work as a tool for fast and interactive exploration of knowledge bases. In this paper we present a rule based formalism for filtering variables domains of constraints. This formalism is well adapted for solving dynamic CSP. We take diagnosis as an instance problem to illustrate the use of these rules. A diagnosis problem is seen like finding all the minimal sets of constraints to be relaxed in the constraint network that models the device to be diagnosed We study the topological models of a logic of knowledge for topological reasoning, introduced by Larry Moss and Rohit Parikh. Among our results is a solution of a conjecture by the formentioned authors, finite satisfiability property and decidability for the theory of topological models. In this thesis we shall present two logical systems, MP and MP, for the purpose of reasoning about knowledge and effort. These logical systems will be interpreted in a spatial context and therefore, the abstract concepts of knowledge and effort will be defined by concrete mathematical concepts. In fuzzy propositional logic, to a proposition a partial truth in [0,1] is assigned. It is well known that under certain circumstances, fuzzy logic collapses to classical logic. In this paper, we will show that under dual conditions, fuzzy logic collapses to four-valued (relevance) logic, where propositions have truth-value true, false, unknown, or contradiction. As a consequence, fuzzy entailment may be considered as ``in between'' four-valued (relevance) entailment and classical entailment. The constraint of difference is known to the constraint programming community since Lauriere introduced Alice in 1978. Since then, several solving strategies have been designed for this constraint. In this paper we give both a practical overview and an abstract comparison of these different strategies. We make a connection between classical polytopes called zonotopes and Support Vector Machine (SVM) classifiers. We combine this connection with the ellipsoid method to give some new theoretical results on training SVMs. We also describe some special properties of soft margin C-SVMs as parameter C goes to infinity. The paper outlines ongoing research on logic-based tools for the analysis and representation of legal contracts of the kind frequently encountered in large-scale engineering projects and complex, long-term trading agreements. We consider both contract formation and contract performance, in each case identifying the representational issues and the prospects for providing automated support tools. The Expansion property considered by researchers in Social Choice is shown to correspond to a logical property of nonmonotonic consequence relations that is the {\em pure}, i.e., not involving connectives, version of a previously known weak rationality condition. The assumption that the union of two definable sets of models is definable is needed for the soundness part of the result. Information personalization is fertile ground for application of AI techniques. In this article I relate personalization to the ability to capture partial information in an information-seeking interaction. The specific focus is on personalizing interactions at web sites. Using ideas from partial evaluation and explanation-based generalization, I present a modeling methodology for reasoning about personalization. This approach helps identify seven tiers of `personable traits' in web sites. We address a general representation problem for belief change, and describe two interrelated representations for iterative non-prioritized change: a logical representation in terms of persistent epistemic states, and a constructive representation in terms of flocks of bases. An extension of an abstract argumentation framework, called collective argumentation, is introduced in which the attack relation is defined directly among sets of arguments. The extension turns out to be suitable, in particular, for representing semantics of disjunctive logic programs. Two special kinds of collective argumentation are considered in which the opponents can share their arguments. It has long been an open question whether the formula XCB = EpEEEpqErqr is, with the rules of substitution and detachment, a single axiom for the classical equivalential calculus. This paper answers that question affirmatively, thus completing a search for all such eleven-symbol single axioms that began seventy years ago. This paper describes a novel approach to unsupervised learning that has been developed within a framework of "information compression by multiple alignment, unification and search" (ICMAUS), designed to integrate learning with other AI functions such as parsing and production of language, fuzzy pattern recognition, probabilistic and exact forms of reasoning, and others. We give a brief introduction to the AIXI model, which unifies and overcomes the limitations of sequential decision theory and universal Solomonoff induction. While the former theory is suited for active agents in known environments, the latter is suited for passive prediction of unknown environments. An experimental server for stock trading autonomous agents is presented and made available, together with an agent shell for swift development. The server, written in Java, was implemented as proof-of-concept for an agent trade server for a real financial exchange. Diversity is an important aspect of highly efficient multi-agent teams. We introduce the main factors that drive a multi-agent system in either direction along the diversity scale. A metric for diversity is described, and we speculate on the concept of transient diversity. Finally, an experiment on social entropy using a RoboCup simulated soccer team is presented. We describe WSAT(cc), a local-search solver for computing models of theories in the language of propositional logic extended by cardinality atoms. WSAT(cc) is a processing back-end for the logic PS+, a recently proposed formalism for answer-set programming. Many different rules for decision making have been introduced in the literature. We show that a notion of generalized expected utility proposed in Part I of this paper is a universal decision rule, in the sense that it can represent essentially all other decision rules. Searle's Chinese Room argument is refuted by showing that he has actually given two different versions of the room, which fail for different reasons. Hence, Searle does not achieve his stated goal of showing ``that a system could have input and output capabilities that duplicated those of a native Chinese speaker and still not understand Chinese''. Reiter's original definition of default logic allows for the application of a default that contradicts a previously applied one. We call failure this condition. The possibility of generating failures has been in the past considered as a semantical problem, and variants have been proposed to solve it. We show that it is instead a computational feature that is needed to encode some domains into default logic. This document describes syntax, semantics and implementation guidelines in order to enrich the DLV system with the possibility to make external C function calls. This feature is realized by the introduction of parametric external predicates, whose extension is not specified through a logic program but implicitly computed through external code. Defeasible logic is a rule-based nonmonotonic logic, with both strict and defeasible rules, and a priority relation on rules. We show that inference in the propositional form of the logic can be performed in linear time. This contrasts markedly with most other propositional nonmonotonic logics, in which inference is intractable. Generalized evolutionary algorithm based on Tsallis canonical distribution is proposed. The algorithm uses Tsallis generalized canonical distribution to weigh the configurations for `selection' instead of Gibbs-Boltzmann distribution. Our simulation results show that for an appropriate choice of non-extensive index that is offered by Tsallis statistics, evolutionary algorithms based on this generalization outperform algorithms based on Gibbs-Boltzmann distribution. We consider the well-known family ALC(D) of description logics with a concrete domain, and provide first results on a framework obtained by augmenting ALC(D) atemporal roles and aspatial concrete domain with temporal roles and a spatial concrete domain. We define a quantitative constraint language subsuming two calculi well-known in QSR (Qualitative Spatial Reasoning): Frank's cone-shaped and projection-based calculi of cardinal direction relations. We show how to solve a CSP (Constraint Satisfaction Problem) expressed in the language. This paper reports about experiments with GermaNet as a resource within domain specific document analysis. The main question to be answered is: How is the coverage of GermaNet in a specific domain? We report about results of a field test of GermaNet for analyses of autopsy protocols and present a sketch about the integration of GermaNet inside XDOC. Our remarks will contribute to a GermaNet user's wish list. The aim of the project presented in this paper is to design a system for an NLG architecture, which supports the documentation process of eBusiness models. A major task is to enrich the formal description of an eBusiness model with additional information needed in an NLG task. The optimal complexity of neural networks is achieved when the self-organization principles is used to eliminate the contradictions existing in accordance with the K. Godel theorem about incompleteness of the systems based on axiomatics. The principle of S. Beer exterior addition the Heuristic Group Method of Data Handling by A. Ivakhnenko realized is used. This paper describes and evaluates the Metalinguistic Operation Processor (MOP) system for automatic compilation of metalinguistic information from technical and scientific documents. This system is designed to extract non-standard terminological resources that we have called Metalinguistic Information Databases (or MIDs), in order to help update changing glossaries, knowledge bases and ontologies, as well as to reflect the metastable dynamics of special-domain knowledge. To the reduct problems of decision system, the paper proposes the notion of dynamic core according to the dynamic reduct model. It describes various formal definitions of dynamic core, and discusses some properties about dynamic core. All of these show that dynamic core possesses the essential characters of the feature core. We study and compare the learning dynamics of two universal learning algorithms, one based on Bayesian learning and the other on prediction with expert advice. Both approaches have strong asymptotic performance guarantees. When confronted with the task of finding good long-term strategies in repeated 2x2 matrix games, they behave quite differently. Evolutionary computing (EC) is an exciting development in Computer Science. It amounts to building, applying and studying algorithms based on the Darwinian principles of natural selection. In this paper we briefly introduce the main concepts behind evolutionary computing. We present the main components all evolutionary algorithms (EA), sketch the differences between different types of EAs and survey application areas ranging from optimization, modeling and simulation to entertainment. Two example of Evolutionary System Identification are presented to highlight the importance of incorporating Domain Knowledge: the discovery of an analytical indentation law in Structural Mechanics using constrained Genetic Programming, and the identification of the repartition of underground velocities in Seismic Prospection. Critical issues for sucessful ESI are discussed in the light of these results. In this short paper we present a linear constraint solver for the UniCalc system, an environment for reliable solution of mathematical modeling problems. In this paper we propose a new family of Belief Conditioning Rules (BCRs) for belief revision. These rules are not directly related with the fusion of several sources of evidence but with the revision of a belief assignment available at a given time according to the new truth (i.e. conditioning constraint) one has about the space of solutions of the problem. A modification of OWL-S regarding parameter description is proposed. It is strictly based on Description Logic. In addition to class description of parameters it also allows the modelling of relations between parameters and the precise description of the size of data to be supplied to a service. In particular, it solves two major issues identified within current proposals for a Semantic Web Service annotation standard. The aim of this paper is to provide a sound framework for addressing a difficult problem: the automatic construction of an autonomous agent's modular architecture. We combine results from two apparently uncorrelated domains: Autonomous planning through Markov Decision Processes and a General Data Clustering Approach using a kernel-like method. Our fundamental idea is that the former is a good framework for addressing autonomy whereas the latter allows to tackle self-organizing problems. We develop a system which must be able to perform the same inferences that a human reader of an accident report can do and more particularly to determine the apparent causes of the accident. We describe the general framework in which we are situated, linguistic and semantic levels of the analysis and the inference rules used by the system. Kanerva's Binary Spatter Codes are reformulated in terms of geometric algebra. The key ingredient of the construction is the representation of XOR binding in terms of geometric product. Creation procedure of associative patterns ensemble in terms of formal logic with using neural net-work (NN) model is formulated. It is shown that the associative patterns set is created by means of unique procedure of NN work which having individual parameters of entrance stimulus transformation. It is ascer-tained that the quantity of the selected associative patterns possesses is a constant. In this paper, the traditional k-modes clustering algorithm is extended by weighting attribute value matches in dissimilarity computation. The use of attribute value weighting technique makes it possible to generate clusters with stronger intra-similarities, and therefore achieve better clustering performance. Experimental results on real life datasets show that these value weighting based k-modes algorithms are superior to the standard k-modes algorithm with respect to clustering accuracy. In this paper, we determine the complexity of the satisfiability problem for various logics obtained by adding numerical quantifiers, and other constructions, to the traditional syllogistic. In addition, we demonstrate the incompleteness of some recently proposed proof-systems for these logics. The problem of matchmaking in electronic social networks is formulated as an optimization problem. In particular, a function measuring the matching degree of fields of interest of a search profile with those of an advertising profile is proposed. We try a conceptual analysis of inheritance diagrams, first in abstract terms, and then compare to "normality" and the "small/big sets" of preferential and related reasoning. The main ideas are about nodes as truth values and information sources, truth comparison by paths, accessibility or relevance of information by paths, relative normality, and prototypical reasoning. One of the outstanding problems of philosophy of science and mathematics today is whether there is just "one" unique mathematics or the same can be bifurcated into "pure" and "applied" categories. A novel solution for this problem is offered here. This will allow us to appreciate the manner in which mathematics acts as an exact and precise language of nature. This has significant implications for Artificial Intelligence. Goedel's Incompleteness Theorems have the same scientific status as Einstein's principle of relativity, Heisenberg's uncertainty principle, and Watson and Crick's double helix model of DNA. Our aim is to discuss some new faces of the incompleteness phenomenon unveiled by an information-theoretic approach to randomness and recent developments in quantum computing. This paper has been withdrawn by the author. This draft is withdrawn for its poor quality in english, unfortunately produced by the author when he was just starting his science route. Look at the ICML version instead: http://icml2008.cs.helsinki.fi/papers/111.pdf When will the Internet become aware of itself? In this note the problem is approached by asking an alternative question: Can the Internet cope with stress? By extrapolating the psychological difference between coping and defense mechanisms a distributed software experiment is outlined which could reject the hypothesis that the Internet is not a conscious entity. We argue for a compositional semantics grounded in a strongly typed ontology that reflects our commonsense view of the world and the way we talk about it in ordinary language. Assuming the existence of such a structure, we show that the semantics of various natural language phenomena may become nearly trivial. This paper provides a new, decidable definition of the higher- order recursive path ordering in which type comparisons are made only when needed, therefore eliminating the need for the computability clo- sure, and bound variables are handled explicitly, making it possible to handle recursors for arbitrary strictly positive inductive types. We present an algorithm for effectively generating binary sequences which would be rated by people as highly likely to have been generated by a random process, such as flipping a fair coin. This theoretical work defines the measure of autocorrelation of evolvability in the context of neutral fitness landscape. This measure has been studied on the classical MAX-SAT problem. This work highlight a new characteristic of neutral fitness landscapes which allows to design new adapted metaheuristic. The mnesor theory is the adaptation of vectors to artificial intelligence. The scalar field is replaced by a lattice. Addition becomes idempotent and multiplication is interpreted as a selection operation. We also show that mnesors can be the foundation for a linear calculus. In this article the algorithm for transformation of logic functions which are given by truth tables is considered. The suggested algorithm allows the transformation of many-valued logic functions with the required number of variables and can be looked in this sense as universal. The treatment of both aleatory and epistemic uncertainty by recent methods often requires an high computational effort. In this abstract, we propose a numerical sampling method allowing to lighten the computational burden of treating the information by means of so-called fuzzy random variables. This paper addresses the question of which models fit with information concerning the preferences of the decision maker over each attribute, and his preferences about aggregation of criteria (interacting criteria). We show that the conditions induced by these information plus some intuitive conditions lead to a unique possible aggregation operator: the Choquet integral. The data-complexity of both satisfiability and finite satisfiability for the two-variable fragment with counting is NP-complete; the data-complexity of both query-answering and finite query-answering for the two-variable guarded fragment with counting is co-NP-complete. We discuss metacognitive modelling as an enhancement to cognitive modelling and computing. Metacognitive control mechanisms should enable AI systems to self-reflect, reason about their actions, and to adapt to new situations. In this respect, we propose implementation details of a knowledge taxonomy and an augmented data mining life cycle which supports a live integration of obtained models. This report describes experimental results for a set of benchmarks on program verification. It compares the capabilities of CPBVP "Constraint Programming framework for Bounded Program Verification" [4] with the following frameworks: ESC/Java, CBMC, Blast, EUREKA and Why. This paper generalizes the traditional statistical concept of prediction intervals for arbitrary probability density functions in high-dimensional feature spaces by introducing significance level distributions, which provides interval-independent probabilities for continuous random variables. The advantage of the transformation of a probability density function into a significance level distribution is that it enables one-class classification or outlier detection in a direct manner. We present a domain-independent algorithm that computes macros in a novel way. Our algorithm computes macros "on-the-fly" for a given set of states and does not require previously learned or inferred information, nor prior domain knowledge. The algorithm is used to define new domain-independent tractable classes of classical planning that are proved to include \emph{Blocksworld-arm} and \emph{Towers of Hanoi}. The effects of policy sharing between agents in a multi-agent dynamical system has not been studied extensively. I simulate a system of agents optimizing the same task using reinforcement learning, to study the effects of different population densities and policy sharing. I demonstrate that sharing policies decreases the time to reach asymptotic behavior, and results in improved asymptotic behavior. We present a Monte Carlo algorithm for efficiently finding near optimal moves and bids in the game of Bidding Hex. The algorithm is based on the recent solution of Random-Turn Hex by Peres, Schramm, Sheffield, and Wilson together with Richman's work connecting random-turn games to bidding games. In this paper we present the N-norms/N-conorms in neutrosophic logic and set as extensions of T-norms/T-conorms in fuzzy logic and set. Also, as an extension of the Intuitionistic Fuzzy Topology we present the Neutrosophic Topologies. We propose a new extended format to represent constraint networks using XML. This format allows us to represent constraints defined either in extension or in intension. It also allows us to reference global constraints. Any instance of the problems CSP (Constraint Satisfaction Problem), QCSP (Quantified CSP) and WCSP (Weighted CSP) can be represented using this format. We present a convenient notation for positive/negative-conditional equations. The idea is to merge rules specifying the same function by using case-, if-, match-, and let-expressions. Based on the presented macro-rule-construct, positive/negative-conditional equational specifications can be written on a higher level. A rewrite system translates the macro-rule-constructs into positive/negative-conditional equations. We propose a new family of constraints which combine together lexicographical ordering constraints for symmetry breaking with other common global constraints. We give a general purpose propagator for this family of constraints, and show how to improve its complexity by exploiting properties of the included global constraints. Many reinforcement learning exploration techniques are overly optimistic and try to explore every state. Such exploration is impossible in environments with the unlimited number of states. I propose to use simulated exploration with an optimistic model to discover promising paths for real exploration. This reduces the needs for the real exploration. We describe a variant of resolution rule of proof and show that it is complete for stable semantics of logic programs. We show applications of this result. We discuss the role of random basis function approximators in modeling and control. We analyze the published work on random basis function approximators and demonstrate that their favorable error rate of convergence O(1/n) is guaranteed only with very substantial computational resources. We also discuss implications of our analysis for applications of neural networks in modeling and control. We present a novel approach for learning nonlinear dynamic models, which leads to a new set of tools capable of solving problems that are otherwise difficult. We provide theory showing this new approach is consistent for models with long range structure, and apply the approach to motion capture and high-dimensional video data, yielding results superior to standard alternatives. In a case study we investigate whether off the shelf higher-order theorem provers and model generators can be employed to automate reasoning in and about quantified multimodal logics. In our experiments we exploit the new TPTP infrastructure for classical higher-order logic. Could we define I? Throughout this article we give a negative answer to this question. More exactly, we show that there is no definition for I in a certain way. But this negative answer depends on our definition of definability. Here, we try to consider sufficient generalized definition of definability. In the middle of paper a paradox will arise which makes us to modify the way we use the concept of property and definability. I discuss (ontologies_and_ontological_knowledge_bases / formal_methods_and_theories) duality and its category theory extensions as a step toward a solution to Knowledge-Based Systems Theory. In particular I focus on the example of the design of elements of ontologies and ontological knowledge bases of next three electronic courses: Foundations of Research Activities, Virtual Modeling of Complex Systems and Introduction to String Theory. Mnesors are defined as elements of a semimodule over the min-plus integers. This two-sorted structure is able to merge graduation properties of vectors and idempotent properties of boolean numbers, which makes it appropriate for hybrid systems. We apply it to the control of an inverted pendulum and design a full logical controller, that is, without the usual algebra of real numbers. In this paper we introduce two new DSm fusion conditioning rules with example, and as a generalization of them a class of DSm fusion conditioning rules, and then extend them to a class of DSm conditioning rules. To capture the uncertainty of information or knowledge in information systems, various information granulations, also known as knowledge granulations, have been proposed. Recently, several axiomatic definitions of information granulation have been introduced. In this paper, we try to improve these axiomatic definitions and give a universal construction of information granulation by relating information granulations with a class of functions of multiple variables. We show that the improved axiomatic definition has some concrete information granulations in the literature as instances. The aim of this paper is to introduce a logic in which nouns and verbs are handled together as a deductive reasoning, and also to observe the relationship between nouns and verbs as well as between logics and conversations. We introduce the weighted CFG constraint and propose a propagation algorithm that enforces domain consistency in $O(n^3|G|)$ time. We show that this algorithm can be decomposed into a set of primitive arithmetic constraints without hindering propagation. LSCS is a satellite workshop of the international conference on principles and practice of Constraint Programming (CP), since 2004. It is devoted to local search techniques in constraint satisfaction, and focuses on all aspects of local search techniques, including: design and implementation of new algorithms, hybrid stochastic-systematic search, reactive search optimization, adaptive search, modeling for local-search, global constraints, flexibility and robustness, learning methods, and specific applications. The fundamental laws of quantum world upsets the logical foundation of classic physics. They are completely counter-intuitive with many bizarre behaviors. However, this paper shows that they may make sense from the perspective of a general decision-optimization principle for cooperation. This principle also offers a generalization of Nash equilibrium, a key concept in game theory, for better payoffs and stability of game playing. Traditional staging is based on a formal approach of similarity leaning on dramaturgical ontologies and instanciation variations. Inspired by interactive data mining, that suggests different approaches, we give an overview of computer science and theater researches using computers as partners of the actor to escape the a priori specification of roles. We present an approach to semi-supervised learning based on an exponential family characterization. Our approach generalizes previous work on coupled priors for hybrid generative/discriminative models. Our model is more flexible and natural than previous approaches. Experimental results on several data sets show that our approach also performs better in practice. RefereeToolbox is a java package implementing combination operators for fusing evidences. It is downloadable from: http://refereefunction.fredericdambreville.com/releases RefereeToolbox is based on an interpretation of the fusion rules by means of Referee Functions. This approach implies a dissociation between the definition of the combination and its actual implementation, which is common to all referee-based combinations. As a result, RefereeToolbox is designed with the aim to be generic and evolutive. We present in this paper some examples of how to compute by hand the PCR5 fusion rule for three sources, so the reader will better understand its mechanism. We also take into consideration the importance of sources, which is different from the classical discounting of sources. In case of realization of successful business, gain analysis is essential. In this paper we have cited some new techniques of gain expectation on the basis of neural property of perceptron. Support rule and Sequence mining based artificial intelligence oriented practices have also been done in this context. In the view of above fuzzy and statistical based gain sensing is also pointed out. We have an audacious dream, we would like to develop a simulation and virtual reality system to support the decision making in European football (soccer). In this review, we summarize the efforts that we have made to fulfil this dream until recently. In addition, an introductory version of FerSML (Footballer and Football Simulation Markup Language) is presented in this paper. We report recent research on computing with biology-based neural network models by means of physics-based opto-electronic hardware. New technology provides opportunities for very-high-speed computation and uncovers problems obstructing the wide-spread use of this new capability. The Computation Modeling community may be able to offer solutions to these cross-boundary research problems. One possible escape from the Gibbard-Satterthwaite theorem is computational complexity. For example, it is NP-hard to compute if the STV rule can be manipulated. However, there is increasing concern that such results may not re ect the difficulty of manipulation in practice. In this tutorial, I survey recent results in this area. Symmetry is a common feature of many combinatorial problems. Unfortunately eliminating all symmetry from a problem is often computationally intractable. This paper argues that recent parameterized complexity results provide insight into that intractability and help identify special cases in which symmetry can be dealt with more tractably The Border algorithm and the iPred algorithm find the Hasse diagrams of FCA lattices. We show that they can be generalized to arbitrary lattices. In the case of iPred, this requires the identification of a join-semilattice homomorphism into a distributive lattice. This research applies ideas from argumentation theory in the context of semantic wikis, aiming to provide support for structured-large scale argumentation between human agents. The implemented prototype is exemplified by modelling the MMR vaccine controversy. Semantic wikis, wikis enhanced with Semantic Web technologies, are appropriate systems for community-authored knowledge models. They are particularly suitable for scientific collaboration. This paper details the design principles ofWikiBridge, a semantic wiki. In this report, we propose a quick survey of the currently known techniques for encoding a Boolean cardinality constraint into a CNF formula, and we discuss about the relevance of these encodings. We also propose models to facilitate analysis and design of CNF encodings for Boolean constraints. BoolVar/PB is an open source java library dedicated to the translation of pseudo-Boolean constraints into CNF formulae. Input constraints can be categorized with tags. Several encoding schemes are implemented in a way that each input constraint can be translated using one or several encoders, according to the related tags. The library can be easily extended by adding new encoders and / or new output formats. Four options for assigning a meaning to Islamic Logic are surveyed including a new proposal for an option named "Real Islamic Logic" (RIL). That approach to Islamic Logic should serve modern Islamic objectives in a way comparable to the functionality of Islamic Finance. The prospective role of RIL is analyzed from several perspectives: (i) parallel distributed systems design, (ii) reception by a community structured audience, (iii) informal logic and applied non-classical logics, and (iv) (in)tractability and artificial intelligence. We solve constraint satisfaction problems through translation to answer set programming (ASP). Our reformulations have the property that unit-propagation in the ASP solver achieves well defined local consistency properties like arc, bound and range consistency. Experiments demonstrate the computational value of this approach. Two distinct algorithms are presented to extract (schemata of) resolution proofs from closed tableaux for propositional schemata. The first one handles the most efficient version of the tableau calculus but generates very complex derivations (denoted by rather elaborate rewrite systems). The second one has the advantage that much simpler systems can be obtained, however the considered proof procedure is less efficient. Logic programming is a powerful paradigm for programming autonomous agents in dynamic domains, as witnessed by languages such as Golog and Flux. In this work we present ALPprolog, an expressive, yet efficient, logic programming language for the online control of agents that have to reason about incomplete information and sensing actions. In this paper, we investigate the following question: how could you write such computer programs that can work like conscious beings? The motivation behind this question is that we want to create such applications that can see the future. The aim of this paper is to provide an overall conceptual framework for this new approach to machine consciousness. So we introduce a new programming paradigm called Consciousness Oriented Programming (COP). Self-Organizing Maps are commonly used for unsupervised learning purposes. This paper is dedicated to the certain modification of SOM called SOMN (Self-Organizing Mixture Networks) used as a mechanism for representing grayscale digital images. Any grayscale digital image regarded as a distribution function can be approximated by the corresponding Gaussian mixture. In this paper, the use of SOMN is proposed in order to obtain such approximations for input grayscale images in unsupervised manner. Two different conceptions of emergence are reconciled as two instances of the phenomenon of detection. In the process of comparing these two conceptions, we find that the notions of complexity and detection allow us to form a unified definition of emergence that clearly delineates the role of the observer. The aim of this paper is to announce the release of a novel system for abstract argumentation which is based on decomposition and dynamic programming. We provide first experimental evaluations to show the feasibility of this approach. In this article, we present a corpus of dialogues between a schizophrenic speaker and an interlocutor who drives the dialogue. We had identified specific discontinuities for paranoid schizophrenics. We propose a modeling of these discontinuities with S-DRT (its pragmatic part) We present the Linux package configuration tool aspcud based on Answer Set Programming. In particular, we detail aspcud's preprocessor turning a CUDF specification into a set of logical facts. We study the complexity of reasoning in abstracts argumentation frameworks close to graph classes that allow for efficient reasoning methods, i.e.\ to one of the classes of acyclic, noeven, biparite and symmetric AFs. In this work we show that certain reasoning problems on the second level of the polynomial hierarchy still maintain their full complexity when restricted to instances of fixed distance to one of the above graph classes. We discuss the frequent pattern mining problem in a general setting. From an analysis of abstract representations, summarization and frequent pattern mining, we arrive at a generalization of the problem. Then, we show how the problem can be cast into the powerful language of algorithmic information theory. This allows us to formulate a simple algorithm to mine for all frequent patterns. We propose a method called EDML for learning MAP parameters in binary Bayesian networks under incomplete data. The method assumes Beta priors and can be used to learn maximum likelihood parameters when the priors are uninformative. EDML exhibits interesting behaviors, especially when compared to EM. We introduce EDML, explain its origin, and study some of its properties both analytically and empirically. We show that the existence of a computationally efficient calibration algorithm, with a low weak calibration rate, would imply the existence of an efficient algorithm for computing approximate Nash equilibria - thus implying the unlikely conclusion that every problem in PPAD is solvable in polynomial time. Relational representations in reinforcement learning allow for the use of structural information like the presence of objects and relationships between them in the description of value functions. Through this paper, we show that such representations allow for the inclusion of background knowledge that qualitatively describes a state and can be used to design agents that demonstrate learning behavior in domains with large state and actions spaces such as computer games. We introduce a family of new equational semantics for argumentation networks which can handle odd and even loops in a uniform manner. We offer one version of equational semantics which is equivalent to CF2 semantics, and a better version which gives the same results as traditional Dung semantics for even loops but can still handle odd loops. This paper takes new look on language and knowledge modelling for corpus linguistics. Using ideas of Chaitin, a line of argument is made against language/knowledge separation in Natural Language Processing. A simplistic model, that generalises approaches to language and knowledge, is proposed. One of hypothetical consequences of this model is Strong AI. The article discusses some applications of fuzzy logic ideas to formalizing of the Case-Based Reasoning (CBR) process and to measuring the effectiveness of CBR systems Learning in Riemannian orbifolds is motivated by existing machine learning algorithms that directly operate on finite combinatorial structures such as point patterns, trees, and graphs. These methods, however, lack statistical justification. This contribution derives consistency results for learning problems in structured domains and thereby generalizes learning in vector spaces and manifolds. Objectives: Electronic health records (EHRs) are only a first step in capturing and utilizing health-related data - the challenge is turning that data into useful information. Furthermore, EHRs are increasingly likely to include data relating to patient outcomes, functionality such as clinical decision support, and genetic information as well, and, as such, can be seen as repositories of increasingly valuable information about patients' health conditions and responses to treatment over time. Methods: We describe a case study of 423 patients treated by Centerstone within Tennessee and Indiana in which we utilized electronic health record data to generate predictive algorithms of individual patient treatment response. Multiple models were constructed using predictor variables derived from clinical, financial and geographic data. Results: For the 423 patients, 101 deteriorated, 223 improved and in 99 there was no change in clinical condition. Based on modeling of various clinical indicators at baseline, the highest accuracy in predicting individual patient response ranged from 70-72% within the models tested. In terms of individual predictors, the Centerstone Assessment of Recovery Level - Adult (CARLA) baseline score was most significant in predicting outcome over time (odds ratio 4.1 + 2.27). Other variables with consistently significant impact on outcome included payer, diagnostic category, location and provision of case management services. Conclusions: This approach represents a promising avenue toward reducing the current gap between research and practice across healthcare, developing data-driven clinical decision support based on real-world populations, and serving as a component of embedded clinical artificial intelligences that "learn" over time. This paper deals with chain graphs under the alternative Andersson-Madigan-Perlman (AMP) interpretation. In particular, we present a constraint based algorithm for learning an AMP chain graph a given probability distribution is faithful to. We also show that the extension of Meek's conjecture to AMP chain graphs does not hold, which compromises the development of efficient and correct score+search learning algorithms under assumptions weaker than faithfulness. A semantic embedding of (constant domain) quantified conditional logic in classical higher-order logic is presented. We introduce in this paper a new way of optimizing the natural extension of the quantization error using in k-means clustering to dissimilarity data. The proposed method is based on hierarchical clustering analysis combined with multi-level heuristic refinement. The method is computationally efficient and achieves better quantization errors than the A simplified description of Fuzzy TOPSIS (Technique for Order Preference by Similarity to Ideal Situation) is presented. We have adapted the TOPSIS description from existing Fuzzy theory literature and distilled the bare minimum concepts required for understanding and applying TOPSIS. An example has been worked out to illustrate the application of TOPSIS for a multi-criteria group decision making scenario. The causal structure of cognition can be simulated but not implemented computationally, just as the causal structure of a comet can be simulated but not implemented computationally. The only thing that allows us even to imagine otherwise is that cognition, unlike a comet, is invisible (to all but the cognizer). "The hardest logic puzzle ever" presented by George Boolos became a target for philosophers and logicians who tried to modify it and make it even tougher. I propose further modification of the original puzzle where part of the available information is eliminated but the solution is still possible. The solution also gives interesting ideas on logic behind discovery of unknown language. We present a novel technique to remove spurious ambiguity from transition systems for dependency parsing. Our technique chooses a canonical sequence of transition operations (computation) for a given dependency tree. Our technique can be applied to a large class of bottom-up transition systems, including for instance Nivre (2004) and Attardi (2006). This paper summarises the current state-of-the art in the study of compositionality in distributional semantics, and major challenges for this area. We single out generalised quantifiers and intensional semantics as areas on which to focus attention for the development of the theory. Once suitable theories have been developed, algorithms will be needed to apply the theory to tasks. Evaluation is a major problem; we single out application to recognising textual entailment and machine translation for this purpose. In this paper, we present a new approach dedicated to correcting the spelling errors of the Arabic language. This approach corrects typographical errors like inserting, deleting, and permutation. Our method is inspired from the Levenshtein algorithm, and allows a finer and better scheduling than Levenshtein. The results obtained are very satisfactory and encouraging, which shows the interest of our new approach. Ant-based algorithms are successful tools for solving complex problems. One of these problems is the Linear Ordering Problem (LOP). The paper shows new results on some LOP instances, using Ant Colony System (ACS) and the Step-Back Sensitive Ant Model (SB-SAM). In this paper, we show the possibility of using a linear Conditional Random Fields (CRF) for terminology extraction from a specialized text corpus. We propose a solution to the problem of estimating a Riemannian metric associated with a given differentiable manifold. The metric learning problem is based on minimizing the relative volume of a given set of points. We derive the details for a family of metrics on the multinomial simplex. The resulting metric has applications in text classification and bears some similarity to TFIDF representation of text documents. We use princiles of fuzzy logic to develop a general model representing several processes in a system's operation characterized by a degree of vagueness and/or uncertainy. Further, we introduce three altenative measures of a fuzzy system's effectiveness connected to the above model. An applcation is also developed for the Mathematical Modelling process illustrating our results. We know anything because we learn about it, there is anything we ever share about it, but now a lot of media that can represent how it happened as infrastructure of the knowledge sharing. This paper aims to introduce a model for understanding a problem in knowledge sharing based on interaction. We present a new algorithm for approximate inference in probabilistic programs, based on a stochastic gradient for variational programs. This method is efficient without restrictions on the probabilistic program; it is particularly practical for distributions which are not analytically tractable, including highly structured distributions that arise in probabilistic programs. We show how to automatically derive mean-field probabilistic programs and optimize them, and demonstrate that our perspective improves inference efficiency over other algorithms. Recent theoretical work has identified random projection as a promising dimensionality reduction technique for learning mixtures of Gausians. Here we summarize these results and illustrate them by a wide variety of experiments on synthetic and real data. In our work we define a new algebra of operators as a substitute for fuzzy logic. Its primary purpose is for construction of binary discriminators for phonemes based on spectral content. It is optimized for design of non-parametric computational circuits, and makes uses of 4 operations: $\min$, $\max$, the difference and generalized additively homogenuous means. Recent improvements of the LEO-II theorem prover are presented. These improvements include a revised ATP interface, new translations into first-order logic, rule support for the axiom of choice, detection of defined equality, and more flexible strategy scheduling. With a weighting scheme proportional to t, a traditional stochastic gradient descent (SGD) algorithm achieves a high probability convergence rate of O({\kappa}/T) for strongly convex functions, instead of O({\kappa} ln(T)/T). We also prove that an accelerated SGD algorithm also achieves a rate of O({\kappa}/T). We examine three different algorithms that enable the collision certificate method from [Bialkowski, et al.] to handle the case of a centralized multi-robot team. By taking advantage of symmetries in the configuration space of multi-robot teams, our methods can significantly reduce the number of collision checks vs. both [Bialkowski, et al.] and standard collision checking implementations. We present Exponentiated Gradient LINUCB, an algorithm for con-textual multi-armed bandits. This algorithm uses Exponentiated Gradient to find the optimal exploration of the LINUCB. Within a deliberately designed offline simulation framework we conduct evaluations with real online event log data. The experimental results demonstrate that our algorithm outperforms surveyed algorithms. We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale. This has several useful effects: there is no need to pre-normalize data, the test-time and test-space complexity are reduced, and the algorithms are more robust. We study a generic program to investigate the scope for automatically customising it for a vital current task, which was not considered when it was first written. In detail, we show genetic programming (GP) can evolve models of aspects of BLAST's output when it is used to map Solexa Next-Gen DNA sequences to the human genome. This work uses the L-system to construct a tree structure for the text sequence and derives its complexity. It serves as a measure of structural complexity of the text. It is applied to anomaly detection in data transmission. This paper describes a program that solves elementary mathematical problems, mostly in metric space theory, and presents solutions that are hard to distinguish from solutions that might be written by human mathematicians. The program is part of a more general project, which we also discuss. We present a complete finite axiomatization of the unrestricted implication problem for inclusion and conditional independence atoms in the context of dependence logic. For databases, our result implies a finite axiomatization of the unrestricted implication problem for inclusion, functional, and embedded multivalued dependencies in the unirelational case. We consider the classical TD(0) algorithm implemented on a network of agents wherein the agents also incorporate the updates received from neighboring agents using a gossip-like mechanism. The combined scheme is shown to converge for both discounted and average cost problems. To know which operators to apply and in which order, as well as attributing good values to their parameters is a challenge for users of computer vision. This paper proposes a solution to this problem as a multi-agent system modeled according to the Vowel approach and using the Q-learning algorithm to optimize its choice. An implementation is given to test and validate this method. In this paper we study the complexity of strategic argumentation for dialogue games. A dialogue game is a 2-player game where the parties play arguments. We show how to model dialogue games in a skeptical, non-monotonic formalism, and we show that the problem of deciding what move (set of rules) to play at each turn is an NP-complete problem. Abstract dialectical frameworks (ADFs) are a powerful generalisation of Dung's abstract argumentation frameworks. In this paper we present an answer set programming based software system, called DIAMOND (DIAlectical MOdels eNcoDing). It translates ADFs into answer set programs whose stable models correspond to models of the ADF with respect to several semantics (i.e. admissible, complete, stable, grounded). In this paper we explore various parameter settings of the state-of-art Statistical Machine Translation system to improve the quality of the translation for a `distant' language pair like English-Hindi. We proposed new techniques for efficient reordering. A slight improvement over the baseline is reported using these techniques. We also show that a simple pre-processing step can improve the quality of the translation significantly. This paper presents a microkernel architecture for constraint programming organized around a number of small number of core functionalities and minimal interfaces. The architecture contrasts with the monolithic nature of many implementations. Experimental results indicate that the software engineering benefits are not incompatible with runtime efficiency. Airspace sectorisation provides a partition of a given airspace into sectors, subject to geometric constraints and workload constraints, so that some cost metric is minimised. We make a study of the constraints that arise in airspace sectorisation. For each constraint, we give an analysis of what algorithms and properties are required under systematic search and stochastic local search. These notes pose a "proof challenge": a proof, or disproof, of the proposition that "For any given body of information, I, expressed as a one-dimensional sequence of atomic symbols, a multiple alignment concept, described in the document, provides a means of encoding all the redundancy that may exist in I. Aspects of the challenge are described. Correlation clustering is a concept of machine learning. The ultimate goal of such a clustering is to find a partition with minimal conflicts. In this paper we investigate a correlation clustering of integers, based upon the greatest common divisor. In this paper we discuss some reasons why temporal logic might not be suitable to model real life norms. To show this, we present a novel deontic logic contrary-to-duty/derived permission paradox based on the interaction of obligations, permissions and contrary-to-duty obligations. The paradox is inspired by real life norms. This paper discloses the potential of OWL (Web Ontology Language) ontologies for generation of rules. The main purpose of this paper is to identify new types of rules, which may be generated from OWL ontologies. Rules, generated from OWL ontologies, are necessary for the functioning of the Semantic Web Expert System. It is expected that the Semantic Web Expert System (SWES) will be able to process ontologies from the Web with the purpose to supplement or even to develop its knowledge base. TurKontrol, and algorithm presented in (Dai et al. 2010), uses a POMDP to model and control an iterative workflow for crowdsourced work. Here, TurKontrol is re-implemented as "TurKPF," which uses a Particle Filter to reduce computation time & memory usage. Most importantly, in our experimental environment with default parameter settings, the action is chosen nearly instantaneously. Through a series of experiments we see that TurKPF and TurKontrol perform similarly. In this essay the stance on robots is discussed. The attitude against robots in history, starting in Ancient Greek culture until the industrial revolution is described. The uncanny valley and some possible explanations are given. Some differences in Western and Asian understanding of robots are listed and finally we answer the question raised with the title. The NMR community would like to build a repository of benchmarks to push forward the design of systems implementing NMR as it has been the case for many other areas in AI. There are a number of lessons which can be learned from the experience of other communi- ties. Here are a few thoughts about the requirements and choices to make before building such a repository. Dialogue games are a two-player semantics for a variety of logics, including intuitionistic and classical logic. Dialogues can be viewed as a kind of analytic calculus not unlike tableaux. Can dialogue games be an effective foundation for proof search in intuitionistic logic (both first-order and propositional)? We announce Kuno, an automated theorem prover for intuitionistic first-order logic based on dialogue games. In this treatise we aim to build a hybrid network automated (self-adaptive) security threats discovery and prevention system; by using unconventional techniques and methods, including fuzzy logic and biological inspired algorithms under the context of soft computing. The paper presents a knowledge representation language $\mathcal{A}log$ which extends ASP with aggregates. The goal is to have a language based on simple syntax and clear intuitive and mathematical semantics. We give some properties of $\mathcal{A}log$, an algorithm for computing its answer sets, and comparison with other approaches. This paper presents an overview of `Lexpresso', a Controlled Natural Language developed at the Defence Science & Technology Organisation as a bidirectional natural language interface to a high-level information fusion system. The paper describes Lexpresso's main features including lexical coverage, expressiveness and range of linguistic syntactic and semantic structures. It also touches on its tight integration with a formal semantic formalism and tentatively classifies it against the PENS system. In this paper, concept of possibility neutrosophic soft set and its operations are defined, and their properties are studied. An application of this theory in decision making is investigated. Also a similarity measure of two possibility neutrosophic soft sets is introduced and discussed. Finally an application of this similarity measure is given to select suitable person for position in a firm. In Inverse subsumption for complete explanatory induction Yamamoto et al. investigate which inductive logic programming systems can learn a correct hypothesis $H$ by using the inverse subsumption instead of inverse entailment. We prove that inductive logic programming system Imparo is complete by inverse subsumption for learning a correct definite hypothesis $H$ wrt the definite background theory $B$ and ground atomic examples $E$, by establishing that there exists a connected theory $T$ for $B$ and $E$ such that $H$ subsumes $T$. We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale. This has several useful effects: there is no need to pre-normalize data, the test-time and test-space complexity are reduced, and the algorithms are more robust. The paper attempts to describe the space of possible mind designs by first equating all minds to software. Next it proves some interesting properties of the mind design space such as infinitude of minds, size and representation complexity of minds. A survey of mind design taxonomies is followed by a proposal for a new field of investigation devoted to study of minds, intellectology, a list of open problems for this new field is presented. In Reinforcement Learning (RL), it is common to use optimistic initialization of value functions to encourage exploration. However, such an approach generally depends on the domain, viz., the scale of the rewards must be known, and the feature representation must have a constant norm. We present a simple approach that performs optimistic initialization with less dependence on the domain. The main goal of this work is to analyze the behaviour of the FA quantifier fuzzification mechanism. As we prove in the paper, this model has a very solid theorethical behaviour, superior to most of the models defined in the literature. Moreover, we show that the underlying probabilistic interpretation has very interesting consequences. We present a parallel solver for numerical constraint satisfaction problems (NCSPs) that can scale on a number of cores. Our proposed method runs worker solvers on the available cores and simultaneously the workers cooperate for the search space distribution and balancing. In the experiments, we attained up to 119-fold speedup using 256 cores of a parallel computer. ROSS ("Representation, Ontology, Structure, Star") is introduced as a new method for knowledge representation that emphasizes representational constructs for physical structure. The ROSS representational scheme includes a language called "Star" for the specification of ontology classes. The ROSS method also includes a formal scheme called the "instance model". Instance models are used in the area of natural language meaning representation to represent situations. This paper provides both the rationale and the philosophical background for the ROSS method. This paper briefly characterizes the field of cognitive computing. As an exemplification, the field of natural language question answering is introduced together with its specific challenges. A possibility to master these challenges is illustrated by a detailed presentation of the LogAnswer system, which is a successful representative of the field of natural language question answering. The article is dedicated to the analysis of the existing models for assessment based of the fuzzy logic centroid technique. A new Generalized Rectangular Model were developed. Some generalizations of the existing models are offered. This short report describes an automated BWAPI-based script developed for live streams of a StarCraft Brood War bot tournament, SSCAIT. The script controls the in-game camera in order to follow the relevant events and improve the viewer experience. We enumerate its novel features and provide a few implementation notes. What is happiness for reinforcement learning agents? We seek a formal definition satisfying a list of desiderata. Our proposed definition of happiness is the temporal difference error, i.e. the difference between the value of the obtained reward and observation and the agent's expectation of this value. This definition satisfies most of our desiderata and is compatible with empirical research on humans. We state several implications and discuss examples. In this paper, we propose a different insight to analyze AdaBoost. This analysis reveals that, beyond some preconceptions, AdaBoost can be directly used as an asymmetric learning algorithm, preserving all its theoretical properties. A novel class-conditional description of AdaBoost, which models the actual asymmetric behavior of the algorithm, is presented. We introduce a lazy approach to the explanation-based approximation of probabilistic logic programs. It uses only the most significant part of the program when searching for explanations. The result is a fast and anytime approximate inference algorithm which returns hard lower and upper bounds on the exact probability. We experimentally show that this method outperforms state-of-the-art approximate inference. Solomonoff induction is held as a gold standard for learning, but it is known to be incomputable. We quantify its incomputability by placing various flavors of Solomonoff's prior M in the arithmetical hierarchy. We also derive computability bounds for knowledge-seeking agents, and give a limit-computable weakly asymptotically optimal reinforcement learning agent. We introduce a new rule based system for belief tracking in dialog systems. Despite the simplicity of the rules being considered, the proposed belief tracker ranks favourably compared to the previous submissions on the second and third Dialog State Tracking challenges. The results of this simple tracker allows to reconsider the performances of previous submissions using more elaborate techniques. The problem of autonomous navigation is one of the basic problems for robotics. Although, in general, it may be challenging when an autonomous vehicle is placed into partially observable domain. In this paper we consider simplistic environment model and introduce a navigation algorithm based on Learning Classifier System. This paper introduces a novel approach to tackle the existing gap on message translations in dialogue systems. Currently, submitted messages to the dialogue systems are considered as isolated sentences. Thus, missing context information impede the disambiguation of homographs words in ambiguous sentences. Our approach solves this disambiguation problem by using concepts over existing ontologies. We give an algorithm A which assigns probabilities to logical sentences. For any simple infinite sequence of sentences whose truth-values appear indistinguishable from a biased coin that outputs "true" with probability p, we have that the sequence of probabilities that A assigns to these sentences converges to p. This volume contains the system description of the 18 solvers submitted to the First International Competition on Computational Models of Argumentation (ICCMA'15) and therefore gives an overview on state-of-the-art of computational approaches to abstract argumentation problems. Further information on the results of the competition and the performance of the individual solvers can be found on at http://argumentationcompetition.org/2015/. The first ever human vs. computer no-limit Texas hold 'em competition took place from April 24-May 8, 2015 at River's Casino in Pittsburgh, PA. In this article I present my thoughts on the competition design, agent architecture, and lessons learned. Sometime in the future we will have to deal with the impact of AI's being mistaken for humans. For this reason, I propose that any autonomous system should be designed so that it is unlikely to be mistaken for anything besides an autonomous sysem, and should identify itself at the start of any interaction with another agent. This article provides a formalization of the W3C Draft Core SHACL Semantics specification using Z notation. This formalization exercise has identified a number of quality issues in the draft. It has also established that the recursive definitions in the draft are well-founded. Further formal validation of the draft will require the use of an executable specification technology. Attribute exploration has been investigated in several studies, with particular emphasis on the algorithmic aspects of this knowledge acquisition method. In its basic version the method itself is rather simple and transparent. But when background knowledge and partially described counter-examples are admitted, it gets more difficult. Here we discuss this case in an abstract, somewhat "axiomatic" setting, providing a terminology that clarifies the abstract strategy of the method rather than its algorithmic implementation. This document provides the foundations behind the functionality provided by the $\rho$G library (https://github.com/santiontanon/RHOG), focusing on the basic operations the library provides: subsumption, refinement of directed labeled graphs, and distance/similarity assessment between directed labeled graphs. $\rho$G development was initially supported by the National Science Foundation, by the EAGER grant IIS-1551338. In previous work with J. Hedges, we formalised a generalised quantifiers theory of natural language in categorical compositional distributional semantics with the help of bialgebras. In this paper, we show how quantifier scope ambiguity can be represented in that setting and how this representation can be generalised to branching quantifiers. I propose a system for Automated Theorem Proving in higher order logic using deep learning and eschewing hand-constructed features. Holophrasm exploits the formalism of the Metamath language and explores partial proof trees using a neural-network-augmented bandit algorithm and a sequence-to-sequence model for action enumeration. The system proves 14% of its test theorems from Metamath's set.mm module. Reasoning does not work well when done in isolation from its significance, both to the needs and interests of an agent and with respect to the wider world. Moreover, those issues may best be handled with a new sort of data structure that goes beyond the knowledge base and incorporates aspects of perceptual knowledge and even more, in which a kind of anticipatory action may be key. In this paper, we reexamine the Movie Graph Argument, which demonstrates a basic incompatibility between computationalism and materialism. We discover that the incompatibility is only manifest in singular classical-like universes. If we accept that we live in a Multiverse, then the incompatibility goes away, but in that case another line of argument shows that with computationalism, the fundamental, or primitive materiality has no causal influence on what is observed, which must must be derivable from basic arithmetic properties. This thesis derives, tests and applies two linear projection algorithms for machine learning under non-stationarity. The first finds a direction in a linear space upon which a data set is maximally non-stationary. The second aims to robustify two-way classification against non-stationarity. The algorithm is tested on a key application scenario, namely Brain Computer Interfacing. We present a logical framework to represent and reason about fuzzy optimization problems based on fuzzy answer set optimization programming. This is accomplished by allowing fuzzy optimization aggregates, e.g., minimum and maximum in the language of fuzzy answer set optimization programming to allow minimization or maximization of some desired criteria under fuzzy environments. We show the application of the proposed logical fuzzy optimization framework under the fuzzy answer set optimization programming to the fuzzy water allocation optimization problem. This article addresses the problem of expressing preferences in flexible queries while basing on a combination of the fuzzy logic theory and Conditional Preference Networks or CP-Nets. The Rao-Blackwell theorem is utilized to analyze and improve the scalability of inference in large probabilistic models that exhibit symmetries. A novel marginal density estimator is introduced and shown both analytically and empirically to outperform standard estimators by several orders of magnitude. The developed theory and algorithms apply to a broad class of probabilistic models including statistical relational models considered not susceptible to lifted probabilistic inference. Most feature detectors such as edge detectors or circle finders are statistical, in the sense that they decide at each point in an image about the presence of a feature, this paper describes the use of Bayesian feature detectors. Usual techniques to solve WCSP are based on cost transfer operations coupled with a branch and bound algorithm. In this paper, we focus on an approach integrating extraction and relaxation of Minimal Unsatisfiable Cores in order to solve this problem. We decline our approach in two ways: an incomplete, greedy, algorithm and a complete one. This paper presents the OntoRich framework, a support tool for semi-automatic ontology enrichment and evaluation. The WordNet is used to extract candidates for dynamic ontology enrichment from RSS streams. With the integration of OpenNLP the system gains access to syntactic analysis of the RSS news. The enriched ontologies are evaluated against several qualitative metrics. The explosion of available data along with the need to integrate and utilize that data has led to a pressing interest in data integration techniques. In terms of Semantic Web technologies, Ontology Alignment is a key step in the process of integrating heterogeneous knowledge bases. In this paper, we present the Edge Confidence technique, a modification and improvement over the popular Similarity Flooding technique for Ontology Alignment. Modeling emotional-cognition is in a nascent stage and therefore wide-open for new ideas and discussions. In this paper the author looks at the modeling problem by bringing in ideas from axiomatic mathematics, information theory, computer science, molecular biology, non-linear dynamical systems and quantum computing and explains how ideas from these disciplines may have applications in modeling emotional-cognition. The FOCUS constraint expresses the notion that solutions are concentrated. In practice, this constraint suffers from the rigidity of its semantics. To tackle this issue, we propose three generalizations of the FOCUS constraint. We provide for each one a complete filtering algorithm as well as discussing decompositions. We show how, and under which conditions, the equilibrium states of a first-order Ordinary Differential Equation (ODE) system can be described with a deterministic Structural Causal Model (SCM). Our exposition sheds more light on the concept of causality as expressed within the framework of Structural Causal Models, especially for cyclic models. We introduce stratified labelings as a novel semantical approach to abstract argumentation frameworks. Compared to standard labelings, stratified labelings provide a more fine-grained assessment of the controversiality of arguments using ranks instead of the usual labels in, out, and undecided. We relate the framework of stratified labelings to conditional logic and, in particular, to the System Z ranking functions. We introduce in this paper an algorithm named Contextuel-E-Greedy that tackles the dynamicity of the user's content. It is based on dynamic exploration/exploitation tradeoff and can adaptively balance the two aspects by deciding which situation is most relevant for exploration or exploitation. The experimental results demonstrate that our algorithm outperforms surveyed algorithms. In this work, we first define relations on the fuzzy parametrized soft sets and study their properties. We also give a decision making method based on these relations. In approximate reasoning, relations on the fuzzy parametrized soft sets have shown to be of a primordial importance. Finally, the method is successfully applied to a problems that contain uncertainties. This paper presents a powerful genetic algorithm(GA) to solve the traveling salesman problem (TSP). To construct a powerful GA, I use edge swapping(ES) with a local search procedure to determine good combinations of building blocks of parent solutions for generating even better offspring solutions. Experimental results on well studied TSP benchmarks demonstrate that the proposed GA is competitive in finding very high quality solutions on instances with up to 16,862 cities. In this paper, we proposed a new approximate heuristic search algorithm: Cascading A*, which is a two-phrase algorithm combining A* and IDA* by a new concept "envelope ball". The new algorithm CA* is efficient, able to generate approximate solution and any-time solution, and parallel friendly. This paper reports our initial experiments with using external ATP on some corpora built with the ACL2 system. This is intended to provide the first estimate about the usefulness of such external reasoning and AI systems for solving ACL2 problems. A common assumption in belief revision is that the reliability of the information sources is either given, derived from temporal information, or the same for all. This article does not describe a new semantics for integration but the problem of obtaining the reliability of the sources given the result of a previous merging. As an example, the relative reliability of two sensors can be assessed given some certain observation, and allows for subsequent mergings of data coming from them. Analyzing Big Data can help corporations to im-prove their efficiency. In this work we present a new vision to derive Value from Big Data using a Semantic Hierarchical Multi-label Classification called Semantic HMC based in a non-supervised Ontology learning process. We also proposea Semantic HMC process, using scalable Machine-Learning techniques and Rule-based reasoning. The domain model is one of the important components used by adaptive learning systems to automatically generate customized courses for the learners. In this paper our contribution is to propose a new tool for implementation of a domain model based on fuzzy relationships among concepts. This tool allows the experts and teachers to find the best parameters in order to adapt the learners's differences. We propose a framework for computer music composition that uses resilient propagation (RProp) and long short term memory (LSTM) recurrent neural network. In this paper, we show that LSTM network learns the structure and characteristics of music pieces properly by demonstrating its ability to recreate music. We also show that predicting existing music using RProp outperforms Back propagation through time (BPTT). The paper presents some steps for multi-valued representation of neutrosophic information. These steps are provided in the framework of multi-valued logics using the following logical value: true, false, neutral, unknown and saturated. Also, this approach provides some calculus formulae for the following neutrosophic features: truth, falsity, neutrality, ignorance, under-definedness, over-definedness, saturation and entropy. In addition, it was defined net truth, definedness and neutrosophic score. ARCOE-Logic 2014, the 6th International Workshop on Acquisition, Representation and Reasoning about Context with Logic, was held in co-location with the 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2014) on November 25, 2014 in Link\"oping, Sweden. These notes contain the five papers which were accepted and presented at the workshop. In this paper one presents a new fuzzy clustering algorithm based on a dissimilarity function determined by three parameters. This algorithm can be considered a generalization of the Gustafson-Kessel algorithm for fuzzy clustering. In this paper a knowledge representation model are proposed, FP5, which combine the ideas from fuzzy sets and penta-valued logic. FP5 represents imprecise properties whose accomplished degree is undefined, contradictory or indeterminate for some objects. Basic operations of conjunction, disjunction and negation are introduced. Relations to other representation models like fuzzy sets, intuitionistic, paraconsistent and bipolar fuzzy sets are discussed. We report on our initial work to automate the generation of a domain ontology using subject fields of resources held in the Virtual Observatory registry. Preliminary results are comparable to more generalized ontology learning software currently in use. We expect to be able to refine our solution to improve both the depth and breadth of the generated ontology. We propose a generalization of SimRank similarity measure for heterogeneous information networks. Given the information network, the intraclass similarity score s(a, b) is high if the set of objects that are related with a and the set of objects that are related with b are pair-wise similar according to all imposed relations. In practical situations, it is of interest to investigate computing approximations of sets as an important step of knowledge reduction of dynamic covering decision information systems. In this paper, we present incremental approaches to computing the type-1 and type-2 characteristic matrices of dynamic coverings whose cardinalities increase with immigration of more objects. We also present the incremental algorithms of computing the second and sixth lower and upper approximations of sets in dynamic covering approximation spaces. This report describes a minimalistic set of methods engineered to anchor clinical events onto a temporal space. Specifically, we describe methods to extract clinical events (e.g., Problems, Treatments and Tests), temporal expressions (i.e., time, date, duration, and frequency), and temporal links (e.g., Before, After, Overlap) between events and temporal entities. These methods are developed and validated using high quality datasets. Machine learning is a quickly evolving field which now looks really different from what it was 15 years ago, when classification and clustering were major issues. This document proposes several trends to explore the new questions of modern machine learning, with the strong afterthought that the belief function framework has a major role to play. We study confidentiality enforcement in ontologies under the Controlled Query Evaluation framework, where a policy specifies the sensitive information and a censor ensures that query answers that may compromise the policy are not returned. We focus on censors that ensure confidentiality while maximising information access, and consider both Datalog and the OWL 2 profiles as ontology languages. Can machines truly think? This question and its answer have many implications that depend, in large part, on any number of assumptions underlying how the issue has been addressed or considered previously. A crucial question, and one that is almost taken for granted, is the starting point for this discussion: Can "thought" be achieved or emulated by algorithmic procedures? In this paper one presents new similarity, cardinality and entropy measures for bipolar fuzzy set and for its particular forms like intuitionistic, paraconsistent and fuzzy set. All these are constructed in the framework of multi-valued representations and are based on a penta-valued logic that uses the following logical values: true, false, unknown, contradictory and ambiguous. Also a new distance for bounded real interval was defined. This paper presents a five-valued representation of bifuzzy sets. This representation is related to a five-valued logic that uses the following values: true, false, inconsistent, incomplete and ambiguous. In the framework of five-valued representation, formulae for similarity, entropy and syntropy of bifuzzy sets are constructed. Updated on 24/09/2015: This update provides preliminary experiment results for fine-grained classification on the surveillance data of CompCars. The train/test splits are provided in the updated dataset. See details in Section 6. In this paper, we address the problem of identifying linear structural equation models. We first extend the edge set half-trek criterion to cover a broader class of models. We then show that any semi-Markovian linear model can be recursively decomposed into simpler sub-models, resulting in improved identification power. Finally, we show that, unlike the existing methods developed for linear models, the resulting method subsumes the identification algorithm of non-parametric models. In many European countries the growth of the real GDP per capita has been linear since 1950. An explanation for this linearity is still missing. We propose that in artificial intelligence we may find models for a linear growth of performance. We also discuss possible consequences of the fact that in systems with linear growth the percentage growth goes to zero. We describe a framework for building abstraction hierarchies whereby an agent alternates skill- and representation-acquisition phases to construct a sequence of increasingly abstract Markov decision processes. Our formulation builds on recent results showing that the appropriate abstract representation of a problem is specified by the agent's skills. We describe how such a hierarchy can be used for fast planning, and illustrate the construction of an appropriate hierarchy for the Taxi domain. In this paper, we present a board game: Square War. The game definition of Square War is similar to the classic Chinese board game Go. Then we propose a mathematical problem of the game Square War. Finally, we show that the problem can be solved by using a method of mixed mathematics and computer science. There exists a theory of a single general-purpose learning algorithm which could explain the principles its operation. It assumes the initial rough architecture, a small library of simple innate circuits which are prewired at birth. and proposes that all significant mental algorithms are learned. Given current understanding and observations, this paper reviews and lists the ingredients of such an algorithm from architectural and functional perspectives. In this work, we present a MCTS-based Go-playing program which uses convolutional networks in all parts. Our method performs MCTS in batches, explores the Monte Carlo search tree using Thompson sampling and a convolutional network, and evaluates convnet-based rollouts on the GPU. We achieve strong win rates against open source Go programs and attain competitive results against state of the art convolutional net-based Go-playing programs. Accurate and computationally efficient means for classifying human activities have been the subject of extensive research efforts. Most current research focuses on extracting complex features to achieve high classification accuracy. We propose a template selection approach based on Dynamic Time Warping, such that complex feature extraction and domain knowledge is avoided. We demonstrate the predictive capability of the algorithm on both simulated and real smartphone data. Given a data set of numerical values which are sampled from some unknown probability distribution, we will show how to check if the data set exhibits the Markov property and we will show how to use the Markov property to predict future values from the same distribution, with probability 1. This paper presents inference rules for Resource Description Framework (RDF), RDF Schema (RDFS) and Web Ontology Language (OWL). Our formalization is based on Notation 3 Logic, which extended RDF by logical symbols and created Semantic Web logic for deductive RDF graph stores. We also propose OWL-P that is a lightweight formalism of OWL and supports soft inferences by omitting complex language constructs. For vehicle sharing schemes, where drop-off positions are not fixed, we propose a pricing scheme, where the price depends in part on the distance between where a vehicle is being dropped off and where the closest shared vehicle is parked. Under certain restrictive assumptions, we show that this pricing leads to a socially optimal spread of the vehicles within a region. This paper follows previous research we have already performed in the area of Bayesian networks models for CAT. We present models using Item Response Theory (IRT - standard CAT method), Bayesian networks, and neural networks. We conducted simulated CAT tests on empirical data. Results of these tests are presented for each model separately and compared. In this paper we explore the application of some notable Boolean methods, namely the Disjunctive Normal Form representation of logic table expansions, and apply them to a real-valued logic model which utilizes quantities on the range [0,1] to produce a probabilistic programming of a game character's logic in mathematical form. Characterizations of semi-stable and stage extensions in terms of 2-valued logical models are presented. To this end, the so-called GL-supported and GL-stage models are defined. These two classes of logical models are logic programming counterparts of the notion of range which is an established concept in argumentation semantics. We report about significant enhancements of the complex algebraic geometry theorem proving subsystem in GeoGebra for automated proofs in Euclidean geometry, concerning the extension of numerous GeoGebra tools with proof capabilities. As a result, a number of elementary theorems can be proven by using GeoGebra's intuitive user interface on various computer architectures including native Java and web based systems with JavaScript. We also provide a test suite for benchmarking our results with 200 test cases. We present a budget-free experimental setup and procedure for benchmarking numericaloptimization algorithms in a black-box scenario. This procedure can be applied with the COCO benchmarking platform. We describe initialization of and input to the algorithm and touch upon therelevance of termination and restarts. Knowledge representation is a popular research field in IT. As mathematical knowledge is most formalized, its representation is important and interesting. Mathematical knowledge consists of various mathematical theories. In this paper we consider a deductive system that derives mathematical notions, axioms and theorems. All these notions, axioms and theorems can be considered as the part of elementary set theory. This theory will be represented as a semantic net. Probabilistic programming is considered as a framework, in which basic components of cognitive architectures can be represented in unified and elegant fashion. At the same time, necessity of adopting some component of cognitive architectures for extending capabilities of probabilistic programming languages is pointed out. In particular, implicit specification of generative models via declaration of concepts and links between them is proposed, and usefulness of declarative knowledge for achieving efficient inference is briefly discussed. We improve further the 2015 version of abcdSAT by various heuristics such as at-least-one recently used strategy, learnt clause database approximation reduction etc. Based on the requirement of different tracks at the SAT Competition 2016, we develop three versions of abcdSAT: drup, inc and lim, which participate in the competition of main (agile), incremental library and no-limit track, respectively. This paper discusses the idea of levels of autonomy of systems - be this technical or organic - and compares the insights with models employed by industries used to describe maturity and capability of their products. OpenAI Gym is a toolkit for reinforcement learning research. It includes a growing collection of benchmark problems that expose a common interface, and a website where people can share their results and compare the performance of algorithms. This whitepaper discusses the components of OpenAI Gym and the design decisions that went into the software. In this note, we discuss and analyse a shortest path finding approach using strong spatial cognition. It is compared with a symbolic graph-based algorithm and it is shown that both approaches are similar with respect to structure and complexity. Nevertheless, the strong spatial cognition solution is easy to understand and even pops up immediately when one has to solve the problem. Our software system simulates the classical collaborative Japanese poetry form, renga, made of linked haikus. We used NLP methods wrapped up as web services. Our experiments were only a partial success, since results fail to satisfy classical constraints. To gather ideas for future work, we examine related research in semiotics, linguistics, and computing. Recently, Neural Networks have been proven extremely effective in many natural language processing tasks such as sentiment analysis, question answering, or machine translation. Aiming to exploit such advantages in the Ontology Learning process, in this technical report we present a detailed description of a Recurrent Neural Network based system to be used to pursue such goal. We present the last of a series of three academic essays which deal with the question of how and why to build a generalized player model. We propose that a general player model needs parameters for subjective experience of play, including: player psychology, game structure, and actions of play. Based on this proposition, we pose three linked research questions: RQ1 what is a necessary and sufficient foundation to a general player model?; RQ2 can such a foundation improve performance of a computational intelligence- based player model?; and RQ3 can such a player model improve efficacy of adaptive artificial intelligence in games? We set out the arguments behind these research questions in each of the three essays, presented as three preprints. The third essay, in this preprint, presents the argument that adaptive game artificial intelligence will be enhanced by a generalised player model. This is because games are inherently human artefacts which therefore, require some encoding of the human perspective in order to effectively autonomously respond to the individual player. The player model informs the necessary constraints on the adaptive artificial intelligence. A generalised player model is not only more efficient than a per-game solution, but also allows comparison between games which makes it a useful tool for studying play in general. We describe the concept and meaning of an adaptive game. We propose requirements for functional adaptive AI, arguing from first principles drawn from the games research literature. We propose solutions to these requirements, based on a formal model approach to our existing 'Behavlets' method for psychologically-derived player modelling: Cowley, B., & Charles, D. (2016). Behavlets: a Method for Practical Player Modelling using Psychology-Based Player Traits and Domain Specific Features. User Modeling and User-Adapted Interaction, 26(2), 257-306. A warning system for assisting drivers during overtaking maneuvers is proposed. The system relies on Car-2-Car communication technologies and multi-agent systems. A protocol for safety overtaking is proposed based on ACL communicative acts. The mathematical model for safety overtaking used Kalman filter to minimize localization error. Selectional restrictions are semantic constraints on forming certain complex types in natural language. The paper gives an overview of modeling selectional restrictions in a relational type system with morphological and syntactic types. We discuss some foundations of the system and ways of formalizing selectional restrictions. Keywords: type theory, selectional restrictions, syntax, morphology Crowdsourcing is a multidisciplinary research area including disciplines like artificial intelligence, human-computer interaction, database, and social science. To facilitate cooperation across disciplines, reproducibility is a crucial factor, but unfortunately, it has not gotten enough attention in the HCOMP community. In this paper, we present Reprowd, a system aiming to make it easy to reproduce crowdsourced data processing research. We have open sourced Reprowd at http://sfu-db.github.io/reprowd/. The point of this note is to prove that a language is in the complexity class PP if and only if the strings of the language encode valid inferences in a Bayesian network defined using function-free first-order logic with equality. Many fields are now snowed under with an avalanche of data, which raises considerable challenges for computer scientists. Meanwhile, robotics (among other fields) can often only use a few dozen data points because acquiring them involves a process that is expensive or time-consuming. How can an algorithm learn with only a few data points? We outline a program in the area of formalization of mathematics to automate theorem proving in algebra and algebraic geometry. We propose a construction of a dictionary between automated theorem provers and (La)TeX exploiting syntactic parsers. We describe its application to a repository of human-written facts and definitions in algebraic geometry (The Stacks Project). We use deep learning techniques. We explore the following question: Is a decision-making program fair, for some useful definition of fairness? First, we describe how several algorithmic fairness questions can be phrased as program verification problems. Second, we discuss an automated verification technique for proving or disproving fairness of decision-making programs with respect to a probabilistic model of the population. This special issue is dedicated to get a better picture of the relationships between computational linguistics and cognitive science. It specifically raises two questions: "what is the potential contribution of computational language modeling to cognitive science?" and conversely: "what is the influence of cognitive science in contemporary computational linguistics?" We introduce a kind of partial observability to the projective simulation (PS) learning method via Dirac notation. It is done by adding a projection operator and an observability parameter to the original formulation of the efficiency in PS model. Our examples are from invasion toy problem regarding a multi-agent setting. In reinforcement learning, the standard criterion to evaluate policies in a state is the expectation of (discounted) sum of rewards. However, this criterion may not always be suitable, we consider an alternative criterion based on the notion of quantiles. In the case of episodic reinforcement learning problems, we propose an algorithm based on stochastic approximation with two timescales. We evaluate our proposition on a simple model of the TV show, Who wants to be a millionaire. We examine three probabilistic concepts related to the sentence "two variables have no bearing on each other". We explore the relationships between these three concepts and establish their relevance to the process of constructing similarity networks---a tool for acquiring probabilistic knowledge from human experts. We also establish a precise relationship between connectedness in Bayesian networks and relevance in probability. I suggest an approach that helps the online marketers to target their Gamification elements to users by modifying the order of the list of tasks that they send to users. It is more realistic and flexible as it allows the model to learn more parameters when the online marketers collect more data. The targeting approach is scalable and quick, and it can be used over streaming data. We adapt Tomita's Generalized LR algorithm to languages generated by context-free grammars enriched with a shuffle operator. The change involves extensions to the underlying handle-finding finite automaton, construction of parser tables, and the necessary optimizations in constructing a deterministic parser. Our system is motivated by an application from artificial intelligence plan recognition. We argue for the correctness of the system, and discuss future extensions of this work. In this paper, we advocate the use of stratified logical theories for representing probabilistic models. We argue that such encodings can be more interpretable than those obtained in existing frameworks such as Markov logic networks. Among others, this allows for the use of domain experts to improve learned models by directly removing, adding, or modifying logical formulas. This paper discusses the semantics of weighted argumentation graphs that are biplor, i.e. contain both attacks and support graphs. The work builds on previous work by Amgoud, Ben-Naim et. al., which presents and compares several semantics for argumentation graphs that contain only supports or only attacks relationships, respectively. We show how the classic Cramer-Rao bound limits how accurately one can simultaneously estimate values of a large number of Google Ad campaigns (or similarly limit the measurement rate of many confounding A/B tests). This paper introduces ALYSIA: Automated LYrical SongwrIting Application. ALYSIA is based on a machine learning model using Random Forests, and we discuss its success at pitch and rhythm prediction. Next, we show how ALYSIA was used to create original pop songs that were subsequently recorded and produced. Finally, we discuss our vision for the future of Automated Songwriting for both co-creative and autonomous systems. We tackle the issue of finding a good policy when the number of policy updates is limited. This is done by approximating the expected policy reward as a sequence of concave lower bounds which can be efficiently maximized, drastically reducing the number of policy updates required to achieve good performance. We also extend existing methods to negative rewards, enabling the use of control variates. Despite the recent progress in automatic theorem provers, proof engineers are still suffering from the lack of powerful proof automation. In this position paper we first report our proof strategy language based on a meta-tool approach. Then, we propose an AI-based approach to drastically improve proof automation for Isabelle, while identifying three major challenges we plan to address for this objective. We introduce a new family of minmax rank aggregation problems under two distance measures, the Kendall {\tau} and the Spearman footrule. As the problems are NP-hard, we proceed to describe a number of constant-approximation algorithms for solving them. We conclude with illustrative applications of the aggregation methods on the Mallows model and genomic data. Interaction information is one of the multivariate generalizations of mutual information, which expresses the amount information shared among a set of variables, beyond the information, which is shared in any proper subset of those variables. Unlike (conditional) mutual information, which is always non-negative, interaction information can be negative. We utilize this property to find the direction of causal influences among variables in a triangle topology under some mild assumptions. ASHACL, a variant of the W3C Shapes Constraint Language, is designed to determine whether an RDF graph meets some conditions. These conditions are grouped into shapes, which validate whether particular RDF terms each meet the constraints of the shape. Shapes are themselves expressed as RDF triples in an RDF graph, called a shapes graph. Influence diagrams are a decision-theoretic extension of probabilistic graphical models. In this paper we show how they can be used to solve the Brachistochrone problem. We present results of numerical experiments on this problem, compare the solution provided by the influence diagram with the optimal solution. The R code used for the experiments is presented in the Appendix. We develop T-SKIRT: a temporal, structured-knowledge, IRT-based method for predicting student responses online. By explicitly accounting for student learning and employing a structured, multidimensional representation of student proficiencies, the model outperforms standard IRT-based methods on an online response prediction task when applied to real responses collected from students interacting with diverse pools of educational content. This paper proposes Monte Carlo Action Programming, a programming language framework for autonomous systems that act in large probabilistic state spaces with high branching factors. It comprises formal syntax and semantics of a nondeterministic action programming language. The language is interpreted stochastically via Monte Carlo Tree Search. Effectiveness of the approach is shown empirically. We investigate how to model exchangeability with choice functions. Exchangeability is a structural assessment on a sequence of uncertain variables. We show how such assessments are a special indifference assessment, and how that leads to a counterpart of de Finetti's Representation Theorem, both in a finite and a countable context. This paper proposes an innovative method for segmentation of skin lesions in dermoscopy images developed by the authors, based on fuzzy classification of pixels and histogram thresholding. Influence diagrams are a decision-theoretic extension of probabilistic graphical models. In this paper we show how they can be used to solve the Goddard problem. We present results of numerical experiments with this problem and compare the solutions provided by influence diagrams with the optimal solution. Catastrophic forgetting is of special importance in reinforcement learning, as the data distribution is generally non-stationary over time. We study and compare several pseudorehearsal approaches for Q-learning with function approximation in a pole balancing task. We have found that pseudorehearsal seems to assist learning even in such very simple problems, given proper initialization of the rehearsal parameters. The paper discusses scientific and technological problems of dynamic integrated expert systems development. Extensions of problem-oriented methodology for dynamic integrated expert systems development are considered. Attention is paid to the temporal knowledge representation and processing. We present a baseline approach for cross-modal knowledge fusion. Different basic fusion methods are evaluated on existing embedding approaches to show the potential of joining knowledge about certain concepts across modalities in a fused concept representation. In this extended abstract, we propose Structured Production Systems (SPS), which extend traditional production systems with well-formed syntactic structures. Due to the richness of structures, structured production systems significantly enhance the expressive power as well as the flexibility of production systems, for instance, to handle uncertainty. We show that different rule application strategies can be reduced into the basic one by utilizing structures. Also, many fundamental approaches in computer science, including automata, grammar and logic, can be captured by structured production systems. Recently introduced composition operator for credal sets is an analogy of such operators in probability, possibility, evidence and valuation-based systems theories. It was designed to construct multidimensional models (in the framework of credal sets) from a system of low- dimensional credal sets. In this paper we study its potential from the computational point of view utilizing methods of polyhedral geometry. Social abstract argumentation is a principled way to assign values to conflicting (weighted) arguments. In this note we discuss the important property of the uniqueness of the model. Relation Extraction refers to the task of populating a database with tuples of the form $r(e_1, e_2)$, where $r$ is a relation and $e_1$, $e_2$ are entities. Distant supervision is one such technique which tries to automatically generate training examples based on an existing KB such as Freebase. This paper is a survey of some of the techniques in distant supervision which primarily rely on Probabilistic Graphical Models (PGMs). We present a rational analysis of curiosity, proposing that people's curiosity is driven by seeking stimuli that maximize their ability to make appropriate responses in the future. This perspective offers a way to unify previous theories of curiosity into a single framework. Experimental results confirm our model's predictions, showing how the relationship between curiosity and confidence can change significantly depending on the nature of the environment. In this paper, we constructed a model to determine weights of criterias and presented a solution for determining the optimal alternative by using the constructed model and relationship analysis between criterias in fuzzy group decision-making problem with different forms of preference information of decision makers on criterias. For most reinforcement learning approaches, the learning is performed by maximizing an accumulative reward that is expectedly and manually defined for specific tasks. However, in real world, rewards are emergent phenomena from the complex interactions between agents and environments. In this paper, we propose an implicit generic reward model for reinforcement learning. Unlike those rewards that are manually defined for specific tasks, such implicit reward is task independent. It only comes from the deviation from the agents' previous experiences. In this paper $\ast$--compatible extensions of fuzzy relations are studied, generalizing some results obtained by Duggan in case of crisp relations. From this general result are obtained as particular cases fuzzy versions of some important extension theorems for crisp relations (Szpilrajn, Hansson, Suzumura). Two notions of consistent closure of a fuzzy relation are introduced. Toby Walsh in 'The Singularity May Never Be Near' gives six arguments to support his point of view that technological singularity may happen but that it is unlikely. In this paper, we provide analysis of each one of his arguments and arrive at similar conclusions, but with more weight given to the 'likely to happen' probability. We present an initial version of Regular Boardgames general game description language. This stands as an extension of Simplified Boardgames language. Our language is designed to be able to express the rules of a majority of popular boardgames including the complex rules such as promotions, castling, en passant, jump captures, liberty captures, and obligatory moves. The language describes all the above through one consistent general mechanism based on regular expressions, without using exceptions or ad hoc rules. This paper is concerned with the apparent greatest weakness of the Mathematical Theory of Evidence (MTE) of Shafer \cite{Shafer:76}, which has been strongly criticized by Wasserman \cite{Wasserman:92ijar} - the relationship to frequencies. Weaknesses of various proposals of probabilistic interpretation of MTE belief functions are demonstrated. A new frequency-based interpretation is presented overcoming various drawbacks of earlier interpretations. In this paper, we propose generative probabilistic models for label aggregation. We use Gibbs sampling and a novel variational inference algorithm to perform the posterior inference. Empirical results show that our methods consistently outperform state-of-the-art methods. In this article, we explain in detail the internal structures and databases of a smart health application. Moreover, we describe how to generate a statistically sound synthetic dataset using real-world medical data. In this work we describe and evaluate methods to learn musical embeddings. Each embedding is a vector that represents four contiguous beats of music and is derived from a symbolic representation. We consider autoencoding-based methods including denoising autoencoders, and context reconstruction, and evaluate the resulting embeddings on a forward prediction and a classification task. Presented is a Julia meta-program that discovers compact theories from data if they exist. It writes candidate theories in Julia and then validates: tossing the bad theories and keeping the good theories. Compactness is measured by a metric: such as the number of space-time derivatives. The underlying algorithm is applicable to a wide variety of combinatorics problems and compactness serves to cut down the search space. Representing knowledge as high-dimensional vectors in a continuous semantic vector space can help overcome the brittleness and incompleteness of traditional knowledge bases. We present a method for performing deductive reasoning directly in such a vector space, combining analogy, association, and deduction in a straightforward way at each step in a chain of reasoning, drawing on knowledge from diverse sources and ontologies. We study some mathematical aspects of the Mahjong game. In particular, we use combinatorial theory and write a Python program to study some special features of the game. The results confirm some folklore concerning the game, and expose some unexpected results. Related results and possible future research in connection to artificial intelligence are mentioned. We study a novel outlier detection problem that aims to identify abnormal input-output associations in data, whose instances consist of multi-dimensional input (context) and output (responses) pairs. We present our approach that works by analyzing data in the conditional (input--output) relation space, captured by a decomposable probabilistic model. Experimental results demonstrate the ability of our approach in identifying multivariate conditional outliers. Given an environment with continuous state spaces and discrete actions, we investigate using a Double Deep Q-learning Reinforcement Agent to find optimal policies using the LunarLander-v2 OpenAI gym environment. We give a non-FPT lower bound on the size of structured decision DNNF and OBDD with decomposable AND-nodes representing CNF-formulas of bounded incidence treewidth. Both models are known to be of FPT size for CNFs of bounded primal treewidth. To the best of our knowledge this is the first parameterized separation of primal treewidth and incidence treewidth for knowledge compilation models. We consider the problem of rational uncertainty about unproven mathematical statements, which G\"odel and others have remarked on. Using Bayesian-inspired arguments we build a normative model of fair bets under deductive uncertainty which draws from both probability and the theory of algorithms. We comment on connections to Zeilberger's notion of "semi-rigorous proofs", particularly that inherent subjectivity is an obstacle. In this note, we point out a basic link between generative adversarial (GA) training and binary classification -- any powerful discriminator essentially computes an (f-)divergence between real and generated samples. The result, repeatedly re-derived in decision theory, has implications for GA Networks (GANs), providing an alternative perspective on training f-GANs by designing the discriminator loss function. Logic-based paradigms are nowadays widely used in many different fields, also thank to the availability of robust tools and systems that allow the development of real-world and industrial applications. In this work we present LoIDE, an advanced and modular web-editor for logic-based languages that also integrates with state-of-the-art solvers. In this paper we design and evaluate a Deep-Reinforcement Learning agent that optimizes routing. Our agent adapts automatically to current traffic conditions and proposes tailored configurations that attempt to minimize the network delay. Experiments show very promising performance. Moreover, this approach provides important operational advantages with respect to traditional optimization algorithms. This paper maps out the relation between different approaches for handling preferences in argumentation with strict rules and defeasible assumptions by offering translations between them. The systems we compare are: non-prioritized defeats i.e. attacks, preference-based defeats, and preference-based defeats extended with reverse defeat. This article discusses how the automation of tensor algorithms, based on A Mathematics of Arrays and Psi Calculus, and a new way to represent numbers, Unum Arithmetic, enables mechanically provable, scalable, portable, and more numerically accurate software. The connected autonomous vehicle has been often touted as a technology that will become pervasive in society in the near future. Rather than being stand alone, we examine the need for autonomous vehicles to cooperate and interact within their socio-cyber-physical environments, including the problems cooperation will solve, but also the issues and challenges. Over the past decade, the idea of smart homes has been conceived as a potential solution to counter energy crises or to at least mitigate its intensive destructive consequences in the residential building sector. Causation has been the issue of philosophic debate since Hippocrates. Recent work defines actual causation in terms of Pearl/Halpern's causality framework, formalizing necessary causes (IJCAI'15). This has inspired causality notions in the security domain (CSF'15), which, perhaps surprisingly, formalize sufficient causes instead. We provide an explicit relation between necessary and sufficient causes. We demonstrate applications of topological characteristics of oil and gas reservoirs considered as three-dimensional bodies to geological modeling. Mao H. (2017, Representing attribute reduction and concepts in concept lattice using graphs. Soft Computing 21(24):7293--7311) claims to make contributions to the study of reduction of attributes in concept lattices by using graph theory. We show that her results are either trivial or already well-known and all three algorithms proposed in the paper are incorrect. Constant structure closed semantic systems are the systems each element of which receives its definition through the correspondent unchangeable set of other elements of the system. Discrete time means here that the definitions of the elements change iteratively and simultaneously based on the "neighbor portraits" from the previous iteration. I prove that the iterative redefinition process in such class of systems will quickly degenerate into a series of pairwise isomorphic states and discuss some directions of further research. : In this paper a method to make inputting electrical model upon factors that affect melting process of high ultra power(UHP) electric furnace by using fuzzy rule and regression model is suggested and its effectiveness is verified with simulation experiment. We propose a framework that directly tackles the probability distribution of the value function parameters in Deep Q Network (DQN), with powerful variational inference subroutines to approximate the posterior of the parameters. We will establish the equivalence between our proposed surrogate objective and variational inference loss. Our new algorithm achieves efficient exploration and performs well on large scale chain Markov Decision Process (MDP). Nintendo's Super Smash Bros. Melee fighting game can be emulated on modern hardware allowing us to inspect internal memory states, such as character positions. We created an AI that avoids being hit by training using these internal memory states and outputting controller button presses. After training on a month's worth of Melee matches, our best agent learned to avoid the toughest AI built into the game for a full minute 74.6% of the time. In this work, we present our findings and experiments for stock-market prediction using various textual sentiment analysis tools, such as mood analysis and event extraction, as well as prediction models, such as LSTMs and specific convolutional architectures. Catastrophic forgetting has a significant negative impact in reinforcement learning. The purpose of this study is to investigate how pseudorehearsal can change performance of an actor-critic agent with neural-network function approximation. We tested agent in a pole balancing task and compared different pseudorehearsal approaches. We have found that pseudorehearsal can assist learning and decrease forgetting. This short paper describes our ongoing research on Greenhouse - a zero-positive machine learning system for time-series anomaly detection. Classical anomaly detection is principally concerned with point-based anomalies, anomalies that occur at a single data point. In this paper, we present a new mathematical model to express range-based anomalies, anomalies that occur over a range (or period) of time. Starting from the observation that rational closure has the undesirable property of being an "all or nothing" mechanism, we here propose a multipreferential semantics, which enriches the preferential semantics underlying rational closure in order to separately deal with the inheritance of different properties in an ontology with exceptions. We provide a multipreference closure mechanism which is sound with respect to the multipreference semantics. The cognitive theory of true conditions (CTTC) is a proposal to describe the model-theoretic semantics of symbolic cognitive architectures and design the implementation of cognitive abilities. The CTTC is formulated mathematically using the multi-optional many-sorted past present future(MMPPF) structures. This article defines mathematically the MMPPF structures and the formal languages proposed to describe them by the CTTC. We propose the Onto2Vec method, an approach to learn feature vectors for biological entities based on their annotations to biomedical ontologies. Our method can be applied to a wide range of bioinformatics research problems such as similarity-based prediction of interactions between proteins, classification of interaction types using supervised learning, or clustering. We describe an approach to learn, in a term-rewriting setting, function definitions from input/output equations. By confining ourselves to structurally recursive definitions we obtain a fairly fast learning algorithm that often yields definitions close to intuitive expectations. We provide a Prolog prototype implementation of our approach, and indicate open issues of further investigation. Implicational bases are objects of interest in formal concept analysis and its applications. Unfortunately, even the smallest base, the Duquenne-Guigues base, has an exponential size in the worst case. In this paper, we use results on the average number of minimal transversals in random hypergraphs to show that the base of proper premises is, on average, of quasi-polynomial size. The paper analyzes the problem of judgments or preferences subsequent to initial analysis by autonomous agents in a hierarchical system where the higher level agents does not have access to group size information. We propose methods that reduce instances of preference reversal of the kind encountered in Simpson's paradox. We introduce and discuss, through a computational algebraic geometry approach, the automatic reasoning handling of propositions that are simultaneously true and false over some relevant collections of instances. A rigorous, algorithmic criterion is presented for detecting such cases, and its performance is exemplified through the implementation of this test on the dynamic geometry program GeoGebra. The article describes the technique for designing a domain ontology, shows the flowchart of algorithm design and example of constructing a fragment of the ontology of the subject area of Computer Science is considered. The article presents an overview of current specialized ontology engineering tools, as well as texts' annotation tools based on ontologies. The main functions and features of these tools, their advantages and disadvantages are discussed. A systematic comparative analysis of means for engineering ontologies is presented. This paper describes the design principles of methodology of knowledge-oriented information systems based on ontological approach. Such systems implement technology subject-oriented extraction of knowledge from the set of natural language texts and their formal and logical presentation and application processing Computing universal distributed representations of sentences is a fundamental task in natural language processing. We propose a method to learn such representations by encoding the suffixes of word sequences in a sentence and training on the Stanford Natural Language Inference (SNLI) dataset. We demonstrate the effectiveness of our approach by evaluating it on the SentEval benchmark, improving on existing approaches on several transfer tasks. In many neural models, new features as polynomial functions of existing ones are used to augment representations. Using the natural language inference task as an example, we investigate the use of scaled polynomials of degree 2 and above as matching features. We find that scaling degree 2 features has the highest impact on performance, reducing classification error by 5% in the best models. The Cognitive Theory of True Conditions (CTTC) is a proposal to design the implementation of cognitive abilities and to describe the model-theoretic semantics of symbolic cognitive architectures. The CTTC is formulated mathematically using the multi-optional many-sorted past present future(MMPPF) structures. This article discussed how decision-making processes are described in the CTTC. Bias and heterogeneity in peer assessment can lead to the issue of unfair scoring in the educational field. To deal with this problem, we propose a reference ranking method for an online peer assessment system using HodgeRank. Such a scheme provides instructors with an objective scoring reference based on mathematics. We present a formal language with expressions denoting general symbol structures and queries which access information in those structures. A sequence-to-sequence network processing this language learns to encode symbol structures and query them. The learned representation (approximately) shares a simple linearity property with theoretical techniques for performing this task. This article explores the ideas that went into George Boole's development of an algebra for logical inference in his book The Laws of Thought. We explore in particular his wife Mary Boole's claim that he was deeply influenced by Indian logic and argue that his work was more than a framework for processing propositions. By exploring parallels between his work and Indian logic, we are able to explain several peculiarities of this work. We interpret part of the experimental results of Shwartz-Ziv and Tishby [2017]. Inspired by these results, we established a conjecture of the dynamics of the machinary of deep neural network. This conjecture can be used to explain the counterpart result by Saxe et al. [2018]. In this paper, we study the model theoretical aspects of Weakly Aggregative Modal Logic (WAL), which is a collection of disguised polyadic modal logics with $n$-ary modalities whose arguments are all the same. We give a van-Benthem-Rosen characterization theorem of WAL based on an intuitive notion of bisimulation, and show that WAL has Craig Interpolation. Committee selection with diversity or distributional constraints is a ubiquitous problem. However, many of the formal approaches proposed so far have certain drawbacks including (1) computationally intractability in general, and (2) inability to suggest a solution for certain instances where the hard constraints cannot be met. We propose a practical and polynomial-time algorithm for diverse committee selection that draws on the idea of using soft bounds and satisfies natural axioms. This chapter offers an accessible introduction to the channel-based approach to Bayesian probability theory. This framework rests on algebraic and logical foundations, inspired by the methodologies of programming language semantics. It offers a uniform, structured and expressive language for describing Bayesian phenomena in terms of familiar programming concepts, like channel, predicate transformation and state transformation. The introduction also covers inference in Bayesian networks, which will be modelled by a suitable calculus of string diagrams. This paper discusses the problem of learning language from unprocessed text and speech signals, concentrating on the problem of learning a lexicon. In particular, it argues for a representation of language in which linguistic parameters like words are built by perturbing a composition of existing parameters. The power of this representation is demonstrated by several examples in text segmentation and compression, acquisition of a lexicon from raw speech, and the acquisition of mappings between text and artificial representations of meaning. A new three-stage computer artificial neural network model of the tip-of-the-tongue phenomenon is proposed. Each word's node is build from some interconnected learned auto-associative two-layer neural networks each of which represents separate word's semantic, lexical, or phonological components. The model synthesizes memory, psycholinguistic, and metamemory approaches, bridges speech errors and naming chronometry research traditions, and can explain quantitatively many tip-of-the-tongue effects. Unlike traditional reinforcement learning (RL), market-based RL is in principle applicable to worlds described by partially observable Markov Decision Processes (POMDPs), where an agent needs to learn short-term memories of relevant previous events in order to execute optimal actions. Most previous work, however, has focused on reactive settings (MDPs) instead of POMDPs. Here we reimplement a recent approach to market-based RL and for the first time evaluate it in a toy POMDP setting. In this paper we expose the theoretical background underlying our current research. This consists in the development of behaviour-based knowledge systems, for closing the gaps between behaviour-based and knowledge-based systems, and also between the understandings of the phenomena they model. We expose the requirements and stages for developing behaviour-based knowledge systems and discuss their limits. We believe that these are necessary conditions for the development of higher order cognitive capacities, in artificial and natural cognitive systems. Multi-dimensional data classification is an important and challenging problem in many astro-particle experiments. Neural networks have proved to be versatile and robust in multi-dimensional data classification. In this article we shall study the classification of gamma from the hadrons for the MAGIC Experiment. Two neural networks have been used for the classification task. One is Multi-Layer Perceptron based on supervised learning and other is Self-Organising Map (SOM), which is based on unsupervised learning technique. The results have been shown and the possible ways of combining these networks have been proposed to yield better and faster classification results. This paper presents an artificial evolutionbased method for stereo image analysis and its application to real-time obstacle detection and avoidance for a mobile robot. It uses the Parisian approach, which consists here in splitting the representation of the robot's environment into a large number of simple primitives, the "flies", which are evolved following a biologically inspired scheme and give a fast, low-cost solution to the obstacle detection problem in mobile robotics. Pertaining to Agent-based Computational Economics (ACE), this work presents two models for the rise and downfall of speculative bubbles through an exchange price fixing based on double auction mechanisms. The first model is based on a finite time horizon context, where the expected dividends decrease along time. The second model follows the {\em greater fool} hypothesis; the agent behaviour depends on the comparison of the estimated risk with the greater fool's. Simulations shed some light on the influent parameters and the necessary conditions for the apparition of speculative bubbles in an asset market within the considered framework. This is to present work on modifying the Aleph ILP system so that it evaluates the hypothesised clauses in parallel by distributing the data-set among the nodes of a parallel or distributed machine. The paper briefly discusses MPI, the interface used to access message- passing libraries for parallel computers and clusters. It then proceeds to describe an extension of YAP Prolog with an MPI interface and an implementation of data-parallel clause evaluation for Aleph through this interface. The paper concludes by testing the data-parallel Aleph on artificially constructed data-sets. In this paper, we study instances of complex neural networks, i.e. neural netwo rks with complex topologies. We use Self-Organizing Map neural networks whose n eighbourhood relationships are defined by a complex network, to classify handwr itten digits. We show that topology has a small impact on performance and robus tness to neuron failures, at least at long learning times. Performance may howe ver be increased (by almost 10%) by artificial evolution of the network topo logy. In our experimental conditions, the evolved networks are more random than their parents, but display a more heterogeneous degree distribution. The subcellular location of a protein can provide valuable information about its function. With the rapid increase of sequenced genomic data, the need for an automated and accurate tool to predict subcellular localization becomes increasingly important. Many efforts have been made to predict protein subcellular localization. This paper aims to merge the artificial neural networks and bioinformatics to predict the location of protein in yeast genome. We introduce a new subcellular prediction method based on a backpropagation neural network. The results show that the prediction within an error limit of 5 to 10 percentage can be achieved with the system. The convergence properties of the stationary Fokker-Planck algorithm for the estimation of the asymptotic density of stochastic search processes is studied. Theoretical and empirical arguments for the characterization of convergence of the estimation in the case of separable and nonseparable nonlinear optimization problems are given. Some implications of the convergence of stationary Fokker-Planck learning for the inference of parameters in artificial neural network models are outlined. The management and combination of uncertain, imprecise, fuzzy and even paradoxical or high conflicting sources of information has always been, and still remains today, of primal importance for the development of reliable modern information systems involving artificial reasoning. In this introduction, we present a survey of our recent theory of plausible and paradoxical reasoning, known as Dezert-Smarandache Theory (DSmT), developed for dealing with imprecise, uncertain and conflicting sources of information. We focus our presentation on the foundations of DSmT and on its most important rules of combination, rather than on browsing specific applications of DSmT available in literature. Several simple examples are given throughout this presentation to show the efficiency and the generality of this new approach. The role of T-cells within the immune system is to confirm and assess anomalous situations and then either respond to or tolerate the source of the effect. To illustrate how these mechanisms can be harnessed to solve real-world problems, we present the blueprint of a T-cell inspired algorithm for computer security worm detection. We show how the three central T-cell processes, namely T-cell maturation, differentiation and proliferation, naturally map into this domain and further illustrate how such an algorithm fits into a complete immune inspired computer security system and framework. We discuss how to use a Genetic Regulatory Network as an evolutionary representation to solve a typical GP reinforcement problem, the pole balancing. The network is a modified version of an Artificial Regulatory Network proposed a few years ago, and the task could be solved only by finding a proper way of connecting inputs and outputs to the network. We show that the representation is able to generalize well over the problem domain, and discuss the performance of different models of this kind. Chaotic neural networks have received a great deal of attention these last years. In this paper we establish a precise correspondence between the so-called chaotic iterations and a particular class of artificial neural networks: global recurrent multi-layer perceptrons. We show formally that it is possible to make these iterations behave chaotically, as defined by Devaney, and thus we obtain the first neural networks proven chaotic. Several neural networks with different architectures are trained to exhibit a chaotical behavior. A multiagent system may be thought of as an artificial society of autonomous software agents and we can apply concepts borrowed from welfare economics and social choice theory to assess the social welfare of such an agent society. In this paper, we study an abstract negotiation framework where agents can agree on multilateral deals to exchange bundles of indivisible resources. We then analyse how these deals affect social welfare for different instances of the basic framework and different interpretations of the concept of social welfare itself. In particular, we show how certain classes of deals are both sufficient and necessary to guarantee that a socially optimal allocation of resources will be reached eventually. In this paper we present the experimental results of the neural network control of a servo-system in order to control its speed. The control strategy is implemented by using an inverse-model control based on Artificial Neural Networks (ANNs). The network training was performed using two learning algorithms: Levenberg-Marquardt and Bayesian regularization. We evaluate the generalization capability for each method according to both the correct operation of the controller to follow the reference signal, and the control efforts developed by the ANN-based controller. Unsupervised classification algorithm based on clonal selection principle named Unsupervised Clonal Selection Classification (UCSC) is proposed in this paper. The new proposed algorithm is data driven and self-adaptive, it adjusts its parameters to the data to make the classification operation as fast as possible. The performance of UCSC is evaluated by comparing it with the well known K-means algorithm using several artificial and real-life data sets. The experiments show that the proposed UCSC algorithm is more reliable and has high classification precision comparing to traditional classification methods such as K-means. The ideas about decision making under ignorance in economics are combined with the ideas about uncertainty representation in computer science. The combination sheds new light on the question of how artificial agents can act in a dynamically consistent manner. The notion of sequential consistency is formalized by adapting the law of iterated expectation for plausibility measures. The necessary and sufficient condition for a certainty equivalence operator for Nehring-Puppe's preference to be sequentially consistent is given. This result sheds light on the models of decision making under uncertainty. Discovering causal relations among observed variables in a given data set is a main topic in studies of statistics and artificial intelligence. Recently, some techniques to discover an identifiable causal structure have been explored based on non-Gaussianity of the observed data distribution. However, most of these are limited to continuous data. In this paper, we present a novel causal model for binary data and propose a new approach to derive an identifiable causal structure governing the data based on skew Bernoulli distributions of external noise. Experimental evaluation shows excellent performance for both artificial and real world data sets. We focus on a comparative study of three recently developed nature-inspired optimization algorithms, including state transition algorithm, harmony search and artificial bee colony. Their core mechanisms are introduced and their similarities and differences are described. Then, a suit of 27 well-known benchmark problems are used to investigate the performance of these algorithms and finally we discuss their general applicability with respect to the structure of optimization problems. Drawing from research on computational models of argumentation (particularly the Carneades Argumentation System), we explore the graphical representation of arguments in a dispute; then, comparing two different traditions on the limits of the justification of decisions, and devising an intermediate, semi-formal, model, we also show that it can shed light on the theory of dispute resolution. We conclude our paper with an observation on the usefulness of highly constrained reasoning for Online Dispute Resolution systems. Restricting the search space of arguments exclusively to reasons proposed by the parties (vetoing the introduction of new arguments by the human or artificial arbitrator) is the only way to introduce some kind of decidability -- together with foreseeability -- in the argumentation system. We present a method for constructing the log-optimal portfolio using the well-calibrated forecasts of market values. Dawid's notion of calibration and the Blackwell approachability theorem are used for computing well-calibrated forecasts. We select a portfolio using this "artificial" probability distribution of market values. Our portfolio performs asymptotically at least as well as any stationary portfolio that redistributes the investment at each round using a continuous function of side information. Unlike in classical mathematical finance theory, no stochastic assumptions are made about market values. Everyday activities performed by artificial assistants can potentially be executed naively and dangerously given their lack of common sense knowledge. This paper presents conceptual work towards obtaining prior knowledge on the usual modality (passive or active) of any given entity, and their affordance estimates, by extracting high-confidence ability modality semantic relations (X can Y relationship) from non-figurative texts, by analyzing co-occurrence of grammatical instances of subjects and verbs, and verbs and objects. The discussion includes an outline of the concept, potential and limitations, and possible feature and learning framework adoption. This paper deals with the problem of neural code solving. On the basis of the formulated hypotheses the information model of a neuron-detector is suggested, the detector being one of the basic elements of an artificial neural network (ANN). The paper subjects the connectionist paradigm of ANN building to criticism and suggests a new presentation paradigm for ANN building and neuroelements (NE) learning. The adequacy of the suggested model is proved by the fact that is does not contradict the modern propositions of neuropsychology and neurophysiology. We consider the effects of social learning on the individual learning and genetic evolution of a colony of artificial agents capable of genetic, individual and social modes of adaptation. We confirm that there is strong selection pressure to acquire traits of individual learning and social learning when these are adaptive traits. We show that selection pressure for learning of either kind can supress selection pressure for reproduction or greater fitness. We show that social learning differs from individual learning in that it can support a second evolutionary system that is decoupled from the biological evolutionary system. This decoupling leads to an emergent interaction where immature agents are more likely to engage in learning activities than mature agents. We propose a new approach for solving combinatorial optimization problem by utilizing the mechanism of chases and escapes, which has a long history in mathematics. In addition to the well-used steepest descent and neighboring search, we perform a chase and escape game on the "landscape" of the cost function. We have created a concrete algorithm for the Traveling Salesman Problem. Our preliminary test indicates a possibility that this new fusion of chases and escapes problem into combinatorial optimization search is fruitful. Although different learning systems are coordinated to afford complex behavior, little is known about how this occurs. This article describes a theoretical framework that specifies how complex behaviors that might be thought to require error-driven learning might instead be acquired through simple reinforcement. This framework includes specific assumptions about the mechanisms that contribute to the evolution of (artificial) neural networks to generate topologies that allow the networks to learn large-scale complex problems using only information about the quality of their performance. The practical and theoretical implications of the framework are discussed, as are possible biological analogs of the approach. In this position paper we present a novel approach to neurobiologically plausible implementation of emotional reactions and behaviors for real-time autonomous robotic systems. The working metaphor we use is the "day" and "night" phases of mammalian life. During the "day" phase a robotic system stores the inbound information and is controlled by a light-weight rule-based system in real time. In contrast to that, during the "night" phase the stored information is been transferred to the supercomputing system to update the realistic neural network: emotional and behavioral strategies. Recent approaches based on artificial neural networks (ANNs) have shown promising results for short-text classification. However, many short texts occur in sequences (e.g., sentences in a document or utterances in a dialog), and most existing ANN-based systems do not leverage the preceding short texts when classifying a subsequent one. In this work, we present a model based on recurrent neural networks and convolutional neural networks that incorporates the preceding short texts. Our model achieves state-of-the-art results on three different datasets for dialog act prediction. We formulate learning of a binary autoencoder as a biconvex optimization problem which learns from the pairwise correlations between encoded and decoded bits. Among all possible algorithms that use this information, ours finds the autoencoder that reconstructs its inputs with worst-case optimal loss. The optimal decoder is a single layer of artificial neurons, emerging entirely from the minimax loss minimization, and with weights learned by convex optimization. All this is reflected in competitive experimental results, demonstrating that binary autoencoding can be done efficiently by conveying information in pairwise correlations in an optimal fashion. In order to study the application of artificial intelligence (AI) to dental imaging, we applied AI technology to classify a set of panoramic radiographs using (a) a convolutional neural network (CNN) which is a form of an artificial neural network (ANN), (b) representative image cognition algorithms that implement scale-invariant feature transform (SIFT), and (c) histogram of oriented gradients (HOG). Existing models based on artificial neural networks (ANNs) for sentence classification often do not incorporate the context in which sentences appear, and classify sentences individually. However, traditional sentence classification approaches have been shown to greatly benefit from jointly classifying subsequent sentences, such as with conditional random fields. In this work, we present an ANN architecture that combines the effectiveness of typical ANN models to classify sentences in isolation, with the strength of structured prediction. Our model achieves state-of-the-art results on two different datasets for sequential sentence classification in medical abstracts. Over 50 million scholarly articles have been published: they constitute a unique repository of knowledge. In particular, one may infer from them relations between scientific concepts, such as synonyms and hyponyms. Artificial neural networks have been recently explored for relation extraction. In this work, we continue this line of work and present a system based on a convolutional neural network to extract relations. Our model ranked first in the SemEval-2017 task 10 (ScienceIE) for relation extraction in scientific articles (subtask C). We discuss problems with the standard approaches to evaluation for tasks like visual question answering, and argue that artificial data can be used to address these as a complement to current practice. We demonstrate that with the help of existing 'deep' linguistic processing technology we are able to create challenging abstract datasets, which enable us to investigate the language understanding abilities of multimodal deep learning models in detail. Rubenstein et al. present an interesting system of programmable self-assembled structure formation using 1000 Kilobot robots. The paper claims to advance work in artificial swarms similar to capabilities of natural systems besides being highly robust. However, the system lacks in terms of matching motility and complex shapes with holes, thereby limiting practical similarity to self-assembly in living systems. The ever increasing prevalence of publicly available structured data on the World Wide Web enables new applications in a variety of domains. In this paper, we provide a conceptual approach that leverages such data in order to explain the input-output behavior of trained artificial neural networks. We apply existing Semantic Web technologies in order to provide an experimental proof of concept. Cortical minicolumns are considered a model of cortical organization. Their function is still a source of research and not reflected properly in modern architecture of nets in algorithms of Artificial Intelligence. We assume its function and describe it in this article. Furthermore, we show how this proposal allows to construct a new architecture, that is not based on convolutional neural networks, test it on MNIST data and receive close to Convolutional Neural Network accuracy. We also show that the proposed architecture possesses an ability to train on a small quantity of samples. To achieve these results, we enable the minicolumns to remember context transformations. Literature involving preferences of artificial agents or human beings often assume their preferences can be represented using a complete transitive binary relation. Much has been written however on different models of preferences. We review some of the reasons that have been put forward to justify more complex modeling, and review some of the techniques that have been proposed to obtain models of such preferences. After providing a brief historical overview on the synergies between artificial intelligence research, in the areas of evolutionary computations and machine learning, and the optimal design of interplanetary trajectories, we propose and study the use of deep artificial neural networks to represent, on-board, the optimal guidance profile of an interplanetary mission. The results, limited to the chosen test case of an Earth-Mars orbital transfer, extend the findings made previously for landing scenarios and quadcopter dynamics, opening a new research area in interplanetary trajectory planning. The impression of free will is the feeling according to which our choices are neither imposed from our inside nor from outside. It is the sense we are the ultimate cause of our acts. In direct opposition with the universal determinism, the existence of free will continues to be discussed. In this paper, free will is linked to a decisional mechanism: an agent is provided with free will if having performed a predictable choice Cp, it can immediately perform another choice Cr in a random way. The intangible feeling of free will is replaced by a decision-making process including a predictable decision-making process immediately followed by an unpredictable decisional one. In events that are composed by many activities, there is a problem that involves retrieve and management the information of visitors that are visiting the activities. This management is crucial to find some activities that are drawing attention of visitors; identify an ideal positioning for activities; which path is more frequented by visitors. In this work, these features are studied using Complex Network theory. For the beginning, an artificial database was generated to study the mentioned features. Secondly, this work shows a method to optimize the event structure that is better than a random method and a recommendation system that achieves ~95% of accuracy. The next-generation wireless networks are evolving into very complex systems because of the very diversified service requirements, heterogeneity in applications, devices, and networks. The mobile network operators (MNOs) need to make the best use of the available resources, for example, power, spectrum, as well as infrastructures. Traditional networking approaches, i.e., reactive, centrally-managed, one-size-fits-all approaches and conventional data analysis tools that have limited capability (space and time) are not competent anymore and cannot satisfy and serve that future complex networks in terms of operation and optimization in a cost-effective way. A novel paradigm of proactive, self-aware, self- adaptive and predictive networking is much needed. The MNOs have access to large amounts of data, especially from the network and the subscribers. Systematic exploitation of the big data greatly helps in making the network smart, intelligent and facilitates cost-effective operation and optimization. In view of this, we consider a data-driven next-generation wireless network model, where the MNOs employ advanced data analytics for their networks. We discuss the data sources and strong drivers for the adoption of the data analytics and the role of machine learning, artificial intelligence in making the network intelligent in terms of being self-aware, self-adaptive, proactive and prescriptive. A set of network design and optimization schemes are presented with respect to data analytics. The paper is concluded with a discussion of challenges and benefits of adopting big data analytics and artificial intelligence in the next-generation communication system. Handwriting is one of the most important means of daily communication. Although the problem of handwriting recognition has been considered for more than 60 years there are still many open issues, especially in the task of unconstrained handwritten sentence recognition. This paper focuses on the automatic system that recognizes continuous English sentence through a mouse-based gestures in real-time based on Artificial Neural Network. The proposed Artificial Neural Network is trained using the traditional backpropagation algorithm for self supervised neural network which provides the system with great learning ability and thus has proven highly successful in training for feed-forward Artificial Neural Network. The designed algorithm is not only capable of translating discrete gesture moves, but also continuous gestures through the mouse. In this paper we are using the efficient neural network approach for recognizing English sentence drawn by mouse. This approach shows an efficient way of extracting the boundary of the English Sentence and specifies the area of the recognition English sentence where it has been drawn in an image and then used Artificial Neural Network to recognize the English sentence. The proposed approach English sentence recognition (ESR) system is designed and tested successfully. Experimental results show that the higher speed and accuracy were examined. We combine Artificial Immune Systems 'AIS', technology with Collaborative Filtering 'CF' and use it to build a movie recommendation system. We already know that Artificial Immune Systems work well as movie recommenders from previous work by Cayzer and Aickelin 3, 4, 5. Here our aim is to investigate the effect of different affinity measure algorithms for the AIS. Two different affinity measures, Kendalls Tau and Weighted Kappa, are used to calculate the correlation coefficients for the movie recommender. We compare the results with those published previously and show that Weighted Kappa is more suitable than others for movie problems. We also show that AIS are generally robust movie recommenders and that, as long as a suitable affinity measure is chosen, results are good. A combined Short-Term Learning (STL) and Long-Term Learning (LTL) approach to solving mobile robot navigation problems is presented and tested in both real and simulated environments. The LTL consists of rapid simulations that use a Genetic Algorithm to derive diverse sets of behaviours. These sets are then transferred to an idiotypic Artificial Immune System (AIS), which forms the STL phase, and the system is said to be seeded. The combined LTL-STL approach is compared with using STL only, and with using a handdesigned controller. In addition, the STL phase is tested when the idiotypic mechanism is turned off. The results provide substantial evidence that the best option is the seeded idiotypic system, i.e. the architecture that merges LTL with an idiotypic AIS for the STL. They also show that structurally different environments can be used for the two phases without compromising transferability As introduced by Bentley et al. (2005), artificial immune systems (AIS) are lacking tissue, which is present in one form or another in all living multi-cellular organisms. Some have argued that this concept in the context of AIS brings little novelty to the already saturated field of the immune inspired computational research. This article aims to show that such a component of an AIS has the potential to bring an advantage to a data processing algorithm in terms of data pre-processing, clustering and extraction of features desired by the immune inspired system. The proposed tissue algorithm is based on self-organizing networks, such as self-organizing maps (SOM) developed by Kohonen (1996) and an analogy of the so called Toll-Like Receptors (TLR) affecting the activation function of the clusters developed by the SOM. Dendritic cells are antigen presenting cells that provide a vital link between the innate and adaptive immune system. Research into this family of cells has revealed that they perform the role of coordinating T-cell based immune responses, both reactive and for generating tolerance. We have derived an algorithm based on the functionality of these cells, and have used the signals and differentiation pathways to build a control mechanism for an artificial immune system. We present our algorithmic details in addition to some preliminary results, where the algorithm was applied for the purpose of anomaly detection. We hope that this algorithm will eventually become the key component within a large, distributed immune system, based on sound immunological concepts. In this paper we present an efficient computer aided mass classification method in digitized mammograms using Artificial Neural Network (ANN), which performs benign-malignant classification on region of interest (ROI) that contains mass. One of the major mammographic characteristics for mass classification is texture. ANN exploits this important factor to classify the mass into benign or malignant. The statistical textural features used in characterizing the masses are mean, standard deviation, entropy, skewness, kurtosis and uniformity. The main aim of the method is to increase the effectiveness and efficiency of the classification process in an objective manner to reduce the numbers of false-positive of malignancies. Three layers artificial neural network (ANN) with seven features was proposed for classifying the marked regions into benign and malignant and 90.91% sensitivity and 83.87% specificity is achieved that is very much promising compare to the radiologist's sensitivity 75%. An effective procedure to determine the optimal parameters appearing in artificial flockings is proposed in terms of optimization problems. We numerically examine genetic algorithms (GAs) to determine the optimal set of parameters such as the weights for three essential interactions in BOIDS by Reynolds (1987) under `zero-collision' and `no-breaking-up' constraints. As a fitness function (the energy function) to be maximized by the GA, we choose the so-called the $\gamma$-value of anisotropy which can be observed empirically in typical flocks of starling. We confirm that the GA successfully finds the solution having a large $\gamma$-value leading-up to a strong anisotropy. The numerical experience shows that the procedure might enable us to make more realistic and efficient artificial flocking of starling even in our personal computers. We also evaluate two distinct types of interactions in agents, namely, metric and topological definitions of interactions. We confirmed that the topological definition can explain the empirical evidence much better than the metric definition does. Both humans and artificial systems frequently use trial and error methods to problem solving. In order to be effective, this type of strategy implies having high quality control knowledge to guide the quest for the optimal solution. Unfortunately, this control knowledge is rarely perfect. Moreover, in artificial systems-as in humans-self-evaluation of one's own knowledge is often difficult. Yet, this self-evaluation can be very useful to manage knowledge and to determine when to revise it. The objective of our work is to propose an automated approach to evaluate the quality of control knowledge in artificial systems based on a specific trial and error strategy, namely the informed tree search strategy. Our revision approach consists in analysing the system's execution logs, and in using the belief theory to evaluate the global quality of the knowledge. We present a real-world industrial application in the form of an experiment using this approach in the domain of cartographic generalisation. Thus far, the results of using our approach have been encouraging. Back-propagation algorithm is one of the most widely used and popular techniques to optimize the feed forward neural network training. Nature inspired meta-heuristic algorithms also provide derivative-free solution to optimize complex problem. Artificial bee colony algorithm is a nature inspired meta-heuristic algorithm, mimicking the foraging or food source searching behaviour of bees in a bee colony and this algorithm is implemented in several applications for an improved optimized outcome. The proposed method in this paper includes an improved artificial bee colony algorithm based back-propagation neural network training method for fast and improved convergence rate of the hybrid neural network learning method. The result is analysed with the genetic algorithm based back-propagation method, and it is another hybridized procedure of its kind. Analysis is performed over standard data sets, reflecting the light of efficiency of proposed method in terms of convergence speed and rate. Landmines, specifically anti-tank mines, cluster bombs, and unexploded ordnance form a serious problem in many countries. Several landmine sweeping techniques are used for minesweeping. This paper presents the design and the implementation of the vision system of an autonomous robot for landmines localization. The proposed work develops state-of-the-art techniques in digital image processing for pre-processing captured images of the contaminated area. After enhancement, Artificial Neural Network (ANN) is used in order to identify, recognize and classify the landmines' make and model. The Back-Propagation algorithm is used for training the network. The proposed work proved to be able to identify and classify different types of landmines under various conditions (rotated landmine, partially covered landmine) with a success rate of up to 90%. This document is written with the intention to describe in detail a method and means by which a computer program can reason about the world and in so doing, increase its analogue to a living system. As the literature is rife and it is apparent we, as scientists and engineers, have not found the solution, this document will attempt the solution by grounding its intellectual arguments within tenets of human cognition in Western philosophy. The result will be a characteristic description of a method to describe an artificial system analogous to that performed for a human. The approach was the substance of my Master's thesis, explored more deeply during the course of my postdoc research. It focuses primarily on context awareness and choice set within a boundary of available epistemology, which serves to describe it. Expanded upon, such a description strives to discover agreement with Kant's critique of reason to understand how it could be applied to define the architecture of its design. The intention has never been to mimic human or biological systems, rather, to understand the profoundly fundamental rules, when leveraged correctly, results in an artificial consciousness as noumenon while in keeping with the perception of it as phenomenon. Due to imprecision and uncertainties in predicting real world problems, artificial neural network (ANN) techniques have become increasingly useful for modeling and optimization. This paper presents an artificial neural network approach for forecasting electric energy consumption. For effective planning and operation of power systems, optimal forecasting tools are needed for energy operators to maximize profit and also to provide maximum satisfaction to energy consumers. Monthly data for electric energy consumed in the Gaza strip was collected from year 1994 to 2013. Data was trained and the proposed model was validated using 2-Fold and K-Fold cross validation techniques. The model has been tested with actual energy consumption data and yields satisfactory performance. Personalization is the process of fitting a model to patient data, a critical step towards application of multi-physics computational models in clinical practice. Designing robust personalization algorithms is often a tedious, time-consuming, model- and data-specific process. We propose to use artificial intelligence concepts to learn this task, inspired by how human experts manually perform it. The problem is reformulated in terms of reinforcement learning. In an off-line phase, Vito, our self-taught artificial agent, learns a representative decision process model through exploration of the computational model: it learns how the model behaves under change of parameters. The agent then automatically learns an optimal strategy for on-line personalization. The algorithm is model-independent; applying it to a new model requires only adjusting few hyper-parameters of the agent and defining the observations to match. The full knowledge of the model itself is not required. Vito was tested in a synthetic scenario, showing that it could learn how to optimize cost functions generically. Then Vito was applied to the inverse problem of cardiac electrophysiology and the personalization of a whole-body circulation model. The obtained results suggested that Vito could achieve equivalent, if not better goodness of fit than standard methods, while being more robust (up to 11% higher success rates) and with faster (up to seven times) convergence rate. Our artificial intelligence approach could thus make personalization algorithms generalizable and self-adaptable to any patient and any model. The human intelligence lies in the algorithm, the nature of algorithm lies in the classification, and the classification is equal to outlier detection. A lot of algorithms have been proposed to detect outliers, meanwhile a lot of definitions. Unsatisfying point is that definitions seem vague, which makes the solution an ad hoc one. We analyzed the nature of outliers, and give two clear definitions. We then develop an efficient RDD algorithm, which converts outlier problem to pattern and degree problem. Furthermore, a collapse mechanism was introduced by IIR algorithm, which can be united seamlessly with the RDD algorithm and serve for the final decision. Both algorithms are originated from the study on general AI. The combined edition is named as Pe algorithm, which is the basis of the intelligent decision. Here we introduce longest k-turn subsequence problem and corresponding solution as an example to interpret the function of Pe algorithm in detecting curve-type outliers. We also give a comparison between IIR algorithm and Pe algorithm, where we can get a better understanding at both algorithms. A short discussion about intelligence is added to demonstrate the function of the Pe algorithm. Related experimental results indicate its robustness. Problem: This paper addresses the design of an intelligent software system for the IC (incident commander) of a team in order to coordinate actions of agents (field units or robots) in the domain of emergency/crisis response operations. Objective: This paper proposes GICoordinator. It is a GIS-based assistant software agent that assists and collaborates with the human planner in strategic planning and macro tasks assignment for centralized multi-agent coordination. Method: Our approach to design GICoordinator was to: analyze the problem, design a complete data model, design an architecture of GICoordinator, specify required capabilities of human and system in coordination problem solving, specify development tools, and deploy. Result: The result was an architecture/design of GICoordinator that contains system requirements. Findings: GICoordinator efficiently integrates geoinformatics with artifice intelligent techniques in order to provide a spatial intelligent coordinator system for an IC to efficiently coordinate and control agents by making macro/strategic decisions. Results define a framework for future works to develop this system. The sociotechnological system is a system constituted of human individuals and their artifacts: technological artifacts, institutions, conceptual and representational systems, worldviews, knowledge systems, culture and the whole biosphere as a volutionary niche. In our view the sociotechnological system as a super-organism is shaped and determined both by the characteristics of the agents involved and the characteristics emergent in their interactions at multiple scales. Our approach to sociotechnological dynamics will maintain a balance between perspectives: the individual and the collective. Accordingly, we analyze dynamics of the Web as a sociotechnological system made of people, computers and digital artifacts (Web pages, databases, search engines, etc.). Making sense of the sociotechnological system while being part of it, is also a constant interplay between pragmatic and value based approaches. The first is focusing on the actualities of the system while the second highlights the observer's projections. In our attempt to model sociotechnological dynamics and envision its future, we take special care to make explicit our values as part of the analysis. In sociotechnological systems with a high degree of reflexivity (coupling between the perception of the system and the system's behavior), highlighting values is of critical importance. In this essay, we choose to see the future evolution of the web as facilitating a basic value, that is, continuous open-ended intelligence expansion. By that we mean that we see intelligence expansion as the determinant of the 'greater good' and 'well being' of both of individuals and collectives at all scales. Our working definition of intelligence here is the progressive process of sense-making of self, other, environment and universe. Intelligence expansion, therefore, means an increasing ability of sense-making. With the advancement of huge data generation and data handling capability, Machine Learning and Probabilistic modelling enables an immense opportunity to employ predictive analytics platform in high security critical industries namely data centers, electricity grids, utilities, airport etc. where downtime minimization is one of the primary objectives. This paper proposes a novel, complete architecture of an intelligent predictive analytics platform, Fault Engine, for huge device network connected with electrical/information flow. Three unique modules, here proposed, seamlessly integrate with available technology stack of data handling and connect with middleware to produce online intelligent prediction in critical failure scenarios. The Markov Failure module predicts the severity of a failure along with survival probability of a device at any given instances. The Root Cause Analysis model indicates probable devices as potential root cause employing Bayesian probability assignment and topological sort. Finally, a community detection algorithm produces correlated clusters of device in terms of failure probability which will further narrow down the search space of finding route cause. The whole Engine has been tested with different size of network with simulated failure environments and shows its potential to be scalable in real-time implementation. The biological immune system is a robust, complex, adaptive system that defends the body from foreign pathogens. It is able to categorize all cells (or molecules) within the body as self or non-self substances. It does this with the help of a distributed task force that has the intelligence to take action from a local and also a global perspective using its network of chemical messengers for communication. There are two major branches of the immune system. The innate immune system is an unchanging mechanism that detects and destroys certain invading organisms, whilst the adaptive immune system responds to previously unknown foreign cells and builds a response to them that can remain in the body over a long period of time. This remarkable information processing biological system has caught the attention of computer science in recent years. A novel computational intelligence technique, inspired by immunology, has emerged, called Artificial Immune Systems. Several concepts from the immune system have been extracted and applied for solution to real world science and engineering problems. In this tutorial, we briefly describe the immune system metaphors that are relevant to existing Artificial Immune Systems methods. We will then show illustrative real-world problems suitable for Artificial Immune Systems and give a step-by-step algorithm walkthrough for one such problem. A comparison of the Artificial Immune Systems to other well-known algorithms, areas for future work, tips & tricks and a list of resources will round this tutorial off. It should be noted that as Artificial Immune Systems is still a young and evolving field, there is not yet a fixed algorithm template and hence actual implementations might differ somewhat from time to time and from those examples given here. Riddles based on simple puns can be classified according to the patterns of word, syllable or phrase similarity they depend upon. We have devised a formal model of the semantic and syntactic regularities underlying some of the simpler types of punning riddle. We have also implemented this preliminary theory in a computer program which can generate riddles from a lexicon containing general data about words and phrases; that is, the lexicon content is not customised to produce jokes. Informal evaluation of the program's results by a set of human judges suggest that the riddles produced by this program are of comparable quality to those in general circulation among school children. In this paper, we discuss a model of simple question-answer punning, implemented in a program, JAPE, which generates riddles from humour-independent lexical entries. The model uses two main types of structure: schemata, which determine the relationships between key words in a joke, and templates, which produce the surface form of the joke. JAPE succeeds in generating pieces of text that are recognizably jokes, but some of them are not very good jokes. We mention some potential improvements and extensions, including post-production heuristics for ordering the jokes according to quality. We present an integrated architecture for word-level and sentence-level processing in a unification-based paradigm. The core of the system is a CLP implementation of a unification engine for feature structures supporting relational values. In this framework an HPSG-style grammar is implemented. Word-level processing uses X2MorF, a morphological component based on an extended version of two-level morphology. This component is tightly integrated with the grammar as a relation. The advantage of this approach is that morphology and syntax are kept logically autonomous while at the same time minimizing interface problems. This paper is an introduction to natural language interfaces to databases (NLIDBs). A brief overview of the history of NLIDBs is first given. Some advantages and disadvantages of NLIDBs are then discussed, comparing NLIDBs to formal query languages, form-based interfaces, and graphical interfaces. An introduction to some of the linguistic problems NLIDBs have to confront follows, for the benefit of readers less familiar with computational linguistics. The discussion then moves on to NLIDB architectures, portability issues, restricted natural language input systems (including menu-based NLIDBs), and NLIDBs with reasoning capabilities. Some less explored areas of NLIDB research are then presented, namely database updates, meta-knowledge questions, temporal questions, and multi-modal NLIDBs. The paper ends with reflections on the current state of the art. The Earley algorithm is a widely used parsing method in natural language processing applications. We introduce a variant of Earley parsing that is based on a ``delayed'' recognition of constituents. This allows us to start the recognition of a constituent only in cases in which all of its subconstituents have been found within the input string. This is particularly advantageous in several cases in which partial analysis of a constituent cannot be completed and in general in all cases of productions sharing some suffix of their right-hand sides (even for different left-hand side nonterminals). Although the two algorithms result in the same asymptotic time and space complexity, from a practical perspective our algorithm improves the time and space requirements of the original method, as shown by reported experimental results. We propose a general-purpose method for finding high-quality solutions to hard optimization problems, inspired by self-organizing processes often found in nature. The method, called Extremal Optimization, successively eliminates extremely undesirable components of sub-optimal solutions. Drawing upon models used to simulate far-from-equilibrium dynamics, it complements approximation methods inspired by equilibrium statistical physics, such as Simulated Annealing. With only one adjustable parameter, its performance proves competitive with, and often superior to, more elaborate stochastic optimization procedures. We demonstrate it here on two classic hard optimization problems: graph partitioning and the traveling salesman problem. We describe an extensive study of search in GSAT, an approximation procedure for propositional satisfiability. GSAT performs greedy hill-climbing on the number of satisfied clauses in a truth assignment. Our experiments provide a more complete picture of GSAT's search than previous accounts. We describe in detail the two phases of search: rapid hill-climbing followed by a long plateau search. We demonstrate that when applied to randomly generated 3SAT problems, there is a very simple scaling with problem size for both the mean number of satisfied clauses and the mean branching rate. Our results allow us to make detailed numerical conjectures about the length of the hill-climbing phase, the average gradient of this phase, and to conjecture that both the average score and average branching rate decay exponentially during plateau search. We end by showing how these results can be used to direct future theoretical analysis. This work provides a case study of how computer experiments can be used to improve understanding of the theoretical properties of algorithms. As real logic programmers normally use cut (!), an effective learning procedure for logic programs should be able to deal with it. Because the cut predicate has only a procedural meaning, clauses containing cut cannot be learned using an extensional evaluation method, as is done in most learning systems. On the other hand, searching a space of possible programs (instead of a space of independent clauses) is unfeasible. An alternative solution is to generate first a candidate base program which covers the positive examples, and then make it consistent by inserting cut where appropriate. The problem of learning programs with cut has not been investigated before and this seems to be a natural and reasonable approach. We generalize this scheme and investigate the difficulties that arise. Some of the major shortcomings are actually caused, in general, by the need for intensional evaluation. As a conclusion, the analysis of this paper suggests, on precise and technical grounds, that learning cut is difficult, and current induction techniques should probably be restricted to purely declarative logic languages. To support the goal of allowing users to record and retrieve information, this paper describes an interactive note-taking system for pen-based computers with two distinctive features. First, it actively predicts what the user is going to write. Second, it automatically constructs a custom, button-box user interface on request. The system is an example of a learning-apprentice software- agent. A machine learning component characterizes the syntax and semantics of the user's information. A performance system uses this learned information to generate completion strings and construct a user interface. Description of Online Appendix: People like to record information. Doing this on paper is initially efficient, but lacks flexibility. Recording information on a computer is less efficient but more powerful. In our new note taking softwre, the user records information directly on a computer. Behind the interface, an agent acts for the user. To help, it provides defaults and constructs a custom user interface. The demonstration is a QuickTime movie of the note taking agent in action. The file is a binhexed self-extracting archive. Macintosh utilities for binhex are available from mac.archive.umich.edu. QuickTime is available from ftp.apple.com in the dts/mac/sys.soft/quicktime. Terminological knowledge representation systems (TKRSs) are tools for designing and using knowledge bases that make use of terminological languages (or concept languages). We analyze from a theoretical point of view a TKRS whose capabilities go beyond the ones of presently available TKRSs. The new features studied, often required in practical applications, can be summarized in three main points. First, we consider a highly expressive terminological language, called ALCNR, including general complements of concepts, number restrictions and role conjunction. Second, we allow to express inclusion statements between general concepts, and terminological cycles as a particular case. Third, we prove the decidability of a number of desirable TKRS-deduction services (like satisfiability, subsumption and instance checking) through a sound, complete and terminating calculus for reasoning in ALCNR-knowledge bases. Our calculus extends the general technique of constraint systems. As a byproduct of the proof, we get also the result that inclusion statements in ALCNR can be simulated by terminological cycles, if descriptive semantics is adopted. A formalism is presented for computing and organizing actions for autonomous agents in dynamic environments. We introduce the notion of teleo-reactive (T-R) programs whose execution entails the construction of circuitry for the continuous computation of the parameters and conditions on which agent action is based. In addition to continuous feedback, T-R programs support parameter binding and recursion. A primary difference between T-R programs and many other circuit-based systems is that the circuitry of T-R programs is more compact; it is constructed at run time and thus does not have to anticipate all the contingencies that might arise over all possible runs. In addition, T-R programs are intuitive and easy to write and are written in a form that is compatible with automatic planning and learning methods. We briefly describe some experimental applications of T-R programs in the control of simulated and actual mobile robots. The ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our SUBDUE substructure discovery system based on the minimum description length principle. The SUBDUE system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously-discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. SUBDUE uses a computationally-bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints. In addition to the minimum description length principle, other background knowledge can be used by SUBDUE to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate SUBDUE's ability to find substructures capable of compressing the original data and to discover structural concepts important to the domain. Description of Online Appendix: This is a compressed tar file containing the SUBDUE discovery system, written in C. The program accepts as input databases represented in graph form, and will output discovered substructures with their corresponding value. The theory revision problem is the problem of how best to go about revising a deficient domain theory using information contained in examples that expose inaccuracies. In this paper we present our approach to the theory revision problem for propositional domain theories. The approach described here, called PTR, uses probabilities associated with domain theory elements to numerically track the ``flow'' of proof through the theory. This allows us to measure the precise role of a clause or literal in allowing or preventing a (desired or undesired) derivation for a given example. This information is used to efficiently locate and repair flawed elements of the theory. PTR is proved to converge to a theory which correctly classifies all examples, and shown experimentally to be fast and accurate even for deep theories. This paper analyzes the correctness of the subsumption algorithm used in CLASSIC, a description logic-based knowledge representation system that is being used in practical applications. In order to deal efficiently with individuals in CLASSIC descriptions, the developers have had to use an algorithm that is incomplete with respect to the standard, model-theoretic semantics for description logics. We provide a variant semantics for descriptions with respect to which the current implementation is complete, and which can be independently motivated. The soundness and completeness of the polynomial-time subsumption algorithm is established using description graphs, which are an abstracted version of the implementation structures used in CLASSIC, and are of independent interest. Information extraction is the task of automatically picking up information of interest from an unconstrained text. Information of interest is usually extracted in two steps. First, sentence level processing locates relevant pieces of information scattered throughout the text; second, discourse processing merges coreferential information to generate the output. In the first step, pieces of information are locally identified without recognizing any relationships among them. A key word search or simple pattern search can achieve this purpose. The second step requires deeper knowledge in order to understand relationships among separately identified pieces of information. Previous information extraction systems focused on the first step, partly because they were not required to link up each piece of information with other pieces. To link the extracted pieces of information and map them onto a structured output format, complex discourse processing is essential. This paper reports on a Japanese information extraction system that merges information using a pattern matcher and discourse processor. Evaluation results show a high level of system performance which approaches human performance. The vast amounts of on-line text now available have led to renewed interest in information extraction (IE) systems that analyze unrestricted text, producing a structured representation of selected information from the text. This paper presents a novel approach that uses machine learning to acquire knowledge for some of the higher level IE processing. Wrap-Up is a trainable IE discourse component that makes intersentential inferences and identifies logical relations among information extracted from the text. Previous corpus-based approaches were limited to lower level processing such as part-of-speech tagging, lexical disambiguation, and dictionary construction. Wrap-Up is fully trainable, and not only automatically decides what classifiers are needed, but even derives the feature set for each classifier automatically. Performance equals that of a partially trainable discourse module requiring manual customization for each domain. This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Well-known examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feed-forward networks, and learning Gaussian and discrete Bayesian networks from data. The paper concludes by sketching some implications for data analysis and summarizing how some popular algorithms fall within the framework presented. The main original contributions here are the decomposition techniques and the demonstration that graphical models provide a framework for understanding and developing complex learning algorithms. For many years, the intuitions underlying partial-order planning were largely taken for granted. Only in the past few years has there been renewed interest in the fundamental principles underlying this paradigm. In this paper, we present a rigorous comparative analysis of partial-order and total-order planning by focusing on two specific planners that can be directly compared. We show that there are some subtle assumptions that underly the wide-spread intuitions regarding the supposed efficiency of partial-order planning. For instance, the superiority of partial-order planning can depend critically upon the search strategy and the structure of the search space. Understanding the underlying assumptions is crucial for constructing efficient planners. The paradigms of transformational planning, case-based planning, and plan debugging all involve a process known as plan adaptation - modifying or repairing an old plan so it solves a new problem. In this paper we provide a domain-independent algorithm for plan adaptation, demonstrate that it is sound, complete, and systematic, and compare it to other adaptation algorithms in the literature. Our approach is based on a view of planning as searching a graph of partial plans. Generative planning starts at the graph's root and moves from node to node using plan-refinement operators. In planning by adaptation, a library plan - an arbitrary node in the plan graph - is the starting point for the search, and the plan-adaptation algorithm can apply both the same refinement operators available to a generative planner and can also retract constraints and steps from the plan. Our algorithm's completeness ensures that the adaptation algorithm will eventually search the entire graph and its systematicity ensures that it will do so without redundantly searching any parts of the graph. Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor lambda. Currently the most important application of these methods is to temporal credit assignment in reinforcement learning. Well known reinforcement learning algorithms, such as AHC or Q-learning, may be viewed as instances of TD learning. This paper examines the issues of the efficient and general implementation of TD(lambda) for arbitrary lambda, for use with reinforcement learning algorithms optimizing the discounted sum of rewards. The traditional approach, based on eligibility traces, is argued to suffer from both inefficiency and lack of generality. The TTD (Truncated Temporal Differences) procedure is proposed as an alternative, that indeed only approximates TD(lambda), but requires very little computation per action and can be used with arbitrary function representation methods. The idea from which it is derived is fairly simple and not new, but probably unexplored so far. Encouraging experimental results are presented, suggesting that using lambda > 0 with the TTD procedure allows one to obtain a significant learning speedup at essentially the same cost as usual TD(0) learning. The DNA promoter sequences domain theory and database have become popular for testing systems that integrate empirical and analytical learning. This note reports a simple change and reinterpretation of the domain theory in terms of M-of-N concepts, involving no learning, that results in an accuracy of 93.4% on the 106 items of the database. Moreover, an exhaustive search of the space of M-of-N domain theory interpretations indicates that the expected accuracy of a randomly chosen interpretation is 76.5%, and that a maximum accuracy of 97.2% is achieved in 12 cases. This demonstrates the informativeness of the domain theory, without the complications of understanding the interactions between various learning algorithms and the theory. In addition, our results help characterize the difficulty of learning using the DNA promoters theory. This paper introduces ICET, a new algorithm for cost-sensitive classification. ICET uses a genetic algorithm to evolve a population of biases for a decision tree induction algorithm. The fitness function of the genetic algorithm is the average cost of classification when using the decision tree, including both the costs of tests (features, measurements) and the costs of classification errors. ICET is compared here with three other algorithms for cost-sensitive classification - EG2, CS-ID3, and IDX - and also with C4.5, which classifies without regard to cost. The five algorithms are evaluated empirically on five real-world medical datasets. Three sets of experiments are performed. The first set examines the baseline performance of the five algorithms on the five datasets and establishes that ICET performs significantly better than its competitors. The second set tests the robustness of ICET under a variety of conditions and shows that ICET maintains its advantage. The third set looks at ICET's search in bias space and discovers a way to improve the search. Many studies have been carried out in order to increase the search efficiency of constraint satisfaction problems; among them, some make use of structural properties of the constraint network; others take into account semantic properties of the constraints, generally assuming that all the constraints possess the given property. In this paper, we propose a new decomposition method benefiting from both semantic properties of functional constraints (not bijective constraints) and structural properties of the network; furthermore, not all the constraints need to be functional. We show that under some conditions, the existence of solutions can be guaranteed. We first characterize a particular subset of the variables, which we name a root set. We then introduce pivot consistency, a new local consistency which is a weak form of path consistency and can be achieved in O(n^2d^2) complexity (instead of O(n^3d^3) for path consistency), and we present associated properties; in particular, we show that any consistent instantiation of the root set can be linearly extended to a solution, which leads to the presentation of the aforementioned new method for solving by decomposing functional CSPs. We study the process of multi-agent reinforcement learning in the context of load balancing in a distributed system, without use of either central coordination or explicit communication. We first define a precise framework in which to study adaptive load balancing, important features of which are its stochastic nature and the purely local information available to individual agents. Given this framework, we show illuminating results on the interplay between basic adaptive behavior parameters and their effect on system efficiency. We then investigate the properties of adaptive load balancing in heterogeneous populations, and address the issue of exploration vs. exploitation in that context. Finally, we show that naive use of communication may not improve, and might even harm system efficiency. We present algorithms that learn certain classes of function-free recursive logic programs in polynomial time from equivalence queries. In particular, we show that a single k-ary recursive constant-depth determinate clause is learnable. Two-clause programs consisting of one learnable recursive clause and one constant-depth determinate non-recursive clause are also learnable, if an additional ``basecase'' oracle is assumed. These results immediately imply the pac-learnability of these classes. Although these classes of learnable recursive programs are very constrained, it is shown in a companion paper that they are maximally general, in that generalizing either class in any natural way leads to a computationally difficult learning problem. Thus, taken together with its companion paper, this paper establishes a boundary of efficient learnability for recursive logic programs. In a companion paper it was shown that the class of constant-depth determinate k-ary recursive clauses is efficiently learnable. In this paper we present negative results showing that any natural generalization of this class is hard to learn in Valiant's model of pac-learnability. In particular, we show that the following program classes are cryptographically hard to learn: programs with an unbounded number of constant-depth linear recursive clauses; programs with one constant-depth determinate clause containing an unbounded number of recursive calls; and programs with one linear recursive clause of constant locality. These results immediately imply the non-learnability of any more general class of programs. We also show that learning a constant-depth determinate program with either two linear recursive clauses or one linear recursive clause and one non-recursive clause is as hard as learning boolean DNF. Together with positive results from the companion paper, these negative results establish a boundary of efficient learnability for recursive function-free clauses. This paper presents a method for inducing logic programs from examples that learns a new class of concepts called first-order decision lists, defined as ordered lists of clauses each ending in a cut. The method, called FOIDL, is based on FOIL (Quinlan, 1990) but employs intensional background knowledge and avoids the need for explicit negative examples. It is particularly useful for problems that involve rules with specific exceptions, such as learning the past-tense of English verbs, a task widely studied in the context of the symbolic/connectionist debate. FOIDL is able to learn concise, accurate programs for this problem from significantly fewer examples than previous methods (both connectionist and symbolic). ion is one of the most promising approaches to improve the performance of problem solvers. In several domains abstraction by dropping sentences of a domain description -- as used in most hierarchical planners -- has proven useful. In this paper we present examples which illustrate significant drawbacks of abstraction by dropping sentences. To overcome these drawbacks, we propose a more general view of abstraction involving the change of representation language. We have developed a new abstraction methodology and a related sound and complete learning algorithm that allows the complete change of representation language of planning cases from concrete to abstract. However, to achieve a powerful change of the representation language, the abstract language itself as well as rules which describe admissible ways of abstracting states must be provided in the domain model. This new abstraction approach is the core of Paris (Plan Abstraction and Refinement in an Integrated System), a system in which abstract planning cases are automatically learned from given concrete cases. An empirical study in the domain of process planning in mechanical engineering shows significant advantages of the proposed reasoning from abstract cases over classical hierarchical planning. Identifying inaccurate data has long been regarded as a significant and difficult problem in AI. In this paper, we present a new method for identifying inaccurate data on the basis of qualitative correlations among related data. First, we introduce the definitions of related data and qualitative correlations among related data. Then we put forward a new concept called support coefficient function (SCF). SCF can be used to extract, represent, and calculate qualitative correlations among related data within a dataset. We propose an approach to determining dynamic shift intervals of inaccurate data, and an approach to calculating possibility of identifying inaccurate data, respectively. Both of the approaches are based on SCF. Finally we present an algorithm for identifying inaccurate data by using qualitative correlations among related data as confirmatory or disconfirmatory evidence. We have developed a practical system for interpreting infrared spectra by applying the method, and have fully tested the system against several hundred real spectra. The experimental results show that the method is significantly better than the conventional methods used in many similar systems. This paper studies the problem of ergodicity of transition probability matrices in Markovian models, such as hidden Markov models (HMMs), and how it makes very difficult the task of learning to represent long-term context for sequential data. This phenomenon hurts the forward propagation of long-term context information, as well as learning a hidden state representation to represent long-term context, which depends on propagating credit information backwards in time. Using results from Markov chain theory, we show that this problem of diffusion of context and credit is reduced when the transition probabilities approach 0 or 1, i.e., the transition probability matrices are sparse and the model essentially deterministic. The results found in this paper apply to learning approaches based on continuous optimization, such as gradient descent and the Baum-Welch algorithm. Functionality-based recognition systems recognize objects at the category level by reasoning about how well the objects support the expected function. Such systems naturally associate a ``measure of goodness'' or ``membership value'' with a recognized object. This measure of goodness is the result of combining individual measures, or membership values, from potentially many primitive evaluations of different properties of the object's shape. A membership function is used to compute the membership value when evaluating a primitive of a particular physical property of an object. In previous versions of a recognition system known as Gruff, the membership function for each of the primitive evaluations was hand-crafted by the system designer. In this paper, we provide a learning component for the Gruff system, called Omlet, that automatically learns membership functions given a set of example objects labeled with their desired category measure. The learning algorithm is generally applicable to any problem in which low-level membership values are combined through an and-or tree structure to give a final overall membership value. This paper presents an approach to learning from situated, interactive tutorial instruction within an ongoing agent. Tutorial instruction is a flexible (and thus powerful) paradigm for teaching tasks because it allows an instructor to communicate whatever types of knowledge an agent might need in whatever situations might arise. To support this flexibility, however, the agent must be able to learn multiple kinds of knowledge from a broad range of instructional interactions. Our approach, called situated explanation, achieves such learning through a combination of analytic and inductive techniques. It combines a form of explanation-based learning that is situated for each instruction with a full suite of contextually guided responses to incomplete explanations. The approach is implemented in an agent called Instructo-Soar that learns hierarchies of new tasks and other domain knowledge from interactive natural language instructions. Instructo-Soar meets three key requirements of flexible instructability that distinguish it from previous systems: (1) it can take known or unknown commands at any instruction point; (2) it can handle instructions that apply to either its current situation or to a hypothetical situation specified in language (as in, for instance, conditional instructions); and (3) it can learn, from instructions, each class of knowledge it uses to perform tasks. The main aim of this work is the development of a vision-based road detection system fast enough to cope with the difficult real-time constraints imposed by moving vehicle applications. The hardware platform, a special-purpose massively parallel system, has been chosen to minimize system production and operational costs. This paper presents a novel approach to expectation-driven low-level image segmentation, which can be mapped naturally onto mesh-connected massively parallel SIMD architectures capable of handling hierarchical data structures. The input image is assumed to contain a distorted version of a given template; a multiresolution stretching process is used to reshape the original template in accordance with the acquired image content, minimizing a potential function. The distorted template is the process output. In the area of inductive learning, generalization is a main operation, and the usual definition of induction is based on logical implication. Recently there has been a rising interest in clausal representation of knowledge in machine learning. Almost all inductive learning systems that perform generalization of clauses use the relation theta-subsumption instead of implication. The main reason is that there is a well-known and simple technique to compute least general generalizations under theta-subsumption, but not under implication. However generalization under theta-subsumption is inappropriate for learning recursive clauses, which is a crucial problem since recursion is the basic program structure of logic programs. We note that implication between clauses is undecidable, and we therefore introduce a stronger form of implication, called T-implication, which is decidable between clauses. We show that for every finite set of clauses there exists a least general generalization under T-implication. We describe a technique to reduce generalizations under implication of a clause to generalizations under theta-subsumption of what we call an expansion of the original clause. Moreover we show that for every non-tautological clause there exists a T-complete expansion, which means that every generalization under T-implication of the clause is reduced to a generalization under theta-subsumption of the expansion. Characteristic models are an alternative, model based, representation for Horn expressions. It has been shown that these two representations are incomparable and each has its advantages over the other. It is therefore natural to ask what is the cost of translating, back and forth, between these representations. Interestingly, the same translation questions arise in database theory, where it has applications to the design of relational databases. This paper studies the computational complexity of these problems. Our main result is that the two translation problems are equivalent under polynomial reductions, and that they are equivalent to the corresponding decision problem. Namely, translating is equivalent to deciding whether a given set of models is the set of characteristic models for a given Horn expression. We also relate these problems to the hypergraph transversal problem, a well known problem which is related to other applications in AI and for which no polynomial time algorithm is known. It is shown that in general our translation problems are at least as hard as the hypergraph transversal problem, and in a special case they are equivalent to it. Many applications -- from planning and scheduling to problems in molecular biology -- rely heavily on a temporal reasoning component. In this paper, we discuss the design and empirical analysis of algorithms for a temporal reasoning system based on Allen's influential interval-based framework for representing temporal information. At the core of the system are algorithms for determining whether the temporal information is consistent, and, if so, finding one or more scenarios that are consistent with the temporal information. Two important algorithms for these tasks are a path consistency algorithm and a backtracking algorithm. For the path consistency algorithm, we develop techniques that can result in up to a ten-fold speedup over an already highly optimized implementation. For the backtracking algorithm, we develop variable and value ordering heuristics that are shown empirically to dramatically improve the performance of the algorithm. As well, we show that a previously suggested reformulation of the backtracking search problem can reduce the time and space requirements of the backtracking search. Taken together, the techniques we develop allow a temporal reasoning component to solve problems that are of practical size. Traditional databases commonly support efficient query and update procedures that operate in time which is sublinear in the size of the database. Our goal in this paper is to take a first step toward dynamic reasoning in probabilistic databases with comparable efficiency. We propose a dynamic data structure that supports efficient algorithms for updating and querying singly connected Bayesian networks. In the conventional algorithm, new evidence is absorbed in O(1) time and queries are processed in time O(N), where N is the size of the network. We propose an algorithm which, after a preprocessing phase, allows us to answer queries in time O(log N) at the expense of O(log N) time per evidence absorption. The usefulness of sub-linear processing time manifests itself in applications requiring (near) real-time response over large probabilistic databases. We briefly discuss a potential application of dynamic probabilistic reasoning in computational biology. Termination of logic programs with negated body atoms (here called general logic programs) is an important topic. One reason is that many computational mechanisms used to process negated atoms, like Clark's negation as failure and Chan's constructive negation, are based on termination conditions. This paper introduces a methodology for proving termination of general logic programs w.r.t. the Prolog selection rule. The idea is to distinguish parts of the program depending on whether or not their termination depends on the selection rule. To this end, the notions of low-, weakly up-, and up-acceptable program are introduced. We use these notions to develop a methodology for proving termination of general logic programs, and show how interesting problems in non-monotonic reasoning can be formalized and implemented by means of terminating general logic programs. This paper presents new experimental evidence against the utility of Occam's razor. A~systematic procedure is presented for post-processing decision trees produced by C4.5. This procedure was derived by rejecting Occam's razor and instead attending to the assumption that similar objects are likely to belong to the same class. It increases a decision tree's complexity without altering the performance of that tree on the training data from which it is inferred. The resulting more complex decision trees are demonstrated to have, on average, for a variety of common learning tasks, higher predictive accuracy than the less complex original decision trees. This result raises considerable doubt about the utility of Occam's razor as it is commonly applied in modern machine learning. The main operations in Inductive Logic Programming (ILP) are generalization and specialization, which only make sense in a generality order. In ILP, the three most important generality orders are subsumption, implication and implication relative to background knowledge. The two languages used most often are languages of clauses and languages of only Horn clauses. This gives a total of six different ordered languages. In this paper, we give a systematic treatment of the existence or non-existence of least generalizations and greatest specializations of finite sets of clauses in each of these six ordered sets. We survey results already obtained by others and also contribute some answers of our own. Our main new results are, firstly, the existence of a computable least generalization under implication of every finite set of clauses containing at least one non-tautologous function-free clause (among other, not necessarily function-free clauses). Secondly, we show that such a least generalization need not exist under relative implication, not even if both the set that is to be generalized and the background knowledge are function-free. Thirdly, we give a complete discussion of existence and non-existence of greatest specializations in each of the six ordered languages. This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word ``reinforcement.'' The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning. Although most scheduling problems are NP-hard, domain specific techniques perform well in practice but are quite expensive to construct. In adaptive problem-solving solving, domain specific knowledge is acquired automatically for a general problem solver with a flexible control architecture. In this approach, a learning system explores a space of possible heuristic methods for one well-suited to the eccentricities of the given domain and problem distribution. In this article, we discuss an application of the approach to scheduling satellite communications. Using problem distributions based on actual mission requirements, our approach identifies strategies that not only decrease the amount of CPU time required to produce schedules, but also increase the percentage of problems that are solvable within computational resource limitations. Speedup learning seeks to improve the computational efficiency of problem solving with experience. In this paper, we develop a formal framework for learning efficient problem solving from random problems and their solutions. We apply this framework to two different representations of learned knowledge, namely control rules and macro-operators, and prove theorems that identify sufficient conditions for learning in each representation. Our proofs are constructive in that they are accompanied with learning algorithms. Our framework captures both empirical and explanation-based speedup learning in a unified fashion. We illustrate our framework with implementations in two domains: symbolic integration and Eight Puzzle. This work integrates many strands of experimental and theoretical work in machine learning, including empirical learning of control rules, macro-operator learning, Explanation-Based Learning (EBL), and Probably Approximately Correct (PAC) Learning. A fundamental assumption made by classical AI planners is that there is no uncertainty in the world: the planner has full knowledge of the conditions under which the plan will be executed and the outcome of every action is fully predictable. These planners cannot therefore construct contingency plans, i.e., plans in which different actions are performed in different circumstances. In this paper we discuss some issues that arise in the representation and construction of contingency plans and describe Cassandra, a partial-order contingency planner. Cassandra uses explicit decision-steps that enable the agent executing the plan to decide which plan branch to follow. The decision-steps in a plan result in subgoals to acquire knowledge, which are planned for in the same way as any other subgoals. Cassandra thus distinguishes the process of gathering information from the process of making decisions. The explicit representation of decisions in Cassandra allows a coherent approach to the problems of contingent planning, and provides a solid base for extensions such as the use of different decision-making procedures. An important problem in geometric reasoning is to find the configuration of a collection of geometric bodies so as to satisfy a set of given constraints. Recently, it has been suggested that this problem can be solved efficiently by symbolically reasoning about geometry. This approach, called degrees of freedom analysis, employs a set of specialized routines called plan fragments that specify how to change the configuration of a set of bodies to satisfy a new constraint while preserving existing constraints. A potential drawback, which limits the scalability of this approach, is concerned with the difficulty of writing plan fragments. In this paper we address this limitation by showing how these plan fragments can be automatically synthesized using first principles about geometric bodies, actions, and topology. Motivated by the control theoretic distinction between controllable and uncontrollable events, we distinguish between two types of agents within a multi-agent system: controllable agents, which are directly controlled by the system's designer, and uncontrollable agents, which are not under the designer's direct control. We refer to such systems as partially controlled multi-agent systems, and we investigate how one might influence the behavior of the uncontrolled agents through appropriate design of the controlled agents. In particular, we wish to understand which problems are naturally described in these terms, what methods can be applied to influence the uncontrollable agents, the effectiveness of such methods, and whether similar methods work across different domains. Using a game-theoretic framework, this paper studies the design of partially controlled multi-agent systems in two contexts: in one context, the uncontrollable agents are expected utility maximizers, while in the other they are reinforcement learners. We suggest different techniques for controlling agents' behavior in each domain, assess their success, and examine their relationship. Cue phrases may be used in a discourse sense to explicitly signal discourse structure, but also in a sentential sense to convey semantic rather than structural information. Correctly classifying cue phrases as discourse or sentential is critical in natural language processing systems that exploit discourse structure, e.g., for performing tasks such as anaphora resolution and plan recognition. This paper explores the use of machine learning for classifying cue phrases as discourse or sentential. Two machine learning programs (Cgrendel and C4.5) are used to induce classification models from sets of pre-classified cue phrases and their features in text and speech. Machine learning is shown to be an effective technique for not only automating the generation of classification models, but also for improving upon previous results. When compared to manually derived classification models already in the literature, the learned models often perform with higher accuracy and contain new linguistic insights into the data. In addition, the ability to automatically construct classification models makes it easier to comparatively analyze the utility of alternative feature representations of the data. Finally, the ease of retraining makes the learning approach more scalable and flexible than manual methods. A new method is proposed for exploiting causal independencies in exact Bayesian network inference. A Bayesian network can be viewed as representing a factorization of a joint probability into the multiplication of a set of conditional probabilities. We present a notion of causal independence that enables one to further factorize the conditional probabilities into a combination of even smaller factors and consequently obtain a finer-grain factorization of the joint probability. The new formulation of causal independence lets us specify the conditional probability of a variable given its parents in terms of an associative and commutative operator, such as ``or'', ``sum'' or ``max'', on the contribution of each parent. We start with a simple algorithm VE for Bayesian network inference that, given evidence and a query variable, uses the factorization to find the posterior distribution of the query. We show how this algorithm can be extended to exploit causal independence. Empirical studies, based on the CPCS networks for medical diagnosis, show that this method is more efficient than previous methods and allows for inference in larger networks than previous algorithms. Instance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal values. This paper proposes three new heterogeneous distance functions, called the Heterogeneous Value Difference Metric (HVDM), the Interpolated Value Difference Metric (IVDM), and the Windowed Value Difference Metric (WVDM). These new distance functions are designed to handle applications with nominal attributes, continuous attributes, or both. In experiments on 48 applications the new distance metrics achieve higher classification accuracy on average than three previous distance functions on those datasets that have both nominal and continuous attributes. An algorithm that learns from a set of examples should ideally be able to exploit the available resources of (a) abundant computing power and (b) domain-specific knowledge to improve its ability to generalize. Connectionist theory-refinement systems, which use background knowledge to select a neural network's topology and initial weights, have proven to be effective at exploiting domain-specific knowledge; however, most do not exploit available computing power. This weakness occurs because they lack the ability to refine the topology of the neural networks they produce, thereby limiting generalization, especially when given impoverished domain theories. We present the REGENT algorithm which uses (a) domain-specific knowledge to help create an initial population of knowledge-based neural networks and (b) genetic operators of crossover and mutation (specifically designed for knowledge-based networks) to continually search for better network topologies. Experiments on three real-world domains indicate that our new algorithm is able to significantly increase generalization compared to a standard connectionist theory-refinement system, as well as our previous algorithm for growing knowledge-based networks. Several recent studies have compared the relative efficiency of alternative flaw selection strategies for partial-order causal link (POCL) planning. We review this literature, and present new experimental results that generalize the earlier work and explain some of the discrepancies in it. In particular, we describe the Least-Cost Flaw Repair (LCFR) strategy developed and analyzed by Joslin and Pollack (1994), and compare it with other strategies, including Gerevini and Schubert's (1996) ZLIFO strategy. LCFR and ZLIFO make very different, and apparently conflicting claims about the most effective way to reduce search-space size in POCL planning. We resolve this conflict, arguing that much of the benefit that Gerevini and Schubert ascribe to the LIFO component of their ZLIFO strategy is better attributed to other causes. We show that for many problems, a strategy that combines least-cost flaw selection with the delay of separable threats will be effective in reducing search-space size, and will do so without excessive computational overhead. Although such a strategy thus provides a good default, we also show that certain domain characteristics may reduce its effectiveness. The easy-hard-easy pattern in the difficulty of combinatorial search problems as constraints are added has been explained as due to a competition between the decrease in number of solutions and increased pruning. We test the generality of this explanation by examining one of its predictions: if the number of solutions is held fixed by the choice of problems, then increased pruning should lead to a monotonic decrease in search cost. Instead, we find the easy-hard-easy pattern in median search cost even when the number of solutions is held constant, for some search methods. This generalizes previous observations of this pattern and shows that the existing theory does not explain the full range of the peak in search cost. In these cases the pattern appears to be due to changes in the size of the minimal unsolvable subproblems, rather than changing numbers of solutions. SEQUITUR is an algorithm that infers a hierarchical structure from a sequence of discrete symbols by replacing repeated phrases with a grammatical rule that generates the phrase, and continuing this process recursively. The result is a hierarchical representation of the original sequence, which offers insights into its lexical structure. The algorithm is driven by two constraints that reduce the size of the grammar, and produce structure as a by-product. SEQUITUR breaks new ground by operating incrementally. Moreover, the method's simple structure permits a proof that it operates in space and time that is linear in the size of the input. Our implementation can process 50,000 symbols per second and has been applied to an extensive range of real world sequences. A fundamental goal of research in molecular biology is to understand protein structure. Protein crystallography is currently the most successful method for determining the three-dimensional (3D) conformation of a protein, yet it remains labor intensive and relies on an expert's ability to derive and evaluate a protein scene model. In this paper, the problem of protein structure determination is formulated as an exercise in scene analysis. A computational methodology is presented in which a 3D image of a protein is segmented into a graph of critical points. Bayesian and certainty factor approaches are described and used to analyze critical point graphs and identify meaningful substructures, such as alpha-helices and beta-sheets. Results of applying the methodologies to protein images at low and medium resolution are reported. The research is related to approaches to representation, segmentation and classification in vision, as well as to top-down approaches to protein structure prediction. Partially observable Markov decision processes (POMDPs) are a natural model for planning problems where effects of actions are nondeterministic and the state of the world is not completely observable. It is difficult to solve POMDPs exactly. This paper proposes a new approximation scheme. The basic idea is to transform a POMDP into another one where additional information is provided by an oracle. The oracle informs the planning agent that the current state of the world is in a certain region. The transformed POMDP is consequently said to be region observable. It is easier to solve than the original POMDP. We propose to solve the transformed POMDP and use its optimal policy to construct an approximate policy for the original POMDP. By controlling the amount of additional information that the oracle provides, it is possible to find a proper tradeoff between computational time and approximation quality. In terms of algorithmic contributions, we study in details how to exploit region observability in solving the transformed POMDP. To facilitate the study, we also propose a new exact algorithm for general POMDPs. The algorithm is conceptually simple and yet is significantly more efficient than all previous exact algorithms. Local search algorithms for combinatorial search problems frequently encounter a sequence of states in which it is impossible to improve the value of the objective function; moves through these regions, called plateau moves, dominate the time spent in local search. We analyze and characterize plateaus for three different classes of randomly generated Boolean Satisfiability problems. We identify several interesting features of plateaus that impact the performance of local search algorithms. We show that local minima tend to be small but occasionally may be very large. We also show that local minima can be escaped without unsatisfying a large number of clauses, but that systematically searching for an escape route may be computationally expensive if the local minimum is large. We show that plateaus with exits, called benches, tend to be much larger than minima, and that some benches have very few exit states which local search can use to escape. We show that the solutions (i.e., global minima) of randomly generated problem instances form clusters, which behave similarly to local minima. We revisit several enhancements of local search algorithms and explain their performance in light of our results. Finally we discuss strategies for creating the next generation of local search algorithms. The assessment of bidirectional heuristic search has been incorrect since it was first published more than a quarter of a century ago. For quite a long time, this search strategy did not achieve the expected results, and there was a major misunderstanding about the reasons behind it. Although there is still wide-spread belief that bidirectional heuristic search is afflicted by the problem of search frontiers passing each other, we demonstrate that this conjecture is wrong. Based on this finding, we present both a new generic approach to bidirectional heuristic search and a new approach to dynamically improving heuristic values that is feasible in bidirectional search only. These approaches are put into perspective with both the traditional and more recently proposed approaches in order to facilitate a better overall understanding. Empirical results of experiments with our new approaches show that bidirectional heuristic search can be performed very efficiently and also with limited memory. These results suggest that bidirectional heuristic search appears to be better for solving certain difficult problems than corresponding unidirectional search. This provides some evidence for the usefulness of a search strategy that was long neglected. In summary, we show that bidirectional heuristic search is viable and consequently propose that it be reconsidered. In this paper we consider the problem of `theory patching', in which we are given a domain theory, some of whose components are indicated to be possibly flawed, and a set of labeled training examples for the domain concept. The theory patching problem is to revise only the indicated components of the theory, such that the resulting theory correctly classifies all the training examples. Theory patching is thus a type of theory revision in which revisions are made to individual components of the theory. Our concern in this paper is to determine for which classes of logical domain theories the theory patching problem is tractable. We consider both propositional and first-order domain theories, and show that the theory patching problem is equivalent to that of determining what information contained in a theory is `stable' regardless of what revisions might be performed to the theory. We show that determining stability is tractable if the input theory satisfies two conditions: that revisions to each theory component have monotonic effects on the classification of examples, and that theory components act independently in the classification of examples in the theory. We also show how the concepts introduced can be used to determine the soundness and completeness of particular theory patching algorithms. This paper presents a comprehensive approach for model-based diagnosis which includes proposals for characterizing and computing preferred diagnoses, assuming that the system description is augmented with a system structure (a directed graph explicating the interconnections between system components). Specifically, we first introduce the notion of a consequence, which is a syntactically unconstrained propositional sentence that characterizes all consistency-based diagnoses and show that standard characterizations of diagnoses, such as minimal conflicts, correspond to syntactic variations on a consequence. Second, we propose a new syntactic variation on the consequence known as negation normal form (NNF) and discuss its merits compared to standard variations. Third, we introduce a basic algorithm for computing consequences in NNF given a structured system description. We show that if the system structure does not contain cycles, then there is always a linear-size consequence in NNF which can be computed in linear time. For arbitrary system structures, we show a precise connection between the complexity of computing consequences and the topology of the underlying system structure. Finally, we present an algorithm that enumerates the preferred diagnoses characterized by a consequence. The algorithm is shown to take linear time in the size of the consequence if the preference criterion satisfies some general conditions. One of the most common mechanisms used for speeding up problem solvers is macro-learning. Macros are sequences of basic operators acquired during problem solving. Macros are used by the problem solver as if they were basic operators. The major problem that macro-learning presents is the vast number of macros that are available for acquisition. Macros increase the branching factor of the search space and can severely degrade problem-solving efficiency. To make macro learning useful, a program must be selective in acquiring and utilizing macros. This paper describes a general method for selective acquisition of macros. Solvable training problems are generated in increasing order of difficulty. The only macros acquired are those that take the problem solver out of a local minimum to a better state. The utility of the method is demonstrated in several domains, including the domain of NxN sliding-tile puzzles. After learning on small puzzles, the system is able to efficiently solve puzzles of any size. The standard approach to logic in the literature in philosophy and mathematics, which has also been adopted in computer science, is to define a language (the syntax), an appropriate class of models together with an interpretation of formulas in the language (the semantics), a collection of axioms and rules of inference characterizing reasoning (the proof theory), and then relate the proof theory to the semantics via soundness and completeness results. Here we consider an approach that is more common in the economics literature, which works purely at the semantic, set-theoretic level. We provide set-theoretic completeness results for a number of epistemic and conditional logics, and contrast the expressive power of the syntactic and set-theoretic approaches We examine the computational complexity of testing and finding small plans in probabilistic planning domains with both flat and propositional representations. The complexity of plan evaluation and existence varies with the plan type sought; we examine totally ordered plans, acyclic plans, and looping plans, and partially ordered plans under three natural definitions of plan value. We show that problems of interest are complete for a variety of complexity classes: PL, P, NP, co-NP, PP, NP^PP, co-NP^PP, and PSPACE. In the process of proving that certain planning problems are complete for NP^PP, we introduce a new basic NP^PP-complete problem, E-MAJSAT, which generalizes the standard Boolean satisfiability problem to computations involving probabilistic quantities; our results suggest that the development of good heuristics for E-MAJSAT could be important for the creation of efficient algorithms for a wide variety of problems. We address the issues of semantics and conversations for agent communication languages and the Knowledge Query Manipulation Language (KQML) in particular. Based on ideas from speech act theory, we present a semantic description for KQML that associates ``cognitive'' states of the agent with the use of the language's primitives (performatives). We have used this approach to describe the semantics for the whole set of reserved KQML performatives. Building on the semantics, we devise the conversation policies, i.e., a formal description of how KQML performatives may be combined into KQML exchanges (conversations), using a Definite Clause Grammar. Our research offers methods for a speech act theory-based semantic description of a language of communication acts and for the specification of the protocols associated with these acts. Languages of communication acts address the issue of communication among software applications at a level of abstraction that is useful to the emerging software agents paradigm. We present our approach to the problem of how an agent, within an economic Multi-Agent System, can determine when it should behave strategically (i.e. learn and use models of other agents), and when it should act as a simple price-taker. We provide a framework for the incremental implementation of modeling capabilities in agents, and a description of the forms of knowledge required. The agents were implemented and different populations simulated in order to learn more about their behavior and the merits of using and learning agent models. Our results show, among other lessons, how savvy buyers can avoid being ``cheated'' by sellers, how price volatility can be used to quantitatively predict the benefits of deeper models, and how specific types of agent populations influence system behavior. The paper presents a constructive fixpoint semantics for autoepistemic logic (AEL). This fixpoint characterizes a unique but possibly three-valued belief set of an autoepistemic theory. It may be three-valued in the sense that for a subclass of formulas F, the fixpoint may not specify whether F is believed or not. The paper presents a constructive 3-valued semantics for autoepistemic logic (AEL). We introduce a derivation operator and define the semantics as its least fixpoint. The semantics is 3-valued in the sense that, for some formulas, the least fixpoint does not specify whether they are believed or not. We show that complete fixpoints of the derivation operator correspond to Moore's stable expansions. In the case of modal representations of logic programs our least fixpoint semantics expresses well-founded semantics or 3-valued Fitting-Kunen semantics (depending on the embedding used). We show that, computationally, our semantics is simpler than the semantics proposed by Moore (assuming that the polynomial hierarchy does not collapse). This article investigates the use of Transformation-Based Error-Driven learning for resolving part-of-speech ambiguity in the Greek language. The aim is not only to study the performance, but also to examine its dependence on different thematic domains. Results are presented here for two different test cases: a corpus on "management succession events" and a general-theme corpus. The two experiments show that the performance of this method does not depend on the thematic domain of the corpus, and its accuracy for the Greek language is around 95%. ACLP is a system which combines abductive reasoning and constraint solving by integrating the frameworks of Abductive Logic Programming (ALP) and Constraint Logic Programming (CLP). It forms a general high-level knowledge representation environment for abductive problems in Artificial Intelligence and other areas. In ACLP, the task of abduction is supported and enhanced by its non-trivial integration with constraint solving facilitating its application to complex problems. The ACLP system is currently implemented on top of the CLP language of ECLiPSe as a meta-interpreter exploiting its underlying constraint solver for finite domains. It has been applied to the problems of planning and scheduling in order to test its computational effectiveness compared with the direct use of the (lower level) constraint solving framework of CLP on which it is built. These experiments provide evidence that the abductive framework of ACLP does not compromise significantly the computational efficiency of the solutions. Other experiments show the natural ability of ACLP to accommodate easily and in a robust way new or changing requirements of the original problem. We study here the well-known propagation rules for Boolean constraints. First we propose a simple notion of completeness for sets of such rules and establish a completeness result. Then we show an equivalence in an appropriate sense between Boolean constraint propagation and unit propagation, a form of resolution for propositional logic. Subsequently we characterize one set of such rules by means of the notion of hyper-arc consistency introduced in (Mohr and Masini 1988). Also, we clarify the status of a similar, though different, set of rules introduced in (Simonis 1989a) and more fully in (Codognet and Diaz 1996). This paper describes an experimental comparison between two standard supervised learning methods, namely Naive Bayes and Exemplar-based classification, on the Word Sense Disambiguation (WSD) problem. The aim of the work is twofold. Firstly, it attempts to contribute to clarify some confusing information about the comparison between both methods appearing in the related literature. In doing so, several directions have been explored, including: testing several modifications of the basic learning algorithms and varying the feature space. Secondly, an improvement of both algorithms is proposed, in order to deal with large attribute sets. This modification, which basically consists in using only the positive information appearing in the examples, allows to improve greatly the efficiency of the methods, with no loss in accuracy. The experiments have been performed on the largest sense-tagged corpus available containing the most frequent and ambiguous English words. Results show that the Exemplar-based approach to WSD is generally superior to the Bayesian approach, especially when a specific metric for dealing with symbolic attributes is used. Rational inference relations were introduced by Lehmann and Magidor as the ideal systems for drawing conclusions from a conditional base. However, there has been no simple characterization of these relations, other than its original representation by preferential models. In this paper, we shall characterize them with a class of total preorders of formulas by improving and extending Gardenfors and Makinson's results for expectation inference relations. A second representation is application-oriented and is obtained by considering a class of consequence operators that grade sets of defaults according to our reliance on them. The finitary fragment of this class of consequence operators has been employed by recent default logic formalisms based on maxiconsistency. In this paper, an application of automated theorem proving techniques to computational semantics is considered. In order to compute the presuppositions of a natural language discourse, several inference tasks arise. Instead of treating these inferences independently of each other, we show how integrating techniques from formal approaches to context into deduction can help to compute presuppositions more efficiently. Contexts are represented as Discourse Representation Structures and the way they are nested is made explicit. In addition, a tableau calculus is present which keeps track of contextual information, and thereby allows to avoid carrying out redundant inference steps as it happens in approaches that neglect explicit nesting of contexts. Many logic programming based approaches can be used to describe and solve combinatorial search problems. On the one hand there is constraint logic programming which computes a solution as an answer substitution to a query containing the variables of the constraint satisfaction problem. On the other hand there are systems based on stable model semantics, abductive systems, and first order logic model generators which compute solutions as models of some theory. This paper compares these different approaches from the point of view of knowledge representation (how declarative are the programs) and from the point of view of performance (how good are they at solving typical problems). We present an approach for modelling the structure and coarse content of legal documents with a view to providing automated support for the drafting of contracts and contract database retrieval. The approach is designed to be applicable where contract drafting is based on model-form contracts or on existing examples of a similar type. The main features of the approach are: (1) the representation addresses the structure and the interrelationships between the constituent parts of contracts, but not the text of the document itself; (2) the representation of documents is separated from the mechanisms that manipulate it; and (3) the drafting process is subject to a collection of explicitly stated constraints that govern the structure of the documents. We describe the representation of document instances and of 'generic documents', which are data structures used to drive the creation of new document instances, and we show extracts from a sample session to illustrate the features of a prototype system implemented in MacProlog. The lexicographic closure of any given finite set D of normal defaults is defined. A conditional assertion "if a then b" is in this lexicographic closure if, given the defaults D and the fact a, one would conclude b. The lexicographic closure is essentially a rational extension of D, and of its rational closure, defined in a previous paper. It provides a logic of normal defaults that is different from the one proposed by R. Reiter and that is rich enough not to require the consideration of non-normal defaults. A large number of examples are provided to show that the lexicographic closure corresponds to the basic intuitions behind Reiter's logic of defaults. We analyze the problem of defining well-founded semantics for ordered logic programs within a general framework based on alternating fixpoint theory. We start by showing that generalizations of existing answer set approaches to preference are too weak in the setting of well-founded semantics. We then specify some informal yet intuitive criteria and propose a semantical framework for preference handling that is more suitable for defining well-founded semantics for ordered logic programs. The suitability of the new approach is convinced by the fact that many attractive properties are satisfied by our semantics. In particular, our semantics is still correct with respect to various existing answer sets semantics while it successfully overcomes the weakness of their generalization to well-founded semantics. Finally, we indicate how an existing preferred well-founded semantics can be captured within our semantical framework. We implement Groenendijk and Stokhof's partition semantics of questions in a simple question answering algorithm. The algorithm is sound, complete, and based on tableau theorem proving. The algorithm relies on a syntactic characterization of answerhood: Any answer to a question is equivalent to some formula built up only from instances of the question. We prove this characterization by translating the logic of interrogation to classical predicate logic and applying Craig's interpolation theorem. Recent advances in Multiagent Systems (MAS) and Epistemic Logic within Distributed Systems Theory, have used various combinatorial structures that model both the geometry of the systems and the Kripke model structure of models for the logic. Examining one of the simpler versions of these models, interpreted systems, and the related Kripke semantics of the logic $S5_n$ (an epistemic logic with $n$-agents), the similarities with the geometric / homotopy theoretic structure of groupoid atlases is striking. These latter objects arise in problems within algebraic K-theory, an area of algebra linked to the study of decomposition and normal form theorems in linear algebra. They have a natural well structured notion of path and constructions of path objects, etc., that yield a rich homotopy theory. Part of the theory of logic programming and nonmonotonic reasoning concerns the study of fixed-point semantics for these paradigms. Several different semantics have been proposed during the last two decades, and some have been more successful and acknowledged than others. The rationales behind those various semantics have been manifold, depending on one's point of view, which may be that of a programmer or inspired by commonsense reasoning, and consequently the constructions which lead to these semantics are technically very diverse, and the exact relationships between them have not yet been fully understood. In this paper, we present a conceptually new method, based on level mappings, which allows to provide uniform characterizations of different semantics for logic programs. We will display our approach by giving new and uniform characterizations of some of the major semantics, more particular of the least model semantics for definite programs, of the Fitting semantics, and of the well-founded semantics. A novel characterization of the weakly perfect model semantics will also be provided. We relate two formerly independent areas: Formal concept analysis and logic of domains. We will establish a correspondene between contextual attribute logic on formal contexts resp. concept lattices and a clausal logic on coherent algebraic cpos. We show how to identify the notion of formal concept in the domain theoretic setting. In particular, we show that a special instance of the resolution rule from the domain logic coincides with the concept closure operator from formal concept analysis. The results shed light on the use of contexts and domains for knowledge representation and reasoning purposes. Given the joint chances of a pair of random variables one can compute quantities of interest, like the mutual information. The Bayesian treatment of unknown chances involves computing, from a second order prior distribution and the data likelihood, a posterior distribution of the chances. A common treatment of incomplete data is to assume ignorability and determine the chances by the expectation maximization (EM) algorithm. The two different methods above are well established but typically separated. This paper joins the two approaches in the case of Dirichlet priors, and derives efficient approximations for the mean, mode and the (co)variance of the chances and the mutual information. Furthermore, we prove the unimodality of the posterior distribution, whence the important property of convergence of EM to the global maximum in the chosen framework. These results are applied to the problem of selecting features for incremental learning and naive Bayes classification. A fast filter based on the distribution of mutual information is shown to outperform the traditional filter based on empirical mutual information on a number of incomplete real data sets. In most real-world settings, due to limited time or other resources, an agent cannot perform all potentially useful deliberation and information gathering actions. This leads to the metareasoning problem of selecting such actions. Decision-theoretic methods for metareasoning have been studied in AI, but there are few theoretical results on the complexity of metareasoning. We derive hardness results for three settings which most real metareasoning systems would have to encompass as special cases. In the first, the agent has to decide how to allocate its deliberation time across anytime algorithms running on different problem instances. We show this to be $\mathcal{NP}$-complete. In the second, the agent has to (dynamically) allocate its deliberation or information gathering resources across multiple actions that it has to choose among. We show this to be $\mathcal{NP}$-hard even when evaluating each individual action is extremely simple. In the third, the agent has to (dynamically) choose a limited number of deliberation or information gathering actions to disambiguate the state of the world. We show that this is $\mathcal{NP}$-hard under a natural restriction, and $\mathcal{PSPACE}$-hard in general. Object oriented constraint programs (OOCPs) emerge as a leading evolution of constraint programming and artificial intelligence, first applied to a range of industrial applications called configuration problems. The rich variety of technical approaches to solving configuration problems (CLP(FD), CC(FD), DCSP, Terminological systems, constraint programs with set variables ...) is a source of difficulty. No universally accepted formal language exists for communicating about OOCPs, which makes the comparison of systems difficult. We present here a Z based specification of OOCPs which avoids the falltrap of hidden object semantics. The object system is part of the specification, and captures all of the most advanced notions from the object oriented modeling standard UML. The paper illustrates these issues and the conciseness and precision of Z by the specification of a working OOCP that solves an historical AI problem : parsing a context free grammar. Being written in Z, an OOCP specification also supports formal proofs. The whole builds the foundation of an adaptative and evolving framework for communicating about constrained object models and programs. A new approach to the construction of general persistent polyhierarchical classifications is proposed. It is based on implicit description of category polyhierarchy by a generating polyhierarchy of classification criteria. Similarly to existing approaches, the classification categories are defined by logical functions encoded by attributive expressions. However, the generating hierarchy explicitly predefines domains of criteria applicability, and the semantics of relations between categories is invariant to changes in the universe composition, extending variety of criteria, and increasing their cardinalities. The generating polyhierarchy is an independent, compact, portable, and re-usable information structure serving as a template classification. It can be associated with one or more particular sets of objects, included in more general classifications as a standard component, or used as a prototype for more comprehensive classifications. The approach dramatically simplifies development and unplanned modifications of persistent hierarchical classifications compared with tree, DAG, and faceted schemes. It can be efficiently implemented in common DBMS, while considerably reducing amount of computer resources required for storage, maintenance, and use of complex polyhierarchies. In this paper we present NLML (Natural Language Markup Language), a markup language to describe the syntactic and semantic structure of any grammatically correct English expression. At first the related works are analyzed to demonstrate the necessity of the NLML: simple form, easy management and direct storage. Then the description of the English grammar with NLML is introduced in details in three levels: sentences (with different complexities, voices, moods, and tenses), clause (relative clause and noun clause) and phrase (noun phrase, verb phrase, prepositional phrase, adjective phrase, adverb phrase and predicate phrase). At last the application fields of the NLML in NLP are shown with two typical examples: NLOJM (Natural Language Object Modal in Java) and NLDB (Natural Language Database). This paper presents a methodology for summarization from multiple documents which are about a specific topic. It is based on the specification and identification of the cross-document relations that occur among textual elements within those documents. Our methodology involves the specification of the topic-specific entities, the messages conveyed for the specific entities by certain textual elements and the specification of the relations that can hold among these messages. The above resources are necessary for setting up a specific topic for our query-based summarization approach which uses these resources to identify the query-specific messages within the documents and the query-specific relations that connect these messages across documents. Neuro-fuzzy systems have attracted growing interest of researchers in various scientific and engineering areas due to the increasing need of intelligent systems. This paper evaluates the use of two popular soft computing techniques and conventional statistical approach based on Box--Jenkins autoregressive integrated moving average (ARIMA) model to predict electricity demand in the State of Victoria, Australia. The soft computing methods considered are an evolving fuzzy neural network (EFuNN) and an artificial neural network (ANN) trained using scaled conjugate gradient algorithm (CGA) and backpropagation (BP) algorithm. The forecast accuracy is compared with the forecasts used by Victorian Power Exchange (VPX) and the actual energy demand. To evaluate, we considered load demand patterns for 10 consecutive months taken every 30 min for training the different prediction models. Test results show that the neuro-fuzzy system performed better than neural networks, ARIMA model and the VPX forecasts. The aim of our research was to apply well-known data mining techniques (such as linear neural networks, multi-layered perceptrons, probabilistic neural networks, classification and regression trees, support vector machines and finally a hybrid decision tree neural network approach) to the problem of predicting the quality of service in call centers; based on the performance data actually collected in a call center of a large insurance company. Our aim was two-fold. First, to compare the performance of models built using the above-mentioned techniques and, second, to analyze the characteristics of the input sensitivity in order to better understand the relationship between the perform-ance evaluation process and the actual performance and in this way help improve the performance of call centers. In this paper we summarize our findings. The framework of algorithmic knowledge assumes that agents use algorithms to compute the facts they explicitly know. In many cases of interest, a deductive system, rather than a particular algorithm, captures the formal reasoning used by the agents to compute what they explicitly know. We introduce a logic for reasoning about both implicit and explicit knowledge with the latter defined with respect to a deductive system formalizing a logical theory for agents. The highly structured nature of deductive systems leads to very natural axiomatizations of the resulting logic when interpreted over any fixed deductive system. The decision problem for the logic, in the presence of a single agent, is NP-complete in general, no harder than propositional logic. It remains NP-complete when we fix a deductive system that is decidable in nondeterministic polynomial time. These results extend in a straightforward way to multiple agents. Defeasible argumentation has experienced a considerable growth in AI in the last decade. Theoretical results have been combined with development of practical applications in AI & Law, Case-Based Reasoning and various knowledge-based systems. However, the dialectical process associated with inference is computationally expensive. This paper focuses on speeding up this inference process by pruning the involved search space. Our approach is twofold. On one hand, we identify distinguished literals for computing defeat. On the other hand, we restrict ourselves to a subset of all possible conflicting arguments by introducing dialectical constraints. Niching enables a genetic algorithm (GA) to maintain diversity in a population. It is particularly useful when the problem has multiple optima where the aim is to find all or as many as possible of these optima. When the fitness landscape of a problem changes overtime, the problem is called non--stationary, dynamic or time--variant problem. In these problems, niching can maintain useful solutions to respond quickly, reliably and accurately to a change in the environment. In this paper, we present a niching method that works on the problem substructures rather than the whole solution, therefore it has less space complexity than previously known niching mechanisms. We show that the method is responding accurately when environmental changes occur. This paper explores the concept of engagement, the process by which individuals in an interaction start, maintain and end their perceived connection to one another. The paper reports on one aspect of engagement among human interactors--the effect of tracking faces during an interaction. It also describes the architecture of a robot that can participate in conversational, collaborative interactions with engagement gestures. Finally, the paper reports on findings of experiments with human participants who interacted with a robot when it either performed or did not perform engagement gestures. Results of the human-robot studies indicate that people become engaged with robots: they direct their attention to the robot more often in interactions where engagement gestures are present, and they find interactions more appropriate when engagement gestures are present than when they are not. We define a class of probabilistic models in terms of an operator algebra of stochastic processes, and a representation for this class in terms of stochastic parameterized grammars. A syntactic specification of a grammar is mapped to semantics given in terms of a ring of operators, so that grammatical composition corresponds to operator addition or multiplication. The operators are generators for the time-evolution of stochastic processes. Within this modeling framework one can express data clustering models, logic programs, ordinary and stochastic differential equations, graph grammars, and stochastic chemical reaction kinetics. This mathematical formulation connects these apparently distant fields to one another and to mathematical methods from quantum field theory and operator algebra. This paper is concerned with the reliable inference of optimal tree-approximations to the dependency structure of an unknown distribution generating data. The traditional approach to the problem measures the dependency strength between random variables by the index called mutual information. In this paper reliability is achieved by Walley's imprecise Dirichlet model, which generalizes Bayesian learning with Dirichlet priors. Adopting the imprecise Dirichlet model results in posterior interval expectation for mutual information, and in a set of plausible trees consistent with the data. Reliable inference about the actual tree is achieved by focusing on the substructure common to all the plausible trees. We develop an exact algorithm that infers the substructure in time O(m^4), m being the number of random variables. The new algorithm is applied to a set of data sampled from a known distribution. The method is shown to reliably infer edges of the actual tree even when the data are very scarce, unlike the traditional approach. Finally, we provide lower and upper credibility limits for mutual information under the imprecise Dirichlet model. These enable the previous developments to be extended to a full inferential method for trees. Computing and storing probabilities is a hard problem as soon as one has to deal with complex distributions over multiple random variables. The problem of efficient representation of probability distributions is central in term of computational efficiency in the field of probabilistic reasoning. The main problem arises when dealing with joint probability distributions over a set of random variables: they are always represented using huge probability arrays. In this paper, a new method based on binary-tree representation is introduced in order to store efficiently very large joint distributions. Our approach approximates any multidimensional joint distributions using an adaptive discretization of the space. We make the assumption that the lower is the probability mass of a particular region of feature space, the larger is the discretization step. This assumption leads to a very optimized representation in term of time and memory. The other advantages of our approach are the ability to refine dynamically the distribution every time it is needed leading to a more accurate representation of the probability distribution and to an anytime representation of the distribution. Imagination is the critical point in developing of realistic artificial intelligence (AI) systems. One way to approach imagination would be simulation of its properties and operations. We developed two models: AI-Brain Network Hierarchy of Languages and Semantical Holographic Calculus as well as simulation system ScriptWriter that emulate the process of imagination through an automatic animation of English texts. The purpose of this paper is to demonstrate the model and to present ScriptWriter system http://nvo.sdsc.edu/NVO/JCSG/get_SRB_mime_file2.cgi//home/tamara.sdsc/test/demo.zip?F=/home/tamara.sdsc/test/demo.zip&M=application/x-gtar for simulation of the imagination. Norms are essential to extend inference: inferences based on norms are far richer than those based on logical implications. In the recent decades, much effort has been devoted to reason on a domain, once its norms are represented. How to extract and express those norms has received far less attention. Extraction is difficult: as the readers are supposed to know them, the norms of a domain are seldom made explicit. For one thing, extracting norms requires a language to represent them, and this is the topic of this paper. We apply this language to represent norms in the domain of driving, and show that it is adequate to reason on the causes of accidents, as described by car-crash reports. In case-based reasoning, the adaptation of a source case in order to solve the target problem is at the same time crucial and difficult to implement. The reason for this difficulty is that, in general, adaptation strongly depends on domain-dependent knowledge. This fact motivates research on adaptation knowledge acquisition (AKA). This paper presents an approach to AKA based on the principles and techniques of knowledge discovery from databases and data-mining. It is implemented in CABAMAKA, a system that explores the variations within the case base to elicit adaptation knowledge. This system has been successfully tested in an application of case-based reasoning to decision support in the domain of breast cancer treatment. Of all the issues discussed at {\em Alife VII: Looking Forward, Looking Backward}, the issue of whether it was possible to create an artificial life system that exhibits {\em open-ended evolution} of novelty is by far the biggest. Of the 14 open problems settled on as a result of debate at the conference, some 6 are directly, or indirectly related to this issue. Most people equate open-ended evolution with complexity growth, although a priori these seem to be different things. In this paper I report on experiments to measure the complexity of Tierran organisms, and show the results for a {\em size-neutral} run of Tierra. In this run, no increase in organismal complexity was observed, although organism size did increase through the run. This result is discussed, offering some signposts on path to solving the issue of open ended evolution. In this paper, we employ Probabilistic Neural Network (PNN) with image and data processing techniques to implement a general purpose automated leaf recognition algorithm. 12 leaf features are extracted and orthogonalized into 5 principal variables which consist the input vector of the PNN. The PNN is trained by 1800 leaves to classify 32 kinds of plants with an accuracy greater than 90%. Compared with other approaches, our algorithm is an accurate artificial intelligence approach which is fast in execution and easy in implementation. Current network protection systems use a collection of intelligent components - e.g. classifiers or rule-based firewall systems to detect intrusions and anomalies and to secure a network against viruses, worms, or trojans. However, these network systems rely on individuality and support an architecture with less collaborative work of the protection components. They give less administration support for maintenance, but offer a large number of individual single points of failures - an ideal situation for network attacks to succeed. In this work, we discuss the required features, the performance, and the problems of a distributed protection system called SANA. It consists of a cooperative architecture, it is motivated by the human immune system, where the components correspond to artificial immune cells that are connected for their collaborative work. SANA promises a better protection against intruders than common known protection systems through an adaptive self-management while keeping the resources efficiently by an intelligent reduction of redundant tasks. We introduce a library of several novel and common used protection components and evaluate the performance of SANA by a proof-of-concept implementation. Current network protection systems use a collection of intelligent components - e.g. classifiers or rule-based firewall systems to detect intrusions and anomalies and to secure a network against viruses, worms, or trojans. However, these network systems rely on individuality and support an architecture with less collaborative work of the protection components. They give less administration support for maintenance, but offer a large number of individual single points of failures - an ideal situation for network attacks to succeed. In this work, we discuss the required features, the performance, and the problems of a distributed protection system called {\it SANA}. It consists of a cooperative architecture, it is motivated by the human immune system, where the components correspond to artificial immune cells that are connected for their collaborative work. SANA promises a better protection against intruders than common known protection systems through an adaptive self-management while keeping the resources efficiently by an intelligent reduction of redundancies. We introduce a library of several novel and common used protection components and evaluate the performance of SANA by a proof-of-concept implementation. For medical volume visualization, one of the most important tasks is to reveal clinically relevant details from the 3D scan (CT, MRI ...), e.g. the coronary arteries, without obscuring them with less significant parts. These volume datasets contain different materials which are difficult to extract and visualize with 1D transfer functions based solely on the attenuation coefficient. Multi-dimensional transfer functions allow a much more precise classification of data which makes it easier to separate different surfaces from each other. Unfortunately, setting up multi-dimensional transfer functions can become a fairly complex task, generally accomplished by trial and error. This paper explains neural networks, and then presents an efficient way to speed up visualization process by semi-automatic transfer function generation. We describe how to use neural networks to detect distinctive features shown in the 2D histogram of the volume data and how to use this information for data classification. In this paper we extend Inagaki Weighted Operators fusion rule (WO) in information fusion by doing redistribution of not only the conflicting mass, but also of masses of non-empty intersections, that we call Double Weighted Operators (DWO). Then we propose a new fusion rule Class of Proportional Redistribution of Intersection Masses (CPRIM), which generates many interesting particular fusion rules in information fusion. Both formulas are presented for any number of sources of information. An application and comparison with other fusion rules are given in the last section. The notion of optimality naturally arises in many areas of applied mathematics and computer science concerned with decision making. Here we consider this notion in the context of two formalisms used for different purposes and in different research areas: graphical games and soft constraints. We relate the notion of optimality used in the area of soft constraint satisfaction problems (SCSPs) to that used in graphical games, showing that for a large class of SCSPs that includes weighted constraints every optimal solution corresponds to a Nash equilibrium that is also a Pareto efficient joint strategy. A lattice-theoretic framework is introduced that permits the study of the conditional independence (CI) implication problem relative to the class of discrete probability measures. Semi-lattices are associated with CI statements and a finite, sound and complete inference system relative to semi-lattice inclusions is presented. This system is shown to be (1) sound and complete for saturated CI statements, (2) complete for general CI statements, and (3) sound and complete for stable CI statements. These results yield a criterion that can be used to falsify instances of the implication problem and several heuristics are derived that approximate this "lattice-exclusion" criterion in polynomial time. Finally, we provide experimental results that relate our work to results obtained from other existing inference algorithms. It has previously been an open problem whether all Boolean submodular functions can be decomposed into a sum of binary submodular functions over a possibly larger set of variables. This problem has been considered within several different contexts in computer science, including computer vision, artificial intelligence, and pseudo-Boolean optimisation. Using a connection between the expressive power of valued constraints and certain algebraic properties of functions, we answer this question negatively. Our results have several corollaries. First, we characterise precisely which submodular functions of arity 4 can be expressed by binary submodular functions. Next, we identify a novel class of submodular functions of arbitrary arities which can be expressed by binary submodular functions, and therefore minimised efficiently using a so-called expressibility reduction to the Min-Cut problem. More importantly, our results imply limitations on this kind of reduction and establish for the first time that it cannot be used in general to minimise arbitrary submodular functions. Finally, we refute a conjecture of Promislow and Young on the structure of the extreme rays of the cone of Boolean submodular functions. Many AI researchers and cognitive scientists have argued that analogy is the core of cognition. The most influential work on computational modeling of analogy-making is Structure Mapping Theory (SMT) and its implementation in the Structure Mapping Engine (SME). A limitation of SME is the requirement for complex hand-coded representations. We introduce the Latent Relation Mapping Engine (LRME), which combines ideas from SME and Latent Relational Analysis (LRA) in order to remove the requirement for hand-coded representations. LRME builds analogical mappings between lists of words, using a large corpus of raw text to automatically discover the semantic relations among the words. We evaluate LRME on a set of twenty analogical mapping problems, ten based on scientific analogies and ten based on common metaphors. LRME achieves human-level performance on the twenty problems. We compare LRME with a variety of alternative approaches and find that they are not able to reach the same level of performance. All self-active living beings need to solve the motivational problem: The question what to do at any moment of their live. For humans and non-human animals at least two distinct layers of motivational drives are known, the primary needs for survival and the emotional drives leading to a wide range of sophisticated strategies, such as explorative learning and socializing. Part of the emotional layer of drives has universal facets, being beneficial in an extended range of environmental settings. Emotions are triggered in the brain by the release of neuromodulators, which are, at the same time, the agents for meta-learning. This intrinsic relation between emotions, meta-learning and universal action strategies suggests a central importance for emotional control for the design of artificial intelligences and synthetic cognitive systems. An implementation of this concept is proposed in terms of a dense and homogeneous associative network (dHan). The present work consisted in developing a plateau game. There are the traditional ones (monopoly, cluedo, ect.) but those which interest us leave less place at the chance (luck) than to the strategy such that the chess game. Kallah is an old African game, its rules are simple but the strategies to be used are very complex to implement. Of course, they are based on a strongly mathematical basis as in the film "Rain-Man" where one can see that gambling can be payed with strategies based on mathematical theories. The Artificial Intelligence gives the possibility "of thinking" to a machine and, therefore, allows it to make decisions. In our work, we use it to give the means to the computer choosing its best movement. When mathematicians present proofs they usually adapt their explanations to their didactic goals and to the (assumed) knowledge of their addressees. Modern automated theorem provers, in contrast, present proofs usually at a fixed level of detail (also called granularity). Often these presentations are neither intended nor suitable for human use. A challenge therefore is to develop user- and goal-adaptive proof presentation techniques that obey common mathematical practice. We present a flexible and adaptive approach to proof presentation that exploits machine learning techniques to extract a model of the specific granularity of proof examples and employs this model for the automated generation of further proofs at an adapted level of granularity. Constraint programming (CP) has been used with great success to tackle a wide variety of constraint satisfaction problems which are computationally intractable in general. Global constraints are one of the important factors behind the success of CP. In this paper, we study a new global constraint, the multiset ordering constraint, which is shown to be useful in symmetry breaking and searching for leximin optimal solutions in CP. We propose efficient and effective filtering algorithms for propagating this global constraint. We show that the algorithms are sound and complete and we discuss possible extensions. We also consider alternative propagation methods based on existing constraints in CP toolkits. Our experimental results on a number of benchmark problems demonstrate that propagating the multiset ordering constraint via a dedicated algorithm can be very beneficial. We argue that parameterized complexity is a useful tool with which to study global constraints. In particular, we show that many global constraints which are intractable to propagate completely have natural parameters which make them fixed-parameter tractable and which are easy to compute. This tractability tends either to be the result of a simple dynamic program or of a decomposition which has a strong backdoor of bounded size. This strong backdoor is often a cycle cutset. We also show that parameterized complexity can be used to study other aspects of constraint programming like symmetry breaking. For instance, we prove that value symmetry is fixed-parameter tractable to break in the number of symmetries. Finally, we argue that parameterized complexity can be used to derive results about the approximability of constraint propagation. A wide range of constraints can be compactly specified using automata or formal languages. In a sequence of recent papers, we have shown that an effective means to reason with such specifications is to decompose them into primitive constraints. We can then, for instance, use state of the art SAT solvers and profit from their advanced features like fast unit propagation, clause learning, and conflict-based search heuristics. This approach holds promise for solving combinatorial problems in scheduling, rostering, and configuration, as well as problems in more diverse areas like bioinformatics, software testing and natural language processing. In addition, decomposition may be an effective method to propagate other global constraints. We study the CardPath constraint. This ensures a given constraint holds a number of times down a sequence of variables. We show that SLIDE, a special case of CardPath where the slid constraint must hold always, can be used to encode a wide range of sliding sequence constraints including CardPath itself. We consider how to propagate SLIDE and provide a complete propagator for CardPath. Since propagation is NP-hard in general, we identify special cases where propagation takes polynomial time. Our experiments demonstrate that using SLIDE to encode global constraints can be as efficient and effective as specialised propagators. Successful management of emotional stimuli is a pivotal issue concerning Affective Computing (AC) and the related research. As a subfield of Artificial Intelligence, AC is concerned not only with the design of computer systems and the accompanying hardware that can recognize, interpret, and process human emotions, but also with the development of systems that can trigger human emotional response in an ordered and controlled manner. This requires the maximum attainable precision and efficiency in the extraction of data from emotionally annotated databases While these databases do use keywords or tags for description of the semantic content, they do not provide either the necessary flexibility or leverage needed to efficiently extract the pertinent emotional content. Therefore, to this extent we propose an introduction of ontologies as a new paradigm for description of emotionally annotated data. The ability to select and sequence data based on their semantic attributes is vital for any study involving metadata, semantics and ontological sorting like the Semantic Web or the Social Semantic Desktop, and the approach described in the paper facilitates reuse in these areas as well. At this point in time there is a need for a new representation of different information, to identify and organize descending its characteristics. Today, science is a powerful tool for the description of reality - the numbers. Why the most important property of numbers. Suppose we have a number 0.2351734, it is clear that the figures are there in order of importance. If necessary, we can round the number up to some value, eg 0.235. Arguably, the 0,235 - the most important information of 0.2351734. Thus, we can reduce the size of numbers is not losing much with the accuracy. Clearly, if learning to provide a graphical or audio information kernel, we can provide the most relevant information, discarding the rest. Introduction of various kinds of information in an information kernel, is an important task, to solve many problems in artificial intelligence and information theory. Voting is a simple mechanism to aggregate the preferences of agents. Many voting rules have been shown to be NP-hard to manipulate. However, a number of recent theoretical results suggest that this complexity may only be in the worst-case since manipulation is often easy in practice. In this paper, we show that empirical studies are useful in improving our understanding of this issue. We demonstrate that there is a smooth transition in the probability that a coalition can elect a desired candidate using the veto rule as the size of the manipulating coalition increases. We show that a rescaled probability curve displays a simple and universal form independent of the size of the problem. We argue that manipulation of the veto rule is asymptotically easy for many independent and identically distributed votes even when the coalition of manipulators is critical in size. Based on this argument, we identify a situation in which manipulation is computationally hard. This is when votes are highly correlated and the election is "hung". We show, however, that even a single uncorrelated voter is enough to make manipulation easy again. We show that tools from circuit complexity can be used to study decompositions of global constraints. In particular, we study decompositions of global constraints into conjunctive normal form with the property that unit propagation on the decomposition enforces the same level of consistency as a specialized propagation algorithm. We prove that a constraint propagator has a a polynomial size decomposition if and only if it can be computed by a polynomial size monotone Boolean circuit. Lower bounds on the size of monotone Boolean circuits thus translate to lower bounds on the size of decompositions of global constraints. For instance, we prove that there is no polynomial sized decomposition of the domain consistency propagator for the ALLDIFFERENT constraint. To model combinatorial decision problems involving uncertainty and probability, we extend the stochastic constraint programming framework proposed in [Walsh, 2002] along a number of important dimensions (e.g. to multiple chance constraints and to a range of new objectives). We also provide a new (but equivalent) semantics based on scenarios. Using this semantics, we can compile stochastic constraint programs down into conventional (nonstochastic) constraint programs. This allows us to exploit the full power of existing constraint solvers. We have implemented this framework for decision making under uncertainty in stochastic OPL, a language which is based on the OPL constraint modelling language [Hentenryck et al., 1999]. To illustrate the potential of this framework, we model a wide range of problems in areas as diverse as finance, agriculture and production. We present a theoretical analysis of Maximum a Posteriori (MAP) sequence estimation for binary symmetric hidden Markov processes. We reduce the MAP estimation to the energy minimization of an appropriately defined Ising spin model, and focus on the performance of MAP as characterized by its accuracy and the number of solutions corresponding to a typical observed sequence. It is shown that for a finite range of sufficiently low noise levels, the solution is uniquely related to the observed sequence, while the accuracy degrades linearly with increasing the noise strength. For intermediate noise values, the accuracy is nearly noise-independent, but now there are exponentially many solutions to the estimation problem, which is reflected in non-zero ground-state entropy for the Ising model. Finally, for even larger noise intensities, the number of solutions reduces again, but the accuracy is poor. It is shown that these regimes are different thermodynamic phases of the Ising model that are related to each other via first-order phase transitions. Surveillance control and reporting (SCR) system for air threats play an important role in the defense of a country. SCR system corresponds to air and ground situation management/processing along with information fusion, communication, coordination, simulation and other critical defense oriented tasks. Threat Evaluation and Weapon Assignment (TEWA) sits at the core of SCR system. In such a system, maximal or near maximal utilization of constrained resources is of extreme importance. Manual TEWA systems cannot provide optimality because of different limitations e.g.surface to air missile (SAM) can fire from a distance of 5Km, but manual TEWA systems are constrained by human vision range and other constraints. Current TEWA systems usually work on target-by-target basis using some type of greedy algorithm thus affecting the optimality of the solution and failing in multi-target scenario. his paper relates to a novel two-staged flexible dynamic decision support based optimal threat evaluation and weapon assignment algorithm for multi-target air-borne threats. A general approach describing quantum decision procedures is developed. The approach can be applied to quantum information processing, quantum computing, creation of artificial quantum intelligence, as well as to analyzing decision processes of human decision makers. Our basic point is to consider an active quantum system possessing its own strategic state. Processing information by such a system is analogous to the cognitive processes associated to decision making by humans. The algebra of probability operators, associated with the possible options available to the decision maker, plays the role of the algebra of observables in quantum theory of measurements. A scheme is advanced for a practical realization of decision procedures by thinking quantum systems. Such thinking quantum systems can be realized by using spin lattices, systems of magnetic molecules, cold atoms trapped in optical lattices, ensembles of quantum dots, or multilevel atomic systems interacting with electromagnetic field. This work presents a novel learning method in the context of embodied artificial intelligence and self-organization, which has as few assumptions and restrictions as possible about the world and the underlying model. The learning rule is derived from the principle of maximizing the predictive information in the sensorimotor loop. It is evaluated on robot chains of varying length with individually controlled, non-communicating segments. The comparison of the results shows that maximizing the predictive information per wheel leads to a higher coordinated behavior of the physically connected robots compared to a maximization per robot. Another focus of this paper is the analysis of the effect of the robot chain length on the overall behavior of the robots. It will be shown that longer chains with less capable controllers outperform those of shorter length and more complex controllers. The reason is found and discussed in the information-geometric interpretation of the learning process. This paper reviews the overview of the dynamic shortest path routing problem and the various neural networks to solve it. Different shortest path optimization problems can be solved by using various neural networks algorithms. The routing in packet switched multi-hop networks can be described as a classical combinatorial optimization problem i.e. a shortest path routing problem in graphs. The survey shows that the neural networks are the best candidates for the optimization of dynamic shortest path routing problems due to their fastness in computation comparing to other softcomputing and metaheuristics algorithms A central problem in artificial intelligence is that of planning to maximize future reward under uncertainty in a partially observable environment. In this paper we propose and demonstrate a novel algorithm which accurately learns a model of such an environment directly from sequences of action-observation pairs. We then close the loop from observations to actions by planning in the learned model and recovering a policy which is near-optimal in the original environment. Specifically, we present an efficient and statistically consistent spectral algorithm for learning the parameters of a Predictive State Representation (PSR). We demonstrate the algorithm by learning a model of a simulated high-dimensional, vision-based mobile robot planning task, and then perform approximate point-based planning in the learned PSR. Analysis of our results shows that the algorithm learns a state space which efficiently captures the essential features of the environment. This representation allows accurate prediction with a small number of parameters, and enables successful and efficient planning. Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources. In this philosophical paper, we explore computational and biological analogies to address the fine-tuning problem in cosmology. We first clarify what it means for physical constants or initial conditions to be fine-tuned. We review important distinctions such as the dimensionless and dimensional physical constants, and the classification of constants proposed by Levy-Leblond. Then we explore how two great analogies, computational and biological, can give new insights into our problem. This paper includes a preliminary study to examine the two analogies. Importantly, analogies are both useful and fundamental cognitive tools, but can also be misused or misinterpreted. The idea that our universe might be modelled as a computational entity is analysed, and we discuss the distinction between physical laws and initial conditions using algorithmic information theory. Smolin introduced the theory of "Cosmological Natural Selection" with a biological analogy in mind. We examine an extension of this analogy involving intelligent life. We discuss if and how this extension could be legitimated. Keywords: origin of the universe, fine-tuning, physical constants, initial conditions, computational universe, biological universe, role of intelligent life, cosmological natural selection, cosmological artificial selection, artificial cosmogenesis. We study propagation algorithms for the conjunction of two AllDifferent constraints. Solutions of an AllDifferent constraint can be seen as perfect matchings on the variable/value bipartite graph. Therefore, we investigate the problem of finding simultaneous bipartite matchings. We present an extension of the famous Hall theorem which characterizes when simultaneous bipartite matchings exists. Unfortunately, finding such matchings is NP-hard in general. However, we prove a surprising result that finding a simultaneous matching on a convex bipartite graph takes just polynomial time. Based on this theoretical result, we provide the first polynomial time bound consistency algorithm for the conjunction of two AllDifferent constraints. We identify a pathological problem on which this propagator is exponentially faster compared to existing propagators. Our experiments show that this new propagator can offer significant benefits over existing methods. As one of the newest members in the field of artificial immune systems (AIS), the Dendritic Cell Algorithm (DCA) is based on behavioural models of natural dendritic cells (DCs). Unlike other AIS, the DCA does not rely on training data, instead domain or expert knowledge is required to predetermine the mapping between input signals from a particular instance to the three categories used by the DCA. This data preprocessing phase has received the criticism of having manually over-?tted the data to the algorithm, which is undesirable. Therefore, in this paper we have attempted to ascertain if it is possible to use principal component analysis (PCA) techniques to automatically categorise input data while still generating useful and accurate classication results. The integrated system is tested with a biometrics dataset for the stress recognition of automobile drivers. The experimental results have shown the application of PCA to the DCA for the purpose of automated data preprocessing is successful. Luhmann (1984) defined society as a communication system which is structurally coupled to, but not an aggregate of, human action systems. The communication system is then considered as self-organizing ("autopoietic"), as are human actors. Communication systems can be studied by using Shannon's (1948) mathematical theory of communication. The update of a network by action at one of the local nodes is then a well-known problem in artificial intelligence (Pearl 1988). By combining these various theories, a general algorithm for probabilistic structure/action contingency can be derived. The consequences of this contingency for each system, its consequences for their further histories, and the stabilization on each side by counterbalancing mechanisms are discussed, in both mathematical and theoretical terms. An empirical example is elaborated. We present tropical games, a generalization of combinatorial min-max games based on tropical algebras. Our model breaks the traditional symmetry of rational zero-sum games where players have exactly opposed goals (min vs. max), is more widely applicable than min-max and also supports a form of pruning, despite it being less effective than alpha-beta. Actually, min-max games may be seen as particular cases where both the game and its dual are tropical: when the dual of a tropical game is also tropical, the power of alpha-beta is completely recovered. We formally develop the model and prove that the tropical pruning strategy is correct, then conclude by showing how the problem of approximated parsing can be modeled as a tropical game, profiting from pruning. Voting is a simple mechanism to combine together the preferences of multiple agents. Agents may try to manipulate the result of voting by mis-reporting their preferences. One barrier that might exist to such manipulation is computational complexity. In particular, it has been shown that it is NP-hard to compute how to manipulate a number of different voting rules. However, NP-hardness only bounds the worst-case complexity. Recent theoretical results suggest that manipulation may often be easy in practice. In this paper, we study empirically the manipulability of single transferable voting (STV) to determine if computational complexity is really a barrier to manipulation. STV was one of the first voting rules shown to be NP-hard. It also appears one of the harder voting rules to manipulate. We sample a number of distributions of votes including uniform and real world elections. In almost every election in our experiments, it was easy to compute how a single agent could manipulate the election or to prove that manipulation by a single agent was impossible. Symmetry is an important feature of many constraint programs. We show that any problem symmetry acting on a set of symmetry breaking constraints can be used to break symmetry. Different symmetries pick out different solutions in each symmetry class. This simple but powerful idea can be used in a number of different ways. We describe one application within model restarts, a search technique designed to reduce the conflict between symmetry breaking and the branching heuristic. In model restarts, we restart search periodically with a random symmetry of the symmetry breaking constraints. Experimental results show that this symmetry breaking technique is effective in practice on some standard benchmark problems. In May 1, 2008, researchers at Hewlett Packard (HP) announced the first physical realization of a fundamental circuit element called memristor that attracted so much interest worldwide. This newly found element can easily be combined with crossbar interconnect technology which this new structure has opened a new field in designing configurable or programmable electronic systems. These systems in return can have applications in signal processing and artificial intelligence. In this paper, based on the simple memristor crossbar structure, we propose new and simple circuits for hardware implementation of fuzzy membership functions. In our proposed circuits, these fuzzy membership functions can have any shapes and resolutions. In addition, these circuits can be used as a basis in the construction of evolutionary systems. Text classification is the process of classifying documents into predefined categories based on their content. It is the automated assignment of natural language texts to predefined categories. Text classification is the primary requirement of text retrieval systems, which retrieve texts in response to a user query, and text understanding systems, which transform text in some way such as producing summaries, answering questions or extracting data. Existing supervised learning algorithms for classifying text need sufficient documents to learn accurately. This paper presents a new algorithm for text classification using artificial intelligence technique that requires fewer documents for training. Instead of using words, word relation i.e. association rules from these words is used to derive feature set from pre-classified text documents. The concept of na\"ive Bayes classifier is then used on derived features and finally only a single concept of genetic algorithm has been added for final classification. A system based on the proposed algorithm has been implemented and tested. The experimental results show that the proposed system works as a successful text classifier. DCSP (Distributed Constraint Satisfaction Problem) has been a very important research area in AI (Artificial Intelligence). There are many application problems in distributed AI that can be formalized as DSCPs. With the increasing complexity and problem size of the application problems in AI, the required storage place in searching and the average searching time are increasing too. Thus, to use a limited storage place efficiently in solving DCSP becomes a very important problem, and it can help to reduce searching time as well. This paper provides an efficient knowledge base management approach based on general usage of hyper-resolution-rule in consistence algorithm. The approach minimizes the increasing of the knowledge base by eliminate sufficient constraint and false nogood. These eliminations do not change the completeness of the original knowledge base increased. The proofs are given as well. The example shows that this approach decrease both the new nogoods generated and the knowledge base greatly. Thus it decreases the required storage place and simplify the searching process. The paper proposes artificial intelligence technique called hill climbing to find numerical solutions of Diophantine Equations. Such equations are important as they have many applications in fields like public key cryptography, integer factorization, algebraic curves, projective curves and data dependency in super computers. Importantly, it has been proved that there is no general method to find solutions of such equations. This paper is an attempt to find numerical solutions of Diophantine equations using steepest ascent version of Hill Climbing. The method, which uses tree representation to depict possible solutions of Diophantine equations, adopts a novel methodology to generate successors. The heuristic function used help to make the process of finding solution as a minimization process. The work illustrates the effectiveness of the proposed methodology using a class of Diophantine equations given by a1. x1 p1 + a2. x2 p2 + ...... + an . xn pn = N where ai and N are integers. The experimental results validate that the procedure proposed is successful in finding solutions of Diophantine Equations with sufficiently large powers and large number of variables. Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. This leads to poor performance in theory and often in practice. Some recent approaches provide stronger guarantees in this setting, but remain somewhat unsatisfactory as they train either non-stationary or stochastic policies and require a large number of iterations. In this paper, we propose a new iterative algorithm, which trains a stationary deterministic policy, that can be seen as a no regret algorithm in an online learning setting. We show that any such no regret algorithm, combined with additional reduction assumptions, must find a policy with good performance under the distribution of observations it induces in such sequential settings. We demonstrate that this new approach outperforms previous approaches on two challenging imitation learning problems and a benchmark sequence labeling problem. Inference and learning of graphical models are both well-studied problems in statistics and machine learning that have found many applications in science and engineering. However, exact inference is intractable in general graphical models, which suggests the problem of seeking the best approximation to a collection of random variables within some tractable family of graphical models. In this paper, we focus our attention on the class of planar Ising models, for which inference is tractable using techniques of statistical physics [Kac and Ward; Kasteleyn]. Based on these techniques and recent methods for planarity testing and planar embedding [Chrobak and Payne], we propose a simple greedy algorithm for learning the best planar Ising model to approximate an arbitrary collection of binary random variables (possibly from sample data). Given the set of all pairwise correlations among variables, we select a planar graph and optimal planar Ising model defined on this graph to best approximate that set of correlations. We demonstrate our method in some simulations and for the application of modeling senate voting records. "Encoded in the large, highly evolved sensory and motor portions of the human brain is a billion years of experience about the nature of the world and how to survive in it. The deliberate process we call reasoning is, I believe, the thinnest veneer of human thought, effective only because it is supported by this much older and much powerful, though usually unconscious, sensor motor knowledge. We are all prodigious Olympians in perceptual and motor areas, so good that we make the difficult look easy. Abstract thought, though, is a new trick, perhaps less than 100 thousand years old. We have not yet mastered it. It is not all that intrinsically difficult; it just seems so when we do it."- Hans Moravec Moravec's paradox is involved with the fact that it is the seemingly easier day to day problems that are harder to implement in a machine, than the seemingly complicated logic based problems of today. The results prove that most artificially intelligent machines are as adept if not more than us at under-taking long calculations or even play chess, but their logic brings them nowhere when it comes to carrying out everyday tasks like walking, facial gesture recognition or speech recognition. Case Based Reasoning and particularly Estimation by Analogy, has been used in a number of problem-solving areas, such as cost estimation. Conventional methods, despite the lack of a sound criterion for choosing nearest projects, were based on estimation using a fixed and predetermined number of neighbors from the entire set of historical instances. This approach puts boundaries to the estimation ability of such algorithms, for they do not take into consideration that every project under estimation is unique and requires different handling. The notion of distributions of distances together with a distance metric for distributions help us to adapt the proposed method (we call it DD-EbA) each time to a specific case that is to be estimated without loosing in prediction power or computational cost. The results of this paper show that the proposed technique achieves the above idea in a very efficient way. Current work in planning with preferences assume that the user's preference models are completely specified and aim to search for a single solution plan. In many real-world planning scenarios, however, the user probably cannot provide any information about her desired plans, or in some cases can only express partial preferences. In such situations, the planner has to present not only one but a set of plans to the user, with the hope that some of them are similar to the plan she prefers. We first propose the usage of different measures to capture quality of plan sets that are suitable for such scenarios: domain-independent distance measures defined based on plan elements (actions, states, causal links) if no knowledge of the user's preferences is given, and the Integrated Convex Preference measure in case the user's partial preference is provided. We then investigate various heuristic approaches to find set of plans according to these measures, and present empirical results demonstrating the promise of our approach. Software testing is an expensive process, which is vital in the industry. Construction of the test-data in software testing requires the major cost and to decide which method to use in order to generate the test data is important. This paper discusses the efficiency of search-based algorithms (preferably genetic algorithm) versus random testing, in soft- ware test-data generation. This study differs from all previous studies due to sample programs (SUTs) which are used. Since we want to in- crease the complexity of SUTs gradually, and the program generation is automatic as well, Grammatical Evolution is used to guide the program generation. SUTs are generated according to the grammar we provide, with different levels of complexity. SUTs will first undergo genetic al- gorithm and then random testing. Based on the test results, this paper recommends one method to use for automation of software testing. The context of a software developer is something hard to define and capture, as it represents a complex network of elements across different dimensions that are not limited to the work developed on an IDE. We propose the definition of a software developer context model that takes into account all the dimensions that characterize the work environment of the developer. We are especially focused on what the software developer context encompasses at the project level and how it can be captured. The experimental work done so far show that useful context information can be extracted from project management tools. The extraction, analysis and availability of this context information can be used to enrich the work environment of the developer with additional knowledge to support her/his work. Meaning negotiation (MN) is the general process with which agents reach an agreement about the meaning of a set of terms. Artificial Intelligence scholars have dealt with the problem of MN by means of argumentations schemes, beliefs merging and information fusion operators, and ontology alignment but the proposed approaches depend upon the number of participants. In this paper, we give a general model of MN for an arbitrary number of agents, in which each participant discusses with the others her viewpoint by exhibiting it in an actual set of constraints on the meaning of the negotiated terms. We call this presentation of individual viewpoints an angle. The agents do not aim at forming a common viewpoint but, instead, at agreeing about an acceptable common angle. We analyze separately the process of MN by two agents (\emph{bilateral} or \emph{pairwise} MN) and by more than two agents (\emph{multiparty} MN), and we use game theoretic models to understand how the process develops in both cases: the models are Bargaining Game for bilateral MN and English Auction for multiparty MN. We formalize the process of reaching such an agreement by giving a deduction system that comprises of rules that are consistent and adequate for representing MN. The immune system is a highly parallel and distributed intelligent system which has learning, memory, and associative capabilities. Artificial Immune System is an evolutionary paradigm inspired by the biological aspects of the immune system of mammals. The immune system can inspire to form new algorithms learning from its course of action. The human immune system has motivated scientists and engineers for finding powerful information processing algorithms that has solved complex engineering problems. This work is the result of an attempt to explore a different perspective of the immune system namely the Immune Privileged Site (IPS) which has the ability to make an exception to different parts of the body by not triggering immune response to some of the foreign agent in these parts of the body. While the complete system is secured by an Immune System at certain times it may be required that the system allows certain activities which may be harmful to other system which is useful to it and learns over a period of time through the immune privilege model as done in case of Immune Privilege Sites in Natural Immune System. In Artificial Intelligence with Coalition Structure Generation (CSG) one refers to those cooperative complex problems that require to find an optimal partition, maximising a social welfare, of a set of entities involved in a system into exhaustive and disjoint coalitions. The solution of the CSG problem finds applications in many fields such as Machine Learning (covering machines, clustering), Data Mining (decision tree, discretization), Graph Theory, Natural Language Processing (aggregation), Semantic Web (service composition), and Bioinformatics. The problem of finding the optimal coalition structure is NP-complete. In this paper we present a greedy adaptive search procedure (GRASP) with path-relinking to efficiently search the space of coalition structures. Experiments and comparisons to other algorithms prove the validity of the proposed method in solving this hard combinatorial problem. In a minimal binary constraint network, every tuple of a constraint relation can be extended to a solution. The tractability or intractability of computing a solution to such a minimal network was a long standing open question. Dechter conjectured this computation problem to be NP-hard. We prove this conjecture. We also prove a conjecture by Dechter and Pearl stating that for k\geq2 it is NP-hard to decide whether a single constraint can be decomposed into an equivalent k-ary constraint network. We show that this holds even in case of bi-valued constraints where k\geq3, which proves another conjecture of Dechter and Pearl. Finally, we establish the tractability frontier for this problem with respect to the domain cardinality and the parameter k. Some recent works in conditional planning have proposed reachability heuristics to improve planner scalability, but many lack a formal description of the properties of their distance estimates. To place previous work in context and extend work on heuristics for conditional planning, we provide a formal basis for distance estimates between belief states. We give a definition for the distance between belief states that relies on aggregating underlying state distance measures. We give several techniques to aggregate state distances and their associated properties. Many existing heuristics exhibit a subset of the properties, but in order to provide a standardized comparison we present several generalizations of planning graph heuristics that are used in a single planner. We compliment our belief state distance estimate framework by also investigating efficient planning graph data structures that incorporate BDDs to compute the most effective heuristics. We developed two planners to serve as test-beds for our investigation. The first, CAltAlt, is a conformant regression planner that uses A* search. The second, POND, is a conditional progression planner that uses AO* search. We show the relative effectiveness of our heuristic techniques within these planners. We also compare the performance of these planners with several state of the art approaches in conditional planning. Many fundamental problems in artificial intelligence, knowledge representation, and verification involve reasoning about sets and relations between sets and can be modeled as set constraint satisfaction problems (set CSPs). Such problems are frequently intractable, but there are several important set CSPs that are known to be polynomial-time tractable. We introduce a large class of set CSPs that can be solved in quadratic time. Our class, which we call EI, contains all previously known tractable set CSPs, but also some new ones that are of crucial importance for example in description logics. The class of EI set constraints has an elegant universal-algebraic characterization, which we use to show that every set constraint language that properly contains all EI set constraints already has a finite sublanguage with an NP-hard constraint satisfaction problem. As was shown recently, many important AI problems require counting the number of models of propositional formulas. The problem of counting models of such formulas is, according to present knowledge, computationally intractable in a worst case. Based on the Davis-Putnam procedure, we present an algorithm, CDP, that computes the exact number of models of a propositional CNF or DNF formula F. Let m and n be the number of clauses and variables of F, respectively, and let p denote the probability that a literal l of F occurs in a clause C of F, then the average running time of CDP is shown to be O(nm^d), where d=-1/log(1-p). The practical performance of CDP has been estimated in a series of experiments on a wide variety of CNF formulas. This paper presents a new approach to identifying and eliminating mislabeled training instances for supervised learning. The goal of this approach is to improve classification accuracies produced by learning algorithms by improving the quality of the training data. Our approach uses a set of learning algorithms to create classifiers that serve as noise filters for the training data. We evaluate single algorithm, majority vote and consensus filters on five datasets that are prone to labeling errors. Our experiments illustrate that filtering significantly improves classification accuracy for noise levels up to 30 percent. An analytical and empirical evaluation of the precision of our approach shows that consensus filters are conservative at throwing away good data at the expense of retaining bad data and that majority filters are better at detecting bad data at the expense of throwing away good data. This suggests that for situations in which there is a paucity of data, consensus filters are preferable, whereas majority vote filters are preferable for situations with an abundance of data. Localization, that is the estimation of a robot's location from sensor data, is a fundamental problem in mobile robotics. This papers presents a version of Markov localization which provides accurate position estimates and which is tailored towards dynamic environments. The key idea of Markov localization is to maintain a probability density over the space of all locations of a robot in its environment. Our approach represents this space metrically, using a fine-grained grid to approximate densities. It is able to globally localize the robot from scratch and to recover from localization failures. It is robust to approximate models of the environment (such as occupancy grid maps) and noisy sensors (such as ultrasound sensors). Our approach also includes a filtering technique which allows a mobile robot to reliably estimate its position even in densely populated environments in which crowds of people block the robot's sensors for extended periods of time. The method described here has been implemented and tested in several real-world applications of mobile robots, including the deployments of two mobile robots as interactive museum tour-guides. Multi-Agent Systems (MAS) promise to offer solutions to problems where established, older paradigms fall short. In order to validate such claims that are repeatedly made in software agent publications, empirical in-depth studies of advantages and weaknesses of multi-agent solutions versus conventional ones in practical applications are needed. Climate control in large buildings is one application area where multi-agent systems, and market-oriented programming in particular, have been reported to be very successful, although central control solutions are still the standard practice. We have therefore constructed and implemented a variety of market designs for this problem, as well as different standard control engineering solutions. This article gives a detailed analysis and comparison, so as to learn about differences between standard versus agent approaches, and yielding new insights about benefits and limitations of computational markets. An important outcome is that "local information plus market communication produces global control". We show how to find a minimum weight loop cutset in a Bayesian network with high probability. Finding such a loop cutset is the first step in the method of conditioning for inference. Our randomized algorithm for finding a loop cutset outputs a minimum loop cutset after O(c 6^k kn) steps with probability at least 1 - (1 - 1/(6^k))^c6^k, where c > 1 is a constant specified by the user, k is the minimal size of a minimum weight loop cutset, and n is the number of vertices. We also show empirically that a variant of this algorithm often finds a loop cutset that is closer to the minimum weight loop cutset than the ones found by the best deterministic algorithms known. We investigate the space efficiency of a Propositional Knowledge Representation (PKR) formalism. Intuitively, the space efficiency of a formalism F in representing a certain piece of knowledge A, is the size of the shortest formula of F that represents A. In this paper we assume that knowledge is either a set of propositional interpretations (models) or a set of propositional formulae (theorems). We provide a formal way of talking about the relative ability of PKR formalisms to compactly represent a set of models or a set of theorems. We introduce two new compactness measures, the corresponding classes, and show that the relative space efficiency of a PKR formalism in representing models/theorems is directly related to such classes. In particular, we consider formalisms for nonmonotonic reasoning, such as circumscription and default logic, as well as belief revision operators and the stable model semantics for logic programs with negation. One interesting result is that formalisms with the same time complexity do not necessarily belong to the same space efficiency class. Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advantage of POMDPs, however, comes at a price -- exact methods for solving them are computationally very expensive and thus applicable in practice only to very simple problems. We focus on efficient approximation (heuristic) methods that attempt to alleviate the computational problem and trade off accuracy for speed. We have two objectives here. First, we survey various approximation methods, analyze their properties and relations and provide some new insights into their differences. Second, we present a number of new approximation methods and novel refinements of existing techniques. The theoretical results are supported by experiments on a problem from the agent navigation domain. Agents in dynamic multi-agent environments must monitor their peers to execute individual and group plans. A key open question is how much monitoring of other agents' states is required to be effective: The Monitoring Selectivity Problem. We investigate this question in the context of detecting failures in teams of cooperating agents, via Socially-Attentive Monitoring, which focuses on monitoring for failures in the social relationships between the agents. We empirically and analytically explore a family of socially-attentive teamwork monitoring algorithms in two dynamic, complex, multi-agent domains, under varying conditions of task distribution and uncertainty. We show that a centralized scheme using a complex algorithm trades correctness for completeness and requires monitoring all teammates. In contrast, a simple distributed teamwork monitoring algorithm results in correct and complete detection of teamwork failures, despite relying on limited, uncertain knowledge, and monitoring only key agents in a team. In addition, we report on the design of a socially-attentive monitoring system and demonstrate its generality in monitoring several coordination relationships, diagnosing detected failures, and both on-line and off-line applications. Functional relationships between objects, called `attributes', are of considerable importance in knowledge representation languages, including Description Logics (DLs). A study of the literature indicates that papers have made, often implicitly, different assumptions about the nature of attributes: whether they are always required to have a value, or whether they can be partial functions. The work presented here is the first explicit study of this difference for subclasses of the CLASSIC DL, involving the same-as concept constructor. It is shown that although determining subsumption between concept descriptions has the same complexity (though requiring different algorithms), the story is different in the case of determining the least common subsumer (lcs). For attributes interpreted as partial functions, the lcs exists and can be computed relatively easily; even in this case our results correct and extend three previous papers about the lcs of DLs. In the case where attributes must have a value, the lcs may not exist, and even if it exists it may be of exponential size. Interestingly, it is possible to decide in polynomial time if the lcs exists. We study the complexity of the combination of the Description Logics ALCQ and ALCQI with a terminological formalism based on cardinality restrictions on concepts. These combinations can naturally be embedded into C^2, the two variable fragment of predicate logic with counting quantifiers, which yields decidability in NExpTime. We show that this approach leads to an optimal solution for ALCQI, as ALCQI with cardinality restrictions has the same complexity as C^2 (NExpTime-complete). In contrast, we show that for ALCQ, the problem can be solved in ExpTime. This result is obtained by a reduction of reasoning with cardinality restrictions to reasoning with the (in general weaker) terminological formalism of general axioms for ALCQ extended with nominals. Using the same reduction, we show that, for the extension of ALCQI with nominals, reasoning with general axioms is a NExpTime-complete problem. Finally, we sharpen this result and show that pure concept satisfiability for ALCQI with nominals is NExpTime-complete. Without nominals, this problem is known to be PSpace-complete. This paper describes a novel method by which a spoken dialogue system can learn to choose an optimal dialogue strategy from its experience interacting with human users. The method is based on a combination of reinforcement learning and performance modeling of spoken dialogue systems. The reinforcement learning component applies Q-learning (Watkins, 1989), while the performance modeling component applies the PARADISE evaluation framework (Walker et al., 1997) to learn the performance function (reward) used in reinforcement learning. We illustrate the method with a spoken dialogue system named ELVIS (EmaiL Voice Interactive System), that supports access to email over the phone. We conduct a set of experiments for training an optimal dialogue strategy on a corpus of 219 dialogues in which human users interact with ELVIS over the phone. We then test that strategy on a corpus of 18 dialogues. We show that ELVIS can learn to optimize its strategy selection for agent initiative, for reading messages, and for summarizing email folders. The goal of this research is to develop agents that are adaptive and predictable and timely. At first blush, these three requirements seem contradictory. For example, adaptation risks introducing undesirable side effects, thereby making agents' behavior less predictable. Furthermore, although formal verification can assist in ensuring behavioral predictability, it is known to be time-consuming. Our solution to the challenge of satisfying all three requirements is the following. Agents have finite-state automaton plans, which are adapted online via evolutionary learning (perturbation) operators. To ensure that critical behavioral constraints are always satisfied, agents' plans are first formally verified. They are then reverified after every adaptation. If reverification concludes that constraints are violated, the plans are repaired. The main objective of this paper is to improve the efficiency of reverification after learning, so that agents have a sufficiently rapid response time. We present two solutions: positive results that certain learning operators are a priori guaranteed to preserve useful classes of behavioral assurance constraints (which implies that no reverification is needed for these operators), and efficient incremental reverification algorithms for those learning operators that have negative a priori results. A major problem in machine learning is that of inductive bias: how to choose a learner's hypothesis space so that it is large enough to contain a solution to the problem being learnt, yet small enough to ensure reliable generalization from reasonably-sized training sets. Typically such bias is supplied by hand through the skill and insights of experts. In this paper a model for automatically learning bias is investigated. The central assumption of the model is that the learner is embedded within an environment of related learning tasks. Within such an environment the learner can sample from multiple tasks, and hence it can search for a hypothesis space that contains good solutions to many of the problems in the environment. Under certain restrictions on the set of all hypothesis spaces available to the learner, we show that a hypothesis space that performs well on a sufficiently large number of training tasks will also perform well when learning novel tasks in the same environment. Explicit bounds are also derived demonstrating that learning multiple tasks within an environment of related tasks can potentially give much better generalization than learning a single task. The recent approaches of extending the GRAPHPLAN algorithm to handle more expressive planning formalisms raise the question of what the formal meaning of "expressive power" is. We formalize the intuition that expressive power is a measure of how concisely planning domains and plans can be expressed in a particular formalism by introducing the notion of "compilation schemes" between planning formalisms. Using this notion, we analyze the expressiveness of a large family of propositional planning formalisms, ranging from basic STRIPS to a formalism with conditional effects, partial state specifications, and propositional formulae in the preconditions. One of the results is that conditional effects cannot be compiled away if plan size should grow only linearly but can be compiled away if we allow for polynomial growth of the resulting plans. This result confirms that the recently proposed extensions to the GRAPHPLAN algorithm concerning conditional effects are optimal with respect to the "compilability" framework. Another result is that general propositional formulae cannot be compiled into conditional effects if the plan size should be preserved linearly. This implies that allowing general propositional formulae in preconditions and effect conditions adds another level of difficulty in generating a plan. In order to generate plans for agents with multiple actuators, agent teams, or distributed controllers, we must be able to represent and plan using concurrent actions with interacting effects. This has historically been considered a challenging task requiring a temporal planner with the ability to reason explicitly about time. We show that with simple modifications, the STRIPS action representation language can be used to represent interacting actions. Moreover, algorithms for partial-order planning require only small modifications in order to be applied in such multiagent domains. We demonstrate this fact by developing a sound and complete partial-order planner for planning with concurrent interacting actions, POMP, that extends existing partial-order planners in a straightforward way. These results open the way to the use of partial-order planners for the centralized control of cooperative multiagent systems. This paper presents an implemented system for recognizing the occurrence of events described by simple spatial-motion verbs in short image sequences. The semantics of these verbs is specified with event-logic expressions that describe changes in the state of force-dynamic relations between the participants of the event. An efficient finite representation is introduced for the infinite sets of intervals that occur when describing liquid and semi-liquid events. Additionally, an efficient procedure using this representation is presented for inferring occurrences of compound events, described with event-logic expressions, from occurrences of primitive events. Using force dynamics and event logic to specify the lexical semantics of events allows the system to be more robust than prior systems based on motion profile. This paper presents GRT, a domain-independent heuristic planning system for STRIPS worlds. GRT solves problems in two phases. In the pre-processing phase, it estimates the distance between each fact and the goals of the problem, in a backward direction. Then, in the search phase, these estimates are used in order to further estimate the distance between each intermediate state and the goals, guiding so the search process in a forward direction and on a best-first basis. The paper presents the benefits from the adoption of opposite directions between the preprocessing and the search phases, discusses some difficulties that arise in the pre-processing phase and introduces techniques to cope with them. Moreover, it presents several methods of improving the efficiency of the heuristic, by enriching the representation and by reducing the size of the problem. Finally, a method of overcoming local optimal states, based on domain axioms, is proposed. According to it, difficult problems are decomposed into easier sub-problems that have to be solved sequentially. The performance results from various domains, including those of the recent planning competitions, show that GRT is among the fastest planners. In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter and Bartlett, this volume), which computes biased estimates of the performance gradient in POMDPs. The algorithm's chief advantages are that it uses only one free parameter beta, which has a natural interpretation in terms of bias-variance trade-off, it requires no knowledge of the underlying state, and it can be applied to infinite state, control and observation spaces. We show how the gradient estimates produced by GPOMDP can be used to perform gradient ascent, both with a traditional stochastic-gradient algorithm, and with an algorithm based on conjugate-gradients that utilizes gradient information to bracket maxima in line searches. Experimental results are presented illustrating both the theoretical results of (Baxter and Bartlett, this volume) on a toy problem, and practical aspects of the algorithms on a number of more realistic problems. Description Logics (DLs) are suitable, well-known, logics for managing structured knowledge. They allow reasoning about individuals and well defined concepts, i.e., set of individuals with common properties. The experience in using DLs in applications has shown that in many cases we would like to extend their capabilities. In particular, their use in the context of Multimedia Information Retrieval (MIR) leads to the convincement that such DLs should allow the treatment of the inherent imprecision in multimedia object content representation and retrieval. In this paper we will present a fuzzy extension of ALC, combining Zadeh's fuzzy logic with a classical DL. In particular, concepts becomes fuzzy and, thus, reasoning about imprecise concepts is supported. We will define its syntax, its semantics, describe its properties and present a constraint propagation calculus for reasoning in it. This paper investigates the problems arising in the construction of a program to play the game of contract bridge. These problems include both the difficulty of solving the game's perfect information variant, and techniques needed to address the fact that bridge is not, in fact, a perfect information game. GIB, the program being described, involves five separate technical advances: partition search, the practical application of Monte Carlo techniques to realistic problems, a focus on achievable sets to solve problems inherent in the Monte Carlo approach, an extension of alpha-beta pruning from total orders to arbitrary distributive lattices, and the use of squeaky wheel optimization to find approximately optimal solutions to cardplay problems. GIB is currently believed to be of approximately expert caliber, and is currently the strongest computer bridge program in the world. Enforcing local consistencies is one of the main features of constraint reasoning. Which level of local consistency should be used when searching for solutions in a constraint network is a basic question. Arc consistency and partial forms of arc consistency have been widely studied, and have been known for sometime through the forward checking or the MAC search algorithms. Until recently, stronger forms of local consistency remained limited to those that change the structure of the constraint graph, and thus, could not be used in practice, especially on large networks. This paper focuses on the local consistencies that are stronger than arc consistency, without changing the structure of the network, i.e., only removing inconsistent values from the domains. In the last five years, several such local consistencies have been proposed by us or by others. We make an overview of all of them, and highlight some relations between them. We compare them both theoretically and experimentally, considering their pruning efficiency and the time required to enforce them. We describe and evaluate the algorithmic techniques that are used in the FF planning system. Like the HSP system, FF relies on forward state space search, using a heuristic that estimates goal distances by ignoring delete lists. Unlike HSP's heuristic, our method does not assume facts to be independent. We introduce a novel search strategy that combines hill-climbing with systematic search, and we show how other powerful heuristic information can be extracted and used to prune the search space. FF was the most successful automatic planner at the recent AIPS-2000 planning competition. We review the results of the competition, give data for other benchmark domains, and investigate the reasons for the runtime performance of FF compared to HSP. Hidden Markov models (HMMs) and partially observable Markov decision processes (POMDPs) provide useful tools for modeling dynamical systems. They are particularly useful for representing the topology of environments such as road networks and office buildings, which are typical for robot navigation and planning. The work presented here describes a formal framework for incorporating readily available odometric information and geometrical constraints into both the models and the algorithm that learns them. By taking advantage of such information, learning HMMs/POMDPs can be made to generate better solutions and require fewer iterations, while being robust in the face of data reduction. Experimental results, obtained from both simulated and real robot data, demonstrate the effectiveness of the approach. Many widely studied graphical models with latent variables lead to nontrivial constraints on the distribution of the observed variables. Inspired by the Bell inequalities in quantum mechanics, we refer to any linear inequality whose violation rules out some latent variable model as a "hidden variable test" for that model. Our main contribution is to introduce a sequence of relaxations which provides progressively tighter hidden variable tests. We demonstrate applicability to mixtures of sequences of i.i.d. variables, Bell inequalities, and homophily models in social networks. For the last, we demonstrate that our method provides a test that is able to rule out latent homophily as the sole explanation for correlations on a real social network that are known to be due to influence. This paper discusses a system that accelerates reinforcement learning by using transfer from related tasks. Without such transfer, even if two tasks are very similar at some abstract level, an extensive re-learning effort is required. The system achieves much of its power by transferring parts of previously learned solutions rather than a single complete solution. The system exploits strong features in the multi-dimensional function produced by reinforcement learning in solving a particular task. These features are stable and easy to recognize early in the learning process. They generate a partitioning of the state space and thus the function. The partition is represented as a graph. This is used to index and compose functions stored in a case base to form a close approximation to the solution of the new task. Experiments demonstrate that function composition often produces more than an order of magnitude increase in learning rate compared to a basic reinforcement learning algorithm. We propose a logical/mathematical framework for statistical parameter learning of parameterized logic programs, i.e. definite clause programs containing probabilistic facts with a parameterized distribution. It extends the traditional least Herbrand model semantics in logic programming to distribution semantics, possible world semantics with a probability distribution which is unconditionally applicable to arbitrary logic programs including ones for HMMs, PCFGs and Bayesian networks. We also propose a new EM algorithm, the graphical EM algorithm, that runs for a class of parameterized logic programs representing sequential decision processes where each decision is exclusive and independent. It runs on a new data structure called support graphs describing the logical relationship between observations and their explanations, and learns parameters by computing inside and outside probability generalized for logic programs. The complexity analysis shows that when combined with OLDT search for all explanations for observations, the graphical EM algorithm, despite its generality, has the same time complexity as existing EM algorithms, i.e. the Baum-Welch algorithm for HMMs, the Inside-Outside algorithm for PCFGs, and the one for singly connected Bayesian networks that have been developed independently in each research field. Learning experiments with PCFGs using two corpora of moderate size indicate that the graphical EM algorithm can significantly outperform the Inside-Outside algorithm. Simple conceptual graphs are considered as the kernel of most knowledge representation formalisms built upon Sowa's model. Reasoning in this model can be expressed by a graph homomorphism called projection, whose semantics is usually given in terms of positive, conjunctive, existential FOL. We present here a family of extensions of this model, based on rules and constraints, keeping graph homomorphism as the basic operation. We focus on the formal definitions of the different models obtained, including their operational semantics and relationships with FOL, and we analyze the decidability and complexity of the associated problems (consistency and deduction). As soon as rules are involved in reasonings, these problems are not decidable, but we exhibit a condition under which they fall in the polynomial hierarchy. These results extend and complete the ones already published by the authors. Moreover we systematically study the complexity of some particular cases obtained by restricting the form of constraints and/or rules. Fusions are a simple way of combining logics. For normal modal logics, fusions have been investigated in detail. In particular, it is known that, under certain conditions, decidability transfers from the component logics to their fusion. Though description logics are closely related to modal logics, they are not necessarily normal. In addition, ABox reasoning in description logics is not covered by the results from modal logics. In this paper, we extend the decidability transfer results from normal modal logics to a large class of description logics. To cover different description logics in a uniform way, we introduce abstract description systems, which can be seen as a common generalization of description and modal logics, and show the transfer results in this general setting. Common wisdom has it that small distinctions in the probabilities (parameters) quantifying a belief network do not matter much for the results of probabilistic queries. Yet, one can develop realistic scenarios under which small variations in network parameters can lead to significant changes in computed queries. A pending theoretical question is then to analytically characterize parameter changes that do or do not matter. In this paper, we study the sensitivity of probabilistic queries to changes in network parameters and prove some tight bounds on the impact that such parameters can have on queries. Our analytic results pinpoint some interesting situations under which parameter changes do or do not matter. These results are important for knowledge engineers as they help them identify influential network parameters. They also help explain some of the previous experimental results and observations with regards to network robustness against parameter changes. Spoken dialogue systems promise efficient and natural access to a large variety of information sources and services from any phone. However, current spoken dialogue systems are deficient in their strategies for preventing, identifying and repairing problems that arise in the conversation. This paper reports results on automatically training a Problematic Dialogue Predictor to predict problematic human-computer dialogues using a corpus of 4692 dialogues collected with the 'How May I Help You' (SM) spoken dialogue system. The Problematic Dialogue Predictor can be immediately applied to the system's decision of whether to transfer the call to a human customer care agent, or be used as a cue to the system's dialogue manager to modify its behavior to repair problems, and even perhaps, to prevent them. We show that a Problematic Dialogue Predictor using automatically-obtainable features from the first two exchanges in the dialogue can predict problematic dialogues 13.2% more accurately than the baseline. We propose a perspective on knowledge compilation which calls for analyzing different compilation approaches according to two key dimensions: the succinctness of the target compilation language, and the class of queries and transformations that the language supports in polytime. We then provide a knowledge compilation map, which analyzes a large number of existing target compilation languages according to their succinctness and their polytime transformations and queries. We argue that such analysis is necessary for placing new compilation approaches within the context of existing ones. We also go beyond classical, flat target compilation languages based on CNF and DNF, and consider a richer, nested class based on directed acyclic graphs (such as OBDDs), which we show to include a relatively large number of target compilation languages. The problem of organizing information for multidocument summarization so that the generated summary is coherent has received relatively little attention. While sentence ordering for single document summarization can be determined from the ordering of sentences in the input article, this is not the case for multidocument summarization where summary sentences may be drawn from different input articles. In this paper, we propose a methodology for studying the properties of ordering information in the news genre and describe experiments done on a corpus of multiple acceptable orderings we developed for the task. Based on these experiments, we implemented a strategy for ordering information that combines constraints from chronological order of events and topical relatedness. Evaluation of our augmented algorithm shows a significant improvement of the ordering over two baseline strategies. Prediction markets show considerable promise for developing flexible mechanisms for machine learning. Here, machine learning markets for multivariate systems are defined, and a utility-based framework is established for their analysis. This differs from the usual approach of defining static betting functions. It is shown that such markets can implement model combination methods used in machine learning, such as product of expert and mixture of expert approaches as equilibrium pricing models, by varying agent utility functions. They can also implement models composed of local potentials, and message passing methods. Prediction markets also allow for more flexible combinations, by combining multiple different utility functions. Conversely, the market mechanisms implement inference in the relevant probabilistic models. This means that market mechanism can be utilized for implementing parallelized model building and inference for probabilistic modelling. We develop, analyze, and evaluate a novel, supervised, specific-to-general learner for a simple temporal logic and use the resulting algorithm to learn visual event definitions from video sequences. First, we introduce a simple, propositional, temporal, event-description language called AMA that is sufficiently expressive to represent many events yet sufficiently restrictive to support learning. We then give algorithms, along with lower and upper complexity bounds, for the subsumption and generalization problems for AMA formulas. We present a positive-examples--only specific-to-general learning method based on these algorithms. We also present a polynomial-time--computable ``syntactic'' subsumption test that implies semantic subsumption without being equivalent to it. A generalization algorithm based on syntactic subsumption can be used in place of semantic generalization to improve the asymptotic complexity of the resulting learning algorithm. Finally, we apply this algorithm to the task of learning relational event definitions from video and show that it yields definitions that are competitive with hand-coded ones. In this paper, we analyze the decision version of the NK landscape model from the perspective of threshold phenomena and phase transitions under two random distributions, the uniform probability model and the fixed ratio model. For the uniform probability model, we prove that the phase transition is easy in the sense that there is a polynomial algorithm that can solve a random instance of the problem with the probability asymptotic to 1 as the problem size tends to infinity. For the fixed ratio model, we establish several upper bounds for the solubility threshold, and prove that random instances with parameters above these upper bounds can be solved polynomially. This, together with our empirical study for random instances generated below and in the phase transition region, suggests that the phase transition of the fixed ratio model is also easy. Independence -- the study of what is relevant to a given problem of reasoning -- has received an increasing attention from the AI community. In this paper, we consider two basic forms of independence, namely, a syntactic one and a semantic one. We show features and drawbacks of them. In particular, while the syntactic form of independence is computationally easy to check, there are cases in which things that intuitively are not relevant are not recognized as such. We also consider the problem of forgetting, i.e., distilling from a knowledge base only the part that is relevant to the set of queries constructed from a subset of the alphabet. While such process is computationally hard, it allows for a simplification of subsequent reasoning, and can thus be viewed as a form of compilation: once the relevant part of a knowledge base has been extracted, all reasoning tasks to be performed can be simplified. This paper evaluates the different tasks carried out in the translation of pronominal anaphora in a machine translation (MT) system. The MT interlingua approach named AGIR (Anaphora Generation with an Interlingua Representation) improves upon other proposals presented to date because it is able to translate intersentential anaphors, detect co-reference chains, and translate Spanish zero pronouns into English---issues hardly considered by other systems. The paper presents the resolution and evaluation of these anaphora problems in AGIR with the use of different kinds of knowledge (lexical, morphological, syntactic, and semantic). The translation of English and Spanish anaphoric third-person personal pronouns (including Spanish zero pronouns) into the target language has been evaluated on unrestricted corpora. We have obtained a precision of 80.4% and 84.8% in the translation of Spanish and English pronouns, respectively. Although we have only studied the Spanish and English languages, our approach can be easily extended to other languages such as Portuguese, Italian, or Japanese. We present a probabilistic generative model for timing deviations in expressive music performance. The structure of the proposed model is equivalent to a switching state space model. The switch variables correspond to discrete note locations as in a musical score. The continuous hidden variables denote the tempo. We formulate two well known music recognition problems, namely tempo tracking and automatic transcription (rhythm quantization) as filtering and maximum a posteriori (MAP) state estimation tasks. Exact computation of posterior features such as the MAP state is intractable in this model class, so we introduce Monte Carlo methods for integration and optimization. We compare Markov Chain Monte Carlo (MCMC) methods (such as Gibbs sampling, simulated annealing and iterative improvement) and sequential Monte Carlo methods (particle filters). Our simulation results suggest better results with sequential methods. The methods can be applied in both online and batch scenarios such as tempo tracking and transcription and are thus potentially useful in a number of music applications such as adaptive automatic accompaniment, score typesetting and music information retrieval. Bayesian belief networks have grown to prominence because they provide compact representations for many problems for which probabilistic inference is appropriate, and there are algorithms to exploit this compactness. The next step is to allow compact representations of the conditional probabilities of a variable given its parents. In this paper we present such a representation that exploits contextual independence in terms of parent contexts; which variables act as parents may depend on the value of other variables. The internal representation is in terms of contextual factors (confactors) that is simply a pair of a context and a table. The algorithm, contextual variable elimination, is based on the standard variable elimination algorithm that eliminates the non-query variables in turn, but when eliminating a variable, the tables that need to be multiplied can depend on the context. This algorithm reduces to standard variable elimination when there is no contextual independence structure to exploit. We show how this can be much more efficient than variable elimination when there is structure to exploit. We explain why this new method can exploit more structure than previous methods for structured belief network inference and an analogous algorithm that uses trees. In this article we present an algorithm to compute bounds on the marginals of a graphical model. For several small clusters of nodes upper and lower bounds on the marginal values are computed independently of the rest of the network. The range of allowed probability distributions over the surrounding nodes is restricted using earlier computed bounds. As we will show, this can be considered as a set of constraints in a linear programming problem of which the objective function is the marginal probability of the center nodes. In this way knowledge about the maginals of neighbouring clusters is passed to other clusters thereby tightening the bounds on their marginals. We show that sharp bounds can be obtained for undirected and directed graphs that are used for practical applications, but for which exact computations are infeasible. Policies of Markov Decision Processes (MDPs) determine the next action to execute from the current state and, possibly, the history (the past states). When the number of states is large, succinct representations are often used to compactly represent both the MDPs and the policies in a reduced amount of space. In this paper, some problems related to the size of succinctly represented policies are analyzed. Namely, it is shown that some MDPs have policies that can only be represented in space super-polynomial in the size of the MDP, unless the polynomial hierarchy collapses. This fact motivates the study of the problem of deciding whether a given MDP has a policy of a given size and reward. Since some algorithms for MDPs work by finding a succinct representation of the value function, the problem of deciding the existence of a succinct representation of a value function of a given size and reward is also considered. We describe a system for specifying the effects of actions. Unlike those commonly used in AI planning, our system uses an action description language that allows one to specify the effects of actions using domain rules, which are state constraints that can entail new action effects from old ones. Declaratively, an action domain in our language corresponds to a nonmonotonic causal theory in the situation calculus. Procedurally, such an action domain is compiled into a set of logical theories, one for each action in the domain, from which fully instantiated successor state-like axioms and STRIPS-like systems are then generated. We expect the system to be a useful tool for knowledge engineers writing action specifications for classical AI planning systems, GOLOG systems, and other systems where formal specifications of actions are needed. VHPOP is a partial order causal link (POCL) planner loosely based on UCPOP. It draws from the experience gained in the early to mid 1990's on flaw selection strategies for POCL planning, and combines this with more recent developments in the field of domain independent planning such as distance based heuristics and reachability analysis. We present an adaptation of the additive heuristic for plan space planning, and modify it to account for possible reuse of existing actions in a plan. We also propose a large set of novel flaw selection strategies, and show how these can help us solve more problems than previously possible by POCL planners. VHPOP also supports planning with durative actions by incorporating standard techniques for temporal constraint reasoning. We demonstrate that the same heuristic techniques used to boost the performance of classical POCL planning can be effective in domains with durative actions as well. The result is a versatile heuristic POCL planner competitive with established CSP-based and heuristic state space planners. Recently, planning based on answer set programming has been proposed as an approach towards realizing declarative planning systems. In this paper, we present the language Kc, which extends the declarative planning language K by action costs. Kc provides the notion of admissible and optimal plans, which are plans whose overall action costs are within a given limit resp. minimum over all plans (i.e., cheapest plans). As we demonstrate, this novel language allows for expressing some nontrivial planning tasks in a declarative way. Furthermore, it can be utilized for representing planning problems under other optimality criteria, such as computing ``shortest'' plans (with the least number of steps), and refinement combinations of cheapest and fastest plans. We study complexity aspects of the language Kc and provide a transformation to logic programs, such that planning problems are solved via answer set programming. Furthermore, we report experimental results on selected problems. Our experience is encouraging that answer set planning may be a valuable approach to expressive planning systems in which intricate planning problems can be naturally specified and solved. SAPA is a domain-independent heuristic forward chaining planner that can handle durative actions, metric resource constraints, and deadline goals. It is designed to be capable of handling the multi-objective nature of metric temporal planning. Our technical contributions include (i) planning-graph based methods for deriving heuristics that are sensitive to both cost and makespan (ii) techniques for adjusting the heuristic estimates to take action interactions and metric resource limitations into account and (iii) a linear time greedy post-processing technique to improve execution flexibility of the solution plans. An implementation of SAPA using many of the techniques presented in this paper was one of the best domain independent planners for domains with metric and temporal constraints in the third International Planning Competition, held at AIPS-02. We describe the technical details of extracting the heuristics and present an empirical evaluation of the current implementation of SAPA. Despite their near dominance, heuristic state search planners still lag behind disjunctive planners in the generation of parallel plans in classical planning. The reason is that directly searching for parallel solutions in state space planners would require the planners to branch on all possible subsets of parallel actions, thus increasing the branching factor exponentially. We present a variant of our heuristic state search planner AltAlt, called AltAltp which generates parallel plans by using greedy online parallelization of partial plans. The greedy approach is significantly informed by the use of novel distance heuristics that AltAltp derives from a graphplan-style planning graph for the problem. While this approach is not guaranteed to provide optimal parallel plans, empirical results show that AltAltp is capable of generating good quality parallel plans at a fraction of the cost incurred by the disjunctive planners. We present some techniques for planning in domains specified with the recent standard language PDDL2.1, supporting 'durative actions' and numerical quantities. These techniques are implemented in LPG, a domain-independent planner that took part in the 3rd International Planning Competition (IPC). LPG is an incremental, any time system producing multi-criteria quality plans. The core of the system is based on a stochastic local search method and on a graph-based representation called 'Temporal Action Graphs' (TA-graphs). This paper focuses on temporal planning, introducing TA-graphs and proposing some techniques to guide the search in LPG using this representation. The experimental results of the 3rd IPC, as well as further results presented in this paper, show that our techniques can be very effective. Often LPG outperforms all other fully-automated planners of the 3rd IPC in terms of speed to derive a solution, or quality of the solutions that can be produced. TALplanner is a forward-chaining planner that relies on domain knowledge in the shape of temporal logic formulas in order to prune irrelevant parts of the search space. TALplanner recently participated in the third International Planning Competition, which had a clear emphasis on increasing the complexity of the problem domains being used as benchmark tests and the expressivity required to represent these domains in a planning system. Like many other planners, TALplanner had support for some but not all aspects of this increase in expressivity, and a number of changes to the planner were required. After a short introduction to TALplanner, this article describes some of the changes that were made before and during the competition. We also describe the process of introducing suitable domain knowledge for several of the competition domains. The performance of anytime algorithms can be improved by simultaneously solving several instances of algorithm-problem pairs. These pairs may include different instances of a problem (such as starting from a different initial state), different algorithms (if several alternatives exist), or several runs of the same algorithm (for non-deterministic algorithms). In this paper we present a methodology for designing an optimal scheduling policy based on the statistical characteristics of the algorithms involved. We formally analyze the case where the processes share resources (a single-processor model), and provide an algorithm for optimal scheduling. We analyze, theoretically and empirically, the behavior of our scheduling algorithm for various distribution types. Finally, we present empirical results of applying our scheduling algorithm to the Latin Square problem. Auctions are becoming an increasingly popular method for transacting business, especially over the Internet. This article presents a general approach to building autonomous bidding agents to bid in multiple simultaneous auctions for interacting goods. A core component of our approach learns a model of the empirical price dynamics based on past data and uses the model to analytically calculate, to the greatest extent possible, optimal bids. We introduce a new and general boosting-based algorithm for conditional density estimation problems of this kind, i.e., supervised learning problems in which the goal is to estimate the entire conditional distribution of the real-valued label. This approach is fully implemented as ATTac-2001, a top-scoring agent in the second Trading Agent Competition (TAC-01). We present experiments demonstrating the effectiveness of our boosting-based price predictor relative to several reasonable alternatives. Planning with numeric state variables has been a challenge for many years, and was a part of the 3rd International Planning Competition (IPC-3). Currently one of the most popular and successful algorithmic techniques in STRIPS planning is to guide search by a heuristic function, where the heuristic is based on relaxing the planning task by ignoring the delete lists of the available actions. We present a natural extension of ``ignoring delete lists'' to numeric state variables, preserving the relevant theoretical properties of the STRIPS relaxation under the condition that the numeric task at hand is ``monotonic''. We then identify a subset of the numeric IPC-3 competition language, ``linear tasks'', where monotonicity can be achieved by pre-processing. Based on that, we extend the algorithms used in the heuristic planning system FF to linear tasks. The resulting system Metric-FF is, according to the IPC-3 results which we discuss, one of the two currently most efficient numeric planners. This paper reports the outcome of the third in the series of biennial international planning competitions, held in association with the International Conference on AI Planning and Scheduling (AIPS) in 2002. In addition to describing the domains, the planners and the objectives of the competition, the paper includes analysis of the results. The results are analysed from several perspectives, in order to address the questions of comparative performance between planners, comparative difficulty of domains, the degree of agreement between planners about the relative difficulty of individual problem instances and the question of how well planners scale relative to one another over increasingly difficult problems. The paper addresses these questions through statistical analysis of the raw results of the competition, in order to determine which results can be considered to be adequately supported by the data. The paper concludes with a discussion of some challenges for the future of the competition series. Information about user preferences plays a key role in automated decision making. In many domains it is desirable to assess such preferences in a qualitative rather than quantitative way. In this paper, we propose a qualitative graphical representation of preferences that reflects conditional dependence and independence of preference statements under a ceteris paribus (all else being equal) interpretation. Such a representation is often compact and arguably quite natural in many circumstances. We provide a formal semantics for this model, and describe how the structure of the network can be exploited in several inference tasks, such as determining whether one outcome dominates (is preferred to) another, ordering a set outcomes according to the preference relation, and constructing the best outcome subject to available evidence. We propose a formalism for representation of finite languages, referred to as the class of IDL-expressions, which combines concepts that were only considered in isolation in existing formalisms. The suggested applications are in natural language processing, more specifically in surface natural language generation and in machine translation, where a sentence is obtained by first generating a large set of candidate sentences, represented in a compact way, and then by filtering such a set through a parser. We study several formal properties of IDL-expressions and compare this new formalism with more standard ones. We also present a novel parsing algorithm for IDL-expressions and prove a non-trivial upper bound on its time complexity. Hierarchical latent class (HLC) models are tree-structured Bayesian networks where leaf nodes are observed while internal nodes are latent. There are no theoretically well justified model selection criteria for HLC models in particular and Bayesian networks with latent nodes in general. Nonetheless, empirical studies suggest that the BIC score is a reasonable criterion to use in practice for learning HLC models. Empirical studies also suggest that sometimes model selection can be improved if standard model dimension is replaced with effective model dimension in the penalty term of the BIC score. Effective dimensions are difficult to compute. In this paper, we prove a theorem that relates the effective dimension of an HLC model to the effective dimensions of a number of latent class models. The theorem makes it computationally feasible to compute the effective dimensions of large HLC models. The theorem can also be used to compute the effective dimensions of general tree models. Searching for and making decisions about information is becoming increasingly difficult as the amount of information and number of choices increases. Recommendation systems help users find items of interest of a particular type, such as movies or restaurants, but are still somewhat awkward to use. Our solution is to take advantage of the complementary strengths of personalized recommendation systems and dialogue systems, creating personalized aides. We present a system -- the Adaptive Place Advisor -- that treats item selection as an interactive, conversational process, with the program inquiring about item attributes and the user responding. Individual, long-term user preferences are unobtrusively obtained in the course of normal recommendation dialogues and used to direct future conversations with the same user. We present a novel user model that influences both item search and the questions asked during a conversation. We demonstrate the effectiveness of our system in significantly reducing the time and number of interactions required to find a satisfactory item, as compared to a control group of users interacting with a non-adaptive version of the system. We introduce an abductive method for a coherent integration of independent data-sources. The idea is to compute a list of data-facts that should be inserted to the amalgamated database or retracted from it in order to restore its consistency. This method is implemented by an abductive solver, called Asystem, that applies SLDNFA-resolution on a meta-theory that relates different, possibly contradicting, input databases. We also give a pure model-theoretic analysis of the possible ways to `recover' consistent data from an inconsistent database in terms of those models of the database that exhibit as minimal inconsistent information as reasonably possible. This allows us to characterize the `recovered databases' in terms of the `preferred' (i.e., most consistent) models of the theory. The outcome is an abductive-based application that is sound and complete with respect to a corresponding model-based, preferential semantics, and -- to the best of our knowledge -- is more expressive (thus more general) than any other implementation of coherent integration of databases. We present a visually-grounded language understanding model based on a study of how people verbally describe objects in scenes. The emphasis of the model is on the combination of individual word meanings to produce meanings for complex referring expressions. The model has been implemented, and it is able to understand a broad range of spatial referring expressions. We describe our implementation of word level visually-grounded semantics and their embedding in a compositional parsing framework. The implemented system selects the correct referents in response to natural language expressions for a large percentage of test cases. In an analysis of the system's successes and failures we reveal how visual context influences the semantics of utterances and propose future extensions to the model that take such context into account. The 2002 Trading Agent Competition (TAC) presented a challenging market game in the domain of travel shopping. One of the pivotal issues in this domain is uncertainty about hotel prices, which have a significant influence on the relative cost of alternative trip schedules. Thus, virtually all participants employ some method for predicting hotel prices. We survey approaches employed in the tournament, finding that agents apply an interesting diversity of techniques, taking into account differing sources of evidence bearing on prices. Based on data provided by entrants on their agents' actual predictions in the TAC-02 finals and semifinals, we analyze the relative efficacy of these approaches. The results show that taking into account game-specific information about flight prices is a major distinguishing factor. Machine learning methods effectively induce the relationship between flight and hotel prices from game data, and a purely analytical approach based on competitive equilibrium analysis achieves equal accuracy with no historical data. Employing a new measure of prediction quality, we relate absolute accuracy to bottom-line performance in the game. The predominant knowledge-based approach to automated model construction, compositional modelling, employs a set of models of particular functional components. Its inference mechanism takes a scenario describing the constituent interacting components of a system and translates it into a useful mathematical model. This paper presents a novel compositional modelling approach aimed at building model repositories. It furthers the field in two respects. Firstly, it expands the application domain of compositional modelling to systems that can not be easily described in terms of interacting functional components, such as ecological systems. Secondly, it enables the incorporation of user preferences into the model selection process. These features are achieved by casting the compositional modelling problem as an activity-based dynamic preference constraint satisfaction problem, where the dynamic constraints describe the restrictions imposed over the composition of partial models and the preferences correspond to those of the user of the automated modeller. In addition, the preference levels are represented through the use of symbolic values that differ in orders of magnitude. Two major goals in machine learning are the discovery and improvement of solutions to complex problems. In this paper, we argue that complexification, i.e. the incremental elaboration of solutions through adding new structure, achieves both these goals. We demonstrate the power of complexification through the NeuroEvolution of Augmenting Topologies (NEAT) method, which evolves increasingly complex neural network architectures. NEAT is applied to an open-ended coevolutionary robot duel domain where robot controllers compete head to head. Because the robot duel domain supports a wide range of strategies, and because coevolution benefits from an escalating arms race, it serves as a suitable testbed for studying complexification. When compared to the evolution of networks with fixed structure, complexifying evolution discovers significantly more sophisticated strategies. The results suggest that in order to discover and improve complex solutions, evolution, and search in general, should be allowed to complexify as well as optimize. When writing a constraint program, we have to choose which variables should be the decision variables, and how to represent the constraints on these variables. In many cases, there is considerable choice for the decision variables. Consider, for example, permutation problems in which we have as many values as variables, and each variable takes an unique value. In such problems, we can choose between a primal and a dual viewpoint. In the dual viewpoint, each dual variable represents one of the primal values, whilst each dual value represents one of the primal variables. Alternatively, by means of channelling constraints to link the primal and dual variables, we can have a combined model with both sets of variables. In this paper, we perform an extensive theoretical and empirical study of such primal, dual and combined models for two classes of problems: permutation problems and injection problems. Our results show that it often be advantageous to use multiple viewpoints, and to have constraints which channel between them to maintain consistency. They also illustrate a general methodology for comparing different constraint models. This is the first of three planned papers describing ZAP, a satisfiability engine that substantially generalizes existing tools while retaining the performance characteristics of modern high-performance solvers. The fundamental idea underlying ZAP is that many problems passed to such engines contain rich internal structure that is obscured by the Boolean representation used; our goal is to define a representation in which this structure is apparent and can easily be exploited to improve computational performance. This paper is a survey of the work underlying ZAP, and discusses previous attempts to improve the performance of the Davis-Putnam-Logemann-Loveland algorithm by exploiting the structure of the problem being solved. We examine existing ideas including extensions of the Boolean language to allow cardinality constraints, pseudo-Boolean representations, symmetry, and a limited form of quantification. While this paper is intended as a survey, our research results are contained in the two subsequent articles, with the theoretical structure of ZAP described in the second paper in this series, and ZAP's implementation described in the third. Argumentation is based on the exchange and valuation of interacting arguments, followed by the selection of the most acceptable of them (for example, in order to take a decision, to make a choice). Starting from the framework proposed by Dung in 1995, our purpose is to introduce 'graduality' in the selection of the best arguments, i.e., to be able to partition the set of the arguments in more than the two usual subsets of 'selected' and 'non-selected' arguments in order to represent different levels of selection. Our basic idea is that an argument is all the more acceptable if it can be preferred to its attackers. First, we discuss general principles underlying a 'gradual' valuation of arguments based on their interactions. Following these principles, we define several valuation models for an abstract argumentation system. Then, we introduce 'graduality' in the concept of acceptability of arguments. We propose new acceptability classes and a refinement of existing classes taking advantage of an available 'gradual' valuation. Decentralized control of cooperative systems captures the operation of a group of decision makers that share a single global objective. The difficulty in solving optimally such problems arises when the agents lack full observability of the global state of the system when they operate. The general problem has been shown to be NEXP-complete. In this paper, we identify classes of decentralized control problems whose complexity ranges between NEXP and P. In particular, we study problems characterized by independent transitions, independent observations, and goal-oriented objective functions. Two algorithms are shown to solve optimally useful classes of goal-oriented decentralized processes in polynomial time. This paper also studies information sharing among the decision-makers, which can improve their performance. We distinguish between three ways in which agents can exchange information: indirect communication, direct communication and sharing state features that are not controlled by the agents. Our analysis shows that for every class of problems we consider, introducing direct or indirect communication does not change the worst-case complexity. The results provide a better understanding of the complexity of decentralized control problems that arise in practice and facilitate the development of planning algorithms for these problems. This paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic Suffix Trees (PSTs). We discuss the properties of these algorithms and compare their performance using real life sequences from three domains: proteins, English text and music pieces. The comparison is made with respect to prediction quality as measured by the average log-loss. We also compare classification algorithms based on these predictors with respect to a number of large protein classification tasks. Our results indicate that a "decomposed" CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in sequence prediction tasks. Somewhat surprisingly, a different algorithm, which is a modification of the Lempel-Ziv compression algorithm, significantly outperforms all algorithms on the protein classification problems. Many known planning tasks have inherent constraints concerning the best order in which to achieve the goals. A number of research efforts have been made to detect such constraints and to use them for guiding search, in the hope of speeding up the planning process. We go beyond the previous approaches by considering ordering constraints not only over the (top-level) goals, but also over the sub-goals that will necessarily arise during planning. Landmarks are facts that must be true at some point in every valid solution plan. We extend Koehler and Hoffmann's definition of reasonable orders between top level goals to the more general case of landmarks. We show how landmarks can be found, how their reasonable orders can be approximated, and how this information can be used to decompose a given planning task into several smaller sub-tasks. Our methodology is completely domain- and planner-independent. The implementation demonstrates that the approach can yield significant runtime performance improvements when used as a control loop around state-of-the-art sub-optimal planning systems, as exemplified by FF and LPG. We propose a model for errors in sung queries, a variant of the hidden Markov model (HMM). This is a solution to the problem of identifying the degree of similarity between a (typically error-laden) sung query and a potential target in a database of musical works, an important problem in the field of music information retrieval. Similarity metrics are a critical component of query-by-humming (QBH) applications which search audio and multimedia databases for strong matches to oral queries. Our model comprehensively expresses the types of error or variation between target and query: cumulative and non-cumulative local errors, transposition, tempo and tempo changes, insertions, deletions and modulation. The model is not only expressive, but automatically trainable, or able to learn and generalize from query examples. We present results of simulations, designed to assess the discriminatory potential of the model, and tests with real sung queries, to demonstrate relevance to real-world applications. In recent years, there has been much interest in phase transitions of combinatorial problems. Phase transitions have been successfully used to analyze combinatorial optimization problems, characterize their typical-case features and locate the hardest problem instances. In this paper, we study phase transitions of the asymmetric Traveling Salesman Problem (ATSP), an NP-hard combinatorial optimization problem that has many real-world applications. Using random instances of up to 1,500 cities in which intercity distances are uniformly distributed, we empirically show that many properties of the problem, including the optimal tour cost and backbone size, experience sharp transitions as the precision of intercity distances increases across a critical value. Our experimental results on the costs of the ATSP tours and assignment problem agree with the theoretical result that the asymptotic cost of assignment problem is pi ^2 /6 the number of cities goes to infinity. In addition, we show that the average computational cost of the well-known branch-and-bound subtour elimination algorithm for the problem also exhibits a thrashing behavior, transitioning from easy to difficult as the distance precision increases. These results answer positively an open question regarding the existence of phase transitions in the ATSP, and provide guidance on how difficult ATSP problem instances should be generated. Purely data driven approaches for machine learning present difficulties when data is scarce relative to the complexity of the model or when the model is forced to extrapolate. On the other hand, purely mechanistic approaches need to identify and specify all the interactions in the problem at hand (which may not be feasible) and still leave the issue of how to parameterize the system. In this paper, we present a hybrid approach using Gaussian processes and differential equations to combine data driven modelling with a physical model of the system. We show how different, physically-inspired, kernel functions can be developed through sensible, simple, mechanistic assumptions about the underlying system. The versatility of our approach is illustrated with three case studies from motion capture, computational biology and geostatistics. Perfectly rational decision-makers maximize expected utility, but crucially ignore the resource costs incurred when determining optimal actions. Here we employ an axiomatic framework for bounded rational decision-making based on a thermodynamic interpretation of resource costs as information costs. This leads to a variational "free utility" principle akin to thermodynamical free energy that trades off utility and information costs. We show that bounded optimal control solutions can be derived from this variational principle, which leads in general to stochastic policies. Furthermore, we show that risk-sensitive and robust (minimax) control schemes fall out naturally from this framework if the environment is considered as a bounded rational and perfectly rational opponent, respectively. When resource costs are ignored, the maximum expected utility principle is recovered. Data compression is a method of improving the efficiency of transmission and storage of images. Dithering, as a method of data compression, can be used to convert an 8-bit gray level image into a 1-bit / binary image. Undithering is the process of reconstruction of gray image from binary image obtained from dithering of gray image. In the present paper, I propose a method of undithering using linear filtering followed by anisotropic diffusion which brings the advantage of smoothing and edge enhancement. First-order statistical parameters, second-order statistical parameters, mean-squared error (MSE) between reconstructed image and the original image before dithering, and peak signal to noise ratio (PSNR) are evaluated at each step of diffusion. Results of the experiments show that the reconstructed image is not as sharp as the image before dithering but a large number of gray values are reproduced with reference to those of the original image prior to dithering. This paper begins the discussion of how the Information Flow Framework can be used to provide a principled foundation for the metalevel (or structural level) of the Standard Upper Ontology (SUO). This SUO structural level can be used as a logical framework for manipulating collections of ontologies in the object level of the SUO or other middle level or domain ontologies. From the Information Flow perspective, the SUO structural level resolves into several metalevel ontologies. This paper discusses a KIF formalization for one of those metalevel categories, the Category Theory Ontology. In particular, it discusses its category and colimit sub-namespaces. This paper describes the Automated Reasoning for Mizar (MizAR) service, which integrates several automated reasoning, artificial intelligence, and presentation tools with Mizar and its authoring environment. The service provides ATP assistance to Mizar authors in finding and explaining proofs, and offers generation of Mizar problems as challenges to ATP systems. The service is based on a sound translation from the Mizar language to that of first-order ATP systems, and relies on the recent progress in application of ATP systems in large theories containing tens of thousands of available facts. We present the main features of MizAR services, followed by an account of initial experiments in finding proofs with the ATP assistance. Our initial experience indicates that the tool offers substantial help in exploring the Mizar library and in preparing new Mizar articles. We propose a structured approach to the problem of retrieval of images by content and present a description logic that has been devised for the semantic indexing and retrieval of images containing complex objects. As other approaches do, we start from low-level features extracted with image analysis to detect and characterize regions in an image. However, in contrast with feature-based approaches, we provide a syntax to describe segmented regions as basic objects and complex objects as compositions of basic ones. Then we introduce a companion extensional semantics for defining reasoning services, such as retrieval, classification, and subsumption. These services can be used for both exact and approximate matching, using similarity measures. Using our logical approach as a formal specification, we implemented a complete client-server image retrieval system, which allows a user to pose both queries by sketch and queries by example. A set of experiments has been carried out on a testbed of images to assess the retrieval capabilities of the system in comparison with expert users ranking. Results are presented adopting a well-established measure of quality borrowed from textual information retrieval. This is the second of three planned papers describing ZAP, a satisfiability engine that substantially generalizes existing tools while retaining the performance characteristics of modern high performance solvers. The fundamental idea underlying ZAP is that many problems passed to such engines contain rich internal structure that is obscured by the Boolean representation used; our goal is to define a representation in which this structure is apparent and can easily be exploited to improve computational performance. This paper presents the theoretical basis for the ideas underlying ZAP, arguing that existing ideas in this area exploit a single, recurring structure in that multiple database axioms can be obtained by operating on a single axiom using a subgroup of the group of permutations on the literals in the problem. We argue that the group structure precisely captures the general structure at which earlier approaches hinted, and give numerous examples of its use. We go on to extend the Davis-Putnam-Logemann-Loveland inference procedure to this broader setting, and show that earlier computational improvements are either subsumed or left intact by the new method. The third paper in this series discusses ZAPs implementation and presents experimental performance results. Stochastic processes that involve the creation of objects and relations over time are widespread, but relatively poorly studied. For example, accurate fault diagnosis in factory assembly processes requires inferring the probabilities of erroneous assembly operations, but doing this efficiently and accurately is difficult. Modeled as dynamic Bayesian networks, these processes have discrete variables with very large domains and extremely high dimensionality. In this paper, we introduce relational dynamic Bayesian networks (RDBNs), which are an extension of dynamic Bayesian networks (DBNs) to first-order logic. RDBNs are a generalization of dynamic probabilistic relational models (DPRMs), which we had proposed in our previous work to model dynamic uncertain domains. We first extend the Rao-Blackwellised particle filtering described in our earlier work to RDBNs. Next, we lift the assumptions associated with Rao-Blackwellization in RDBNs and propose two new forms of particle filtering. The first one uses abstraction hierarchies over the predicates to smooth the particle filters estimates. The second employs kernel density estimation with a kernel function specifically designed for relational domains. Experiments show these two methods greatly outperform standard particle filtering on the task of assembly plan execution monitoring. In this paper we present a new approach to modeling finite set domain constraint problems using Reduced Ordered Binary Decision Diagrams (ROBDDs). We show that it is possible to construct an efficient set domain propagator which compactly represents many set domains and set constraints using ROBDDs. We demonstrate that the ROBDD-based approach provides unprecedented flexibility in modeling constraint satisfaction problems, leading to performance improvements. We also show that the ROBDD-based modeling approach can be extended to the modeling of integer and multiset constraint problems in a straightforward manner. Since domain propagation is not always practical, we also show how to incorporate less strict consistency notions into the ROBDD framework, such as set bounds, cardinality bounds and lexicographic bounds consistency. Finally, we present experimental results that demonstrate the ROBDD-based solver performs better than various more conventional constraint solvers on several standard set constraint problems. We present a novel approach to the automatic acquisition of taxonomies or concept hierarchies from a text corpus. The approach is based on Formal Concept Analysis (FCA), a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. We follow Harris distributional hypothesis and model the context of a certain term as a vector representing syntactic dependencies which are automatically acquired from the text corpus with a linguistic parser. On the basis of this context information, FCA produces a lattice that we convert into a special kind of partial order constituting a concept hierarchy. The approach is evaluated by comparing the resulting concept hierarchies with hand-crafted taxonomies for two domains: tourism and finance. We also directly compare our approach with hierarchical agglomerative clustering as well as with Bi-Section-KMeans as an instance of a divisive clustering algorithm. Furthermore, we investigate the impact of using different measures weighting the contribution of each attribute as well as of applying a particular smoothing technique to cope with data sparseness. This is the third of three papers describing ZAP, a satisfiability engine that substantially generalizes existing tools while retaining the performance characteristics of modern high-performance solvers. The fundamental idea underlying ZAP is that many problems passed to such engines contain rich internal structure that is obscured by the Boolean representation used; our goal has been to define a representation in which this structure is apparent and can be exploited to improve computational performance. The first paper surveyed existing work that (knowingly or not) exploited problem structure to improve the performance of satisfiability engines, and the second paper showed that this structure could be understood in terms of groups of permutations acting on individual clauses in any particular Boolean theory. We conclude the series by discussing the techniques needed to implement our ideas, and by reporting on their performance on a variety of problem instances. Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agents belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems. In this paper we propose a crossover operator for evolutionary algorithms with real values that is based on the statistical theory of population distributions. The operator is based on the theoretical distribution of the values of the genes of the best individuals in the population. The proposed operator takes into account the localization and dispersion features of the best individuals of the population with the objective that these features would be inherited by the offspring. Our aim is the optimization of the balance between exploration and exploitation in the search process. In order to test the efficiency and robustness of this crossover, we have used a set of functions to be optimized with regard to different criteria, such as, multimodality, separability, regularity and epistasis. With this set of functions we can extract conclusions in function of the problem at hand. We analyze the results using ANOVA and multiple comparison statistical tests. As an example of how our crossover can be used to solve artificial intelligence problems, we have applied the proposed model to the problem of obtaining the weight of each network in a ensemble of neural networks. The results obtained are above the performance of standard methods. Despite recent progress in AI planning, many benchmarks remain challenging for current planners. In many domains, the performance of a planner can greatly be improved by discovering and exploiting information about the domain structure that is not explicitly encoded in the initial PDDL formulation. In this paper we present and compare two automated methods that learn relevant information from previous experience in a domain and use it to solve new problem instances. Our methods share a common four-step strategy. First, a domain is analyzed and structural information is extracted, then macro-operators are generated based on the previously discovered structure. A filtering and ranking procedure selects the most useful macro-operators. Finally, the selected macros are used to speed up future searches. We have successfully used such an approach in the fourth international planning competition IPC-4. Our system, Macro-FF, extends Hoffmanns state-of-the-art planner FF 2.3 with support for two kinds of macro-operators, and with engineering enhancements. We demonstrate the effectiveness of our ideas on benchmarks from international planning competitions. Our results indicate a large reduction in search effort in those complex domains where structural information can be inferred. We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work. We provide an overview of the organization and results of the deterministic part of the 4th International Planning Competition, i.e., of the part concerned with evaluating systems doing deterministic planning. IPC-4 attracted even more competing systems than its already large predecessors, and the competition event was revised in several important respects. After giving an introduction to the IPC, we briefly explain the main differences between the deterministic part of IPC-4 and its predecessors. We then introduce formally the language used, called PDDL2.2 that extends PDDL2.1 by derived predicates and timed initial literals. We list the competing systems and overview the results of the competition. The entire set of data is far too large to be presented in full. We provide a detailed summary; the complete data is available in an online appendix. We explain how we awarded the competition prizes. A non-binary Constraint Satisfaction Problem (CSP) can be solved directly using extended versions of binary techniques. Alternatively, the non-binary problem can be translated into an equivalent binary one. In this case, it is generally accepted that the translated problem can be solved by applying well-established techniques for binary CSPs. In this paper we evaluate the applicability of the latter approach. We demonstrate that the use of standard techniques for binary CSPs in the encodings of non-binary problems is problematic and results in models that are very rarely competitive with the non-binary representation. To overcome this, we propose specialized arc consistency and search algorithms for binary encodings, and we evaluate them theoretically and empirically. We consider three binary representations; the hidden variable encoding, the dual encoding, and the double encoding. Theoretical and empirical results show that, for certain classes of non-binary constraints, binary encodings are a competitive option, and in many cases, a better one than the non-binary representation. In this paper, we introduce DLS-MC, a new stochastic local search algorithm for the maximum clique problem. DLS-MC alternates between phases of iterative improvement, during which suitable vertices are added to the current clique, and plateau search, during which vertices of the current clique are swapped with vertices not contained in the current clique. The selection of vertices is solely based on vertex penalties that are dynamically adjusted during the search, and a perturbation mechanism is used to overcome search stagnation. The behaviour of DLS-MC is controlled by a single parameter, penalty delay, which controls the frequency at which vertex penalties are reduced. We show empirically that DLS-MC achieves substantial performance improvements over state-of-the-art algorithms for the maximum clique problem over a large range of the commonly used DIMACS benchmark instances. Open distributed multi-agent systems are gaining interest in the academic community and in industry. In such open settings, agents are often coordinated using standardized agent conversation protocols. The representation of such protocols (for analysis, validation, monitoring, etc) is an important aspect of multi-agent applications. Recently, Petri nets have been shown to be an interesting approach to such representation, and radically different approaches using Petri nets have been proposed. However, their relative strengths and weaknesses have not been examined. Moreover, their scalability and suitability for different tasks have not been addressed. This paper addresses both these challenges. First, we analyze existing Petri net representations in terms of their scalability and appropriateness for overhearing, an important task in monitoring open multi-agent systems. Then, building on the insights gained, we introduce a novel representation using Colored Petri nets that explicitly represent legal joint conversation states and messages. This representation approach offers significant improvements in scalability and is particularly suitable for overhearing. Furthermore, we show that this new representation offers a comprehensive coverage of all conversation features of FIPA conversation standards. We also present a procedure for transforming AUML conversation protocol diagrams (a standard human-readable representation), to our Colored Petri net representation. This article develops Probabilistic Hybrid Action Models (PHAMs), a realistic causal model for predicting the behavior generated by modern percept-driven robot plans. PHAMs represent aspects of robot behavior that cannot be represented by most action models used in AI planning: the temporal structure of continuous control processes, their non-deterministic effects, several modes of their interferences, and the achievement of triggering conditions in closed-loop robot plans. The main contributions of this article are: (1) PHAMs, a model of concurrent percept-driven behavior, its formalization, and proofs that the model generates probably, qualitatively accurate predictions; and (2) a resource-efficient inference method for PHAMs based on sampling projections from probabilistic action models and state descriptions. We show how PHAMs can be applied to planning the course of action of an autonomous robot office courier based on analytical and experimental results. Distributed Constraint Satisfaction (DCSP) has long been considered an important problem in multi-agent systems research. This is because many real-world problems can be represented as constraint satisfaction and these problems often present themselves in a distributed form. In this article, we present a new complete, distributed algorithm called Asynchronous Partial Overlay (APO) for solving DCSPs that is based on a cooperative mediation process. The primary ideas behind this algorithm are that agents, when acting as a mediator, centralize small, relevant portions of the DCSP, that these centralized subproblems overlap, and that agents increase the size of their subproblems along critical paths within the DCSP as the problem solving unfolds. We present empirical evidence that shows that APO outperforms other known, complete DCSP techniques. As partial justification of their framework for iterated belief revision Darwiche and Pearl convincingly argued against Boutiliers natural revision and provided a prototypical revision operator that fits into their scheme. We show that the Darwiche-Pearl arguments lead naturally to the acceptance of a smaller class of operators which we refer to as admissible. Admissible revision ensures that the penultimate input is not ignored completely, thereby eliminating natural revision, but includes the Darwiche-Pearl operator, Nayaks lexicographic revision operator, and a newly introduced operator called restrained revision. We demonstrate that restrained revision is the most conservative of admissible revision operators, effecting as few changes as possible, while lexicographic revision is the least conservative, and point out that restrained revision can also be viewed as a composite operator, consisting of natural revision preceded by an application of a "backwards revision" operator previously studied by Papini. Finally, we propose the establishment of a principled approach for choosing an appropriate revision operator in different contexts and discuss future work. In recent years, CP-nets have emerged as a useful tool for supporting preference elicitation, reasoning, and representation. CP-nets capture and support reasoning with qualitative conditional preference statements, statements that are relatively natural for users to express. In this paper, we extend the CP-nets formalism to handle another class of very natural qualitative statements one often uses in expressing preferences in daily life - statements of relative importance of attributes. The resulting formalism, TCP-nets, maintains the spirit of CP-nets, in that it remains focused on using only simple and natural preference statements, uses the ceteris paribus semantics, and utilizes a graphical representation of this information to reason about its consistency and to perform, possibly constrained, optimization using it. The extra expressiveness it provides allows us to better model tradeoffs users would like to make, more faithfully representing their preferences. A delta-model is a satisfying assignment of a Boolean formula for which any small alteration, such as a single bit flip, can be repaired by flips to some small number of other bits, yielding a new satisfying assignment. These satisfying assignments represent robust solutions to optimization problems (e.g., scheduling) where it is possible to recover from unforeseen events (e.g., a resource becoming unavailable). The concept of delta-models was introduced by Ginsberg, Parkes and Roy (AAAI 1998), where it was proved that finding delta-models for general Boolean formulas is NP-complete. In this paper, we extend that result by studying the complexity of finding delta-models for classes of Boolean formulas which are known to have polynomial time satisfiability solvers. In particular, we examine 2-SAT, Horn-SAT, Affine-SAT, dual-Horn-SAT, 0-valid and 1-valid SAT. We see a wide variation in the complexity of finding delta-models, e.g., while 2-SAT and Affine-SAT have polynomial time tests for delta-models, testing whether a Horn-SAT formula has one is NP-complete. Multimodal conversational interfaces provide a natural means for users to communicate with computer systems through multiple modalities such as speech and gesture. To build effective multimodal interfaces, automated interpretation of user multimodal inputs is important. Inspired by the previous investigation on cognitive status in multimodal human machine interaction, we have developed a greedy algorithm for interpreting user referring expressions (i.e., multimodal reference resolution). This algorithm incorporates the cognitive principles of Conversational Implicature and Givenness Hierarchy and applies constraints from various sources (e.g., temporal, semantic, and contextual) to resolve references. Our empirical results have shown the advantage of this algorithm in efficiently resolving a variety of user references. Because of its simplicity and generality, this approach has the potential to improve the robustness of multimodal input interpretation. This paper presents a new framework for anytime heuristic search where the task is to achieve as many goals as possible within the allocated resources. We show the inadequacy of traditional distance-estimation heuristics for tasks of this type and present alternative heuristics that are more appropriate for multiple-goal search. In particular, we introduce the marginal-utility heuristic, which estimates the cost and the benefit of exploring a subtree below a search node. We developed two methods for online learning of the marginal-utility heuristic. One is based on local similarity of the partial marginal utility of sibling nodes, and the other generalizes marginal-utility over the state feature space. We apply our adaptive and non-adaptive multiple-goal search algorithms to several problems, including focused crawling, and show their superiority over existing methods. We present a heuristic search algorithm for solving first-order Markov Decision Processes (FOMDPs). Our approach combines first-order state abstraction that avoids evaluating states individually, and heuristic search that avoids evaluating all states. Firstly, in contrast to existing systems, which start with propositionalizing the FOMDP and then perform state abstraction on its propositionalized version we apply state abstraction directly on the FOMDP avoiding propositionalization. This kind of abstraction is referred to as first-order state abstraction. Secondly, guided by an admissible heuristic, the search is restricted to those states that are reachable from the initial state. We demonstrate the usefulness of the above techniques for solving FOMDPs with a system, referred to as FluCaP (formerly, FCPlanner), that entered the probabilistic track of the 2004 International Planning Competition (IPC2004) and demonstrated an advantage over other planners on the problems represented in first-order terms. Exact Max-SAT solvers, compared with SAT solvers, apply little inference at each node of the proof tree. Commonly used SAT inference rules like unit propagation produce a simplified formula that preserves satisfiability but, unfortunately, solving the Max-SAT problem for the simplified formula is not equivalent to solving it for the original formula. In this paper, we define a number of original inference rules that, besides being applied efficiently, transform Max-SAT instances into equivalent Max-SAT instances which are easier to solve. The soundness of the rules, that can be seen as refinements of unit resolution adapted to Max-SAT, are proved in a novel and simple way via an integer programming transformation. With the aim of finding out how powerful the inference rules are in practice, we have developed a new Max-SAT solver, called MaxSatz, which incorporates those rules, and performed an experimental investigation. The results provide empirical evidence that MaxSatz is very competitive, at least, on random Max-2SAT, random Max-3SAT, Max-Cut, and Graph 3-coloring instances, as well as on the benchmarks from the Max-SAT Evaluation 2006. Reputation mechanisms offer an effective alternative to verification authorities for building trust in electronic markets with moral hazard. Future clients guide their business decisions by considering the feedback from past transactions; if truthfully exposed, cheating behavior is sanctioned and thus becomes irrational. It therefore becomes important to ensure that rational clients have the right incentives to report honestly. As an alternative to side-payment schemes that explicitly reward truthful reports, we show that honesty can emerge as a rational behavior when clients have a repeated presence in the market. To this end we describe a mechanism that supports an equilibrium where truthful feedback is obtained. Then we characterize the set of pareto-optimal equilibria of the mechanism, and derive an upper bound on the percentage of false reports that can be recorded by the mechanism. An important role in the existence of this bound is played by the fact that rational clients can establish a reputation for reporting honestly. Conjunctive queries play an important role as an expressive query language for Description Logics (DLs). Although modern DLs usually provide for transitive roles, conjunctive query answering over DL knowledge bases is only poorly understood if transitive roles are admitted in the query. In this paper, we consider unions of conjunctive queries over knowledge bases formulated in the prominent DL SHIQ and allow transitive roles in both the query and the knowledge base. We show decidability of query answering in this setting and establish two tight complexity bounds: regarding combined complexity, we prove that there is a deterministic algorithm for query answering that needs time single exponential in the size of the KB and double exponential in the size of the query, which is optimal. Regarding data complexity, we prove containment in co-NP. Multi-robot path planning is difficult due to the combinatorial explosion of the search space with every new robot added. Complete search of the combined state-space soon becomes intractable. In this paper we present a novel form of abstraction that allows us to plan much more efficiently. The key to this abstraction is the partitioning of the map into subgraphs of known structure with entry and exit restrictions which we can represent compactly. Planning then becomes a search in the much smaller space of subgraph configurations. Once an abstract plan is found, it can be quickly resolved into a correct (but possibly sub-optimal) concrete plan without the need for further search. We prove that this technique is sound and complete and demonstrate its practical effectiveness on a real map. A contending solution, prioritised planning, is also evaluated and shown to have similar performance albeit at the cost of completeness. The two approaches are not necessarily conflicting; we demonstrate how they can be combined into a single algorithm which outperforms either approach alone. Model checking is a promising technology, which has been applied for verification of many hardware and software systems. In this paper, we introduce the concept of model update towards the development of an automatic system modification tool that extends model checking functions. We define primitive update operations on the models of Computation Tree Logic (CTL) and formalize the principle of minimal change for CTL model update. These primitive update operations, together with the underlying minimal change principle, serve as the foundation for CTL model update. Essential semantic and computational characterizations are provided for our CTL model update approach. We then describe a formal algorithm that implements this approach. We also illustrate two case studies of CTL model updates for the well-known microwave oven example and the Andrew File System 1, from which we further propose a method to optimize the update results in complex system modifications. Ontologies and automated reasoning are the building blocks of the Semantic Web initiative. Derivation rules can be included in an ontology to define derived concepts, based on base concepts. For example, rules allow to define the extension of a class or property, based on a complex relation between the extensions of the same or other classes and properties. On the other hand, the inclusion of negative information both in the form of negation-as-failure and explicit negative information is also needed to enable various forms of reasoning. In this paper, we extend RDF graphs with weak and strong negation, as well as derivation rules. The ERDF stable model semantics of the extended framework (Extended RDF) is defined, extending RDF(S) semantics. A distinctive feature of our theory, which is based on Partial Logic, is that both truth and falsity extensions of properties and classes are considered, allowing for truth value gaps. Our framework supports both closed-world and open-world reasoning through the explicit representation of the particular closed-world assumptions and the ERDF ontological categories of total properties and total classes. We represent planning as a set of loosely coupled network flow problems, where each network corresponds to one of the state variables in the planning domain. The network nodes correspond to the state variable values and the network arcs correspond to the value transitions. The planning problem is to find a path (a sequence of actions) in each network such that, when merged, they constitute a feasible plan. In this paper we present a number of integer programming formulations that model these loosely coupled networks with varying degrees of flexibility. Since merging may introduce exponentially many ordering constraints we implement a so-called branch-and-cut algorithm, in which these constraints are dynamically generated and added to the formulation when needed. Our results are very promising, they improve upon previous planning as integer programming approaches and lay the foundation for integer programming approaches for cost optimal planning. In a facility with front room and back room operations, it is useful to switch workers between the rooms in order to cope with changing customer demand. Assuming stochastic customer arrival and service times, we seek a policy for switching workers such that the expected customer waiting time is minimized while the expected back room staffing is sufficient to perform all work. Three novel constraint programming models and several shaving procedures for these models are presented. Experimental results show that a model based on closed-form expressions together with a combination of shaving procedures is the most efficient. This model is able to find and prove optimal solutions for many problem instances within a reasonable run-time. Previously, the only available approach was a heuristic algorithm. Furthermore, a hybrid method combining the heuristic and the best constraint programming method is shown to perform as well as the heuristic in terms of solution quality over time, while achieving the same performance in terms of proving optimality as the pure constraint programming model. This is the first work of which we are aware that solves such queueing-based problems with constraint programming. Markov decision processes capture sequential decision making under uncertainty, where an agent must choose actions so as to optimize long term reward. The paper studies efficient reasoning mechanisms for Relational Markov Decision Processes (RMDP) where world states have an internal relational structure that can be naturally described in terms of objects and relations among them. Two contributions are presented. First, the paper develops First Order Decision Diagrams (FODD), a new compact representation for functions over relational structures, together with a set of operators to combine FODDs, and novel reduction techniques to keep the representation small. Second, the paper shows how FODDs can be used to develop solutions for RMDPs, where reasoning is performed at the abstract level and the resulting optimal policy is independent of domain size (number of objects) or instantiation. In particular, a variant of the value iteration algorithm is developed by using special operations over FODDs, and the algorithm is shown to converge to the optimal policy. In this paper, we investigate the problem of mining numerical data in the framework of Formal Concept Analysis. The usual way is to use a scaling procedure --transforming numerical attributes into binary ones-- leading either to a loss of information or of efficiency, in particular w.r.t. the volume of extracted patterns. By contrast, we propose to directly work on numerical data in a more precise and efficient way, and we prove it. For that, the notions of closed patterns, generators and equivalent classes are revisited in the numerical context. Moreover, two original algorithms are proposed and used in an evaluation involving real-world data, showing the predominance of the present approach. Situation calculus has been widely applied in Artificial Intelligence related fields. This formalism is considered as a dialect of logic programming language and mostly used in dynamic domain modeling. However, type systems are hardly deployed in situation calculus in the literature. To achieve a correct and sound typed program written in situation calculus, adding typing elements into the current situation calculus will be quite helpful. In this paper, we propose to add more typing mechanisms to the current version of situation calculus, especially for three basic elements in situation calculus: situations, actions and objects, and then perform rigid type checking for existing situation calculus programs to find out the well-typed and ill-typed ones. In this way, type correctness and soundness in situation calculus programs can be guaranteed by type checking based on our type system. This modified version of a lightweight situation calculus is proved to be a robust and well-typed system. Situation calculus has been applied widely in artificial intelligence to model and reason about actions and changes in dynamic systems. Since actions carried out by agents will cause constant changes of the agents' beliefs, how to manage these changes is a very important issue. Shapiro et al. [22] is one of the studies that considered this issue. However, in this framework, the problem of noisy sensing, which often presents in real-world applications, is not considered. As a consequence, noisy sensing actions in this framework will lead to an agent facing inconsistent situation and subsequently the agent cannot proceed further. In this paper, we investigate how noisy sensing actions can be handled in iterated belief change within the situation calculus formalism. We extend the framework proposed in [22] with the capability of managing noisy sensings. We demonstrate that an agent can still detect the actual situation when the ratio of noisy sensing actions vs. accurate sensing actions is limited. We prove that our framework subsumes the iterated belief change strategy in [22] when all sensing actions are accurate. Furthermore, we prove that our framework can adequately handle belief introspection, mistaken beliefs, belief revision and belief update even with noisy sensing, as done in [22] with accurate sensing actions only. The work presented here involves the design of a Multi Layer Perceptron (MLP) based pattern classifier for recognition of handwritten Bangla digits using a 76 element feature vector. Bangla is the second most popular script and language in the Indian subcontinent and the fifth most popular language in the world. The feature set developed for representing handwritten Bangla numerals here includes 24 shadow features, 16 centroid features and 36 longest-run features. On experimentation with a database of 6000 samples, the technique yields an average recognition rate of 96.67% evaluated after three-fold cross validation of results. It is useful for applications related to OCR of handwritten Bangla Digit and can also be extended to include OCR of handwritten characters of Bangla alphabet. Languages for open-universe probabilistic models (OUPMs) can represent situations with an unknown number of objects and iden- tity uncertainty. While such cases arise in a wide range of important real-world appli- cations, existing general purpose inference methods for OUPMs are far less efficient than those available for more restricted lan- guages and model classes. This paper goes some way to remedying this deficit by in- troducing, and proving correct, a generaliza- tion of Gibbs sampling to partial worlds with possibly varying model structure. Our ap- proach draws on and extends previous generic OUPM inference methods, as well as aux- iliary variable samplers for nonparametric mixture models. It has been implemented for BLOG, a well-known OUPM language. Combined with compile-time optimizations, the resulting algorithm yields very substan- tial speedups over existing methods on sev- eral test cases, and substantially improves the practicality of OUPM languages generally. Qualitative possibilistic networks, also known as min-based possibilistic networks, are important tools for handling uncertain information in the possibility theory frame- work. Despite their importance, only the junction tree adaptation has been proposed for exact reasoning with such networks. This paper explores alternative algorithms using compilation techniques. We first propose possibilistic adaptations of standard compilation-based probabilistic methods. Then, we develop a new, purely possibilistic, method based on the transformation of the initial network into a possibilistic base. A comparative study shows that this latter performs better than the possibilistic adap- tations of probabilistic methods. This result is also confirmed by experimental results. Possibilistic answer set programming (PASP) extends answer set programming (ASP) by attaching to each rule a degree of certainty. While such an extension is important from an application point of view, existing semantics are not well-motivated, and do not always yield intuitive results. To develop a more suitable semantics, we first introduce a characterization of answer sets of classical ASP programs in terms of possibilistic logic where an ASP program specifies a set of constraints on possibility distributions. This characterization is then naturally generalized to define answer sets of PASP programs. We furthermore provide a syntactic counterpart, leading to a possibilistic generalization of the well-known Gelfond-Lifschitz reduct, and we show how our framework can readily be implemented using standard ASP solvers. Many machine learning applications require the ability to learn from and reason about noisy multi-relational data. To address this, several effective representations have been developed that provide both a language for expressing the structural regularities of a domain, and principled support for probabilistic inference. In addition to these two aspects, however, many applications also involve a third aspect-the need to reason about similarities-which has not been directly supported in existing frameworks. This paper introduces probabilistic similarity logic (PSL), a general-purpose framework for joint reasoning about similarity in relational domains that incorporates probabilistic reasoning about similarities and relational structure in a principled way. PSL can integrate any existing domain-specific similarity measures and also supports reasoning about similarities between sets of entities. We provide efficient inference and learning techniques for PSL and demonstrate its effectiveness both in common relational tasks and in settings that require reasoning about similarity. We study the tracking problem, namely, estimating the hidden state of an object over time, from unreliable and noisy measurements. The standard framework for the tracking problem is the generative framework, which is the basis of solutions such as the Bayesian algorithm and its approximation, the particle filters. However, these solutions can be very sensitive to model mismatches. In this paper, motivated by online learning, we introduce a new framework for tracking. We provide an efficient tracking algorithm for this framework. We provide experimental results comparing our algorithm to the Bayesian algorithm on simulated data. Our experiments show that when there are slight model mismatches, our algorithm outperforms the Bayesian algorithm. Relational Continuous Models (RCMs) represent joint probability densities over attributes of objects, when the attributes have continuous domains. With relational representations, they can model joint probability distributions over large numbers of variables compactly in a natural way. This paper presents a new exact lifted inference algorithm for RCMs, thus it scales up to large models of real world applications. The algorithm applies to Relational Pairwise Models which are (relational) products of potentials of arity 2. Our algorithm is unique in two ways. First, it substantially improves the efficiency of lifted inference with variables of continuous domains. When a relational model has Gaussian potentials, it takes only linear-time compared to cubic time of previous methods. Second, it is the first exact inference algorithm which handles RCMs in a lifted way. The algorithm is illustrated over an example from econometrics. Experimental results show that our algorithm outperforms both a groundlevel inference algorithm and an algorithm built with previously-known lifted methods. Partially-Observable Markov Decision Processes (POMDPs) are typically solved by finding an approximate global solution to a corresponding belief-MDP. In this paper, we offer a new planning algorithm for POMDPs with continuous state, action and observation spaces. Since such domains have an inherent notion of locality, we can find an approximate solution using local optimization methods. We parameterize the belief distribution as a Gaussian mixture, and use the Extended Kalman Filter (EKF) to approximate the belief update. Since the EKF is a first-order filter, we can marginalize over the observations analytically. By using feedback control and state estimation during policy execution, we recover a behavior that is effectively conditioned on incoming observations despite the unconditioned planning. Local optimization provides no guarantees of global optimality, but it allows us to tackle domains that are at least an order of magnitude larger than the current state-of-the-art. We demonstrate the scalability of our algorithm by considering a simulated hand-eye coordination domain with 16 continuous state dimensions and 6 continuous action dimensions. In this paper we introduce a class of Markov decision processes that arise as a natural model for many renewable resource allocation problems. Upon extending results from the inventory control literature, we prove that they admit a closed form solution and we show how to exploit this structure to speed up its computation. We consider the application of the proposed framework to several problems arising in very different domains, and as part of the ongoing effort in the emerging field of Computational Sustainability we discuss in detail its application to the Northern Pacific Halibut marine fishery. Our approach is applied to a model based on real world data, obtaining a policy with a guaranteed lower bound on the utility function that is structurally very different from the one currently employed. While game theory is widely used to model strategic interactions, a natural question is where do the game representations come from? One answer is to learn the representations from data. If one wants to learn both the payoffs and the players' strategies, a naive approach is to learn them both directly from the data. This approach ignores the fact the players might be playing reasonably good strategies, so there is a connection between the strategies and the data. The main contribution of this paper is to make this connection while learning. We formulate the learning problem as a weighted constraint satisfaction problem, including constraints both for the fit of the payoffs and strategies to the data and the fit of the strategies to the payoffs. We use quantal response equilibrium as our notion of rationality for quantifying the latter fit. Our results show that incorporating rationality constraints can improve learning when the amount of data is limited. Cyber-physical systems, such as mobile robots, must respond adaptively to dynamic operating conditions. Effective operation of these systems requires that sensing and actuation tasks are performed in a timely manner. Additionally, execution of mission specific tasks such as imaging a room must be balanced against the need to perform more general tasks such as obstacle avoidance. This problem has been addressed by maintaining relative utilization of shared resources among tasks near a user-specified target level. Producing optimal scheduling strategies requires complete prior knowledge of task behavior, which is unlikely to be available in practice. Instead, suitable scheduling strategies must be learned online through interaction with the system. We consider the sample complexity of reinforcement learning in this domain, and demonstrate that while the problem state space is countably infinite, we may leverage the problem's structure to guarantee efficient learning. Computing the probability of a formula given the probabilities or weights associated with other formulas is a natural extension of logical inference to the probabilistic setting. Surprisingly, this problem has received little attention in the literature to date, particularly considering that it includes many standard inference problems as special cases. In this paper, we propose two algorithms for this problem: formula decomposition and conditioning, which is an exact method, and formula importance sampling, which is an approximate method. The latter is, to our knowledge, the first application of model counting to approximate probabilistic inference. Unlike conventional variable-based algorithms, our algorithms work in the dual realm of logical formulas. Theoretically, we show that our algorithms can greatly improve efficiency by exploiting the structural information in the formulas. Empirically, we show that they are indeed quite powerful, often achieving substantial performance gains over state-of-the-art schemes. This paper addresses the problem of sampling from binary distributions with constraints. In particular, it proposes an MCMC method to draw samples from a distribution of the set of all states at a specified distance from some reference state. For example, when the reference state is the vector of zeros, the algorithm can draw samples from a binary distribution with a constraint on the number of active variables, say the number of 1's. We motivate the need for this algorithm with examples from statistical physics and probabilistic inference. Unlike previous algorithms proposed to sample from binary distributions with these constraints, the new algorithm allows for large moves in state space and tends to propose them such that they are energetically favourable. The algorithm is demonstrated on three Boltzmann machines of varying difficulty: A ferromagnetic Ising model (with positive potentials), a restricted Boltzmann machine with learned Gabor-like filters as potentials, and a challenging three-dimensional spin-glass (with positive and negative potentials). Over the past two decades, several consistent procedures have been designed to infer causal conclusions from observational data. We prove that if the true causal network might be an arbitrary, linear Gaussian network or a discrete Bayes network, then every unambiguous causal conclusion produced by a consistent method from non-experimental data is subject to reversal as the sample size increases any finite number of times. That result, called the causal flipping theorem, extends prior results to the effect that causal discovery cannot be reliable on a given sample size. We argue that since repeated flipping of causal conclusions is unavoidable in principle for consistent methods, the best possible discovery methods are consistent methods that retract their earlier conclusions no more than necessary. A series of simulations of various methods across a wide range of sample sizes illustrates concretely both the theorem and the principle of comparing methods in terms of retractions. Decentralized POMDPs provide an expressive framework for multi-agent sequential decision making. While fnite-horizon DECPOMDPs have enjoyed signifcant success, progress remains slow for the infnite-horizon case mainly due to the inherent complexity of optimizing stochastic controllers representing agent policies. We present a promising new class of algorithms for the infnite-horizon case, which recasts the optimization problem as inference in a mixture of DBNs. An attractive feature of this approach is the straightforward adoption of existing inference techniques in DBNs for solving DEC-POMDPs and supporting richer representations such as factored or continuous states and actions. We also derive the Expectation Maximization (EM) algorithm to optimize the joint policy represented as DBNs. Experiments on benchmark domains show that EM compares favorably against the state-of-the-art solvers. Decision theoretical troubleshooting is about minimizing the expected cost of solving a certain problem like repairing a complicated man-made device. In this paper we consider situations where you have to take apart some of the device to get access to certain clusters and actions. Specifically, we investigate troubleshooting with independent actions in a tree of clusters where actions inside a cluster cannot be performed before the cluster is opened. The problem is non-trivial because there is a cost associated with opening and closing a cluster. Troubleshooting with independent actions and no clusters can be solved in O(n lg n) time (n being the number of actions) by the well-known "P-over-C" algorithm due to Kadane and Simon, but an efficient and optimal algorithm for a tree cluster model has not yet been found. In this paper we describe a "bottom-up P-over-C" O(n lg n) time algorithm and show that it is optimal when the clusters do not need to be closed to test whether the actions solved the problem. This note deals with a class of variables that, if conditioned on, tends to amplify confounding bias in the analysis of causal effects. This class, independently discovered by Bhattacharya and Vogt (2007) and Wooldridge (2009), includes instrumental variables and variables that have greater influence on treatment selection than on the outcome. We offer a simple derivation and an intuitive explanation of this phenomenon and then extend the analysis to non linear models. We show that: 1. the bias-amplifying potential of instrumental variables extends over to non-linear models, though not as sweepingly as in linear models; 2. in non-linear models, conditioning on instrumental variables may introduce new bias where none existed before; 3. in both linear and non-linear models, instrumental variables have no effect on selection-induced bias. In many fields observations are performed irregularly along time, due to either measurement limitations or lack of a constant immanent rate. While discrete-time Markov models (as Dynamic Bayesian Networks) introduce either inefficient computation or an information loss to reasoning about such processes, continuous-time Markov models assume either a discrete state space (as Continuous-Time Bayesian Networks), or a flat continuous state space (as stochastic differential equations). To address these problems, we present a new modeling class called Irregular-Time Bayesian Networks (ITBNs), generalizing Dynamic Bayesian Networks, allowing substantially more compact representations, and increasing the expressivity of the temporal dynamics. In addition, a globally optimal solution is guaranteed when learning temporal systems, provided that they are fully observed at the same irregularly spaced time-points, and a semiparametric subclass of ITBNs is introduced to allow further adaptation to the irregular nature of the available data. Identifying effects of actions (treatments) on outcome variables from observational data and causal assumptions is a fundamental problem in causal inference. This identification is made difficult by the presence of confounders which can be related to both treatment and outcome variables. Confounders are often handled, both in theory and in practice, by adjusting for covariates, in other words considering outcomes conditioned on treatment and covariate values, weighed by probability of observing those covariate values. In this paper, we give a complete graphical criterion for covariate adjustment, which we term the adjustment criterion, and derive some interesting corollaries of the completeness of this criterion. Monte-Carlo Tree Search (MCTS) methods are drawing great interest after yielding breakthrough results in computer Go. This paper proposes a Bayesian approach to MCTS that is inspired by distributionfree approaches such as UCT [13], yet significantly differs in important respects. The Bayesian framework allows potentially much more accurate (Bayes-optimal) estimation of node values and node uncertainties from a limited number of simulation trials. We further propose propagating inference in the tree via fast analytic Gaussian approximation methods: this can make the overhead of Bayesian inference manageable in domains such as Go, while preserving high accuracy of expected-value estimates. We find substantial empirical outperformance of UCT in an idealized bandit-tree test environment, where we can obtain valuable insights by comparing with known ground truth. Additionally we rigorously prove on-policy and off-policy convergence of the proposed methods. In this paper, we present the Difference- Based Causality Learner (DBCL), an algorithm for learning a class of discrete-time dynamic models that represents all causation across time by means of difference equations driving change in a system. We motivate this representation with real-world mechanical systems and prove DBCL's correctness for learning structure from time series data, an endeavour that is complicated by the existence of latent derivatives that have to be detected. We also prove that, under common assumptions for causal discovery, DBCL will identify the presence or absence of feedback loops, making the model more useful for predicting the effects of manipulating variables when the system is in equilibrium. We argue analytically and show empirically the advantages of DBCL over vector autoregression (VAR) and Granger causality models as well as modified forms of Bayesian and constraintbased structure discovery algorithms. Finally, we show that our algorithm can discover causal directions of alpha rhythms in human brains from EEG data. It is known that fixed points of loopy belief propagation (BP) correspond to stationary points of the Bethe variational problem, where we minimize the Bethe free energy subject to normalization and marginalization constraints. Unfortunately, this does not entirely explain BP because BP is a dual rather than primal algorithm to solve the Bethe variational problem -- beliefs are infeasible before convergence. Thus, we have no better understanding of BP than as an algorithm to seek for a common zero of a system of non-linear functions, not explicitly related to each other. In this theoretical paper, we show that these functions are in fact explicitly related -- they are the partial derivatives of a single function of reparameterizations. That means, BP seeks for a stationary point of a single function, without any constraints. This function has a very natural form: it is a linear combination of local log-partition functions, exactly as the Bethe entropy is the same linear combination of local entropies. We present decentralized rollout sampling policy iteration (DecRSPI) - a new algorithm for multi-agent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte- Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout estimations. A new policy representation allows us to represent solutions compactly. The key benefits of the algorithm are its linear time complexity over the number of agents, its bounded memory usage and good solution quality. It can solve larger problems that are intractable for existing planning algorithms. Experimental results confirm the effectiveness and scalability of the approach. Learning algorithms normally assume that there is at most one annotation or label per data point. However, in some scenarios, such as medical diagnosis and on-line collaboration,multiple annotations may be available. In either case, obtaining labels for data points can be expensive and time-consuming (in some circumstances ground-truth may not exist). Semi-supervised learning approaches have shown that utilizing the unlabeled data is often beneficial in these cases. This paper presents a probabilistic semi-supervised model and algorithm that allows for learning from both unlabeled and labeled data in the presence of multiple annotators. We assume that it is known what annotator labeled which data points. The proposed approach produces annotator models that allow us to provide (1) estimates of the true label and (2) annotator variable expertise for both labeled and unlabeled data. We provide numerical comparisons under various scenarios and with respect to standard semi-supervised learning. Experiments showed that the presented approach provides clear advantages over multi-annotator methods that do not use the unlabeled data and over methods that do not use multi-labeler information. Collaborative filtering is an effective recommendation approach in which the preference of a user on an item is predicted based on the preferences of other users with similar interests. A big challenge in using collaborative filtering methods is the data sparsity problem which often arises because each user typically only rates very few items and hence the rating matrix is extremely sparse. In this paper, we address this problem by considering multiple collaborative filtering tasks in different domains simultaneously and exploiting the relationships between domains. We refer to it as a multi-domain collaborative filtering (MCF) problem. To solve the MCF problem, we propose a probabilistic framework which uses probabilistic matrix factorization to model the rating problem in each domain and allows the knowledge to be adaptively transferred across different domains by automatically learning the correlation between domains. We also introduce the link function for different domains to correct their biases. Experiments conducted on several real-world applications demonstrate the effectiveness of our methods when compared with some representative methods. Multi-task learning is a learning paradigm which seeks to improve the generalization performance of a learning task with the help of some other related tasks. In this paper, we propose a regularization formulation for learning the relationships between tasks in multi-task learning. This formulation can be viewed as a novel generalization of the regularization framework for single-task learning. Besides modeling positive task correlation, our method, called multi-task relationship learning (MTRL), can also describe negative task correlation and identify outlier tasks based on the same underlying principle. Under this regularization framework, the objective function of MTRL is convex. For efficiency, we use an alternating method to learn the optimal model parameters for each task as well as the relationships between tasks. We study MTRL in the symmetric multi-task learning setting and then generalize it to the asymmetric setting as well. We also study the relationships between MTRL and some existing multi-task learning methods. Experiments conducted on a toy problem as well as several benchmark data sets demonstrate the effectiveness of MTRL. Despite the intractability of generic optimal partially observable Markov decision process planning, there exist important problems that have highly structured models. Previous researchers have used this insight to construct more efficient algorithms for factored domains, and for domains with topological structure in the flat state dynamics model. In our work, motivated by findings from the education community relevant to automated tutoring, we consider problems that exhibit a form of topological structure in the factored dynamics model. Our Reachable Anytime Planner for Imprecisely-sensed Domains (RAPID) leverages this structure to efficiently compute a good initial envelope of reachable states under the optimal MDP policy in time linear in the number of state variables. RAPID performs partially-observable planning over the limited envelope of states, and slowly expands the state space considered as time allows. RAPID performs well on a large tutoring-inspired problem simulation with 122 state variables, corresponding to a flat state space of over 10^30 states. UCT has recently emerged as an exciting new adversarial reasoning technique based on cleverly balancing exploration and exploitation in a Monte-Carlo sampling setting. It has been particularly successful in the game of Go but the reasons for its success are not well understood and attempts to replicate its success in other domains such as Chess have failed. We provide an in-depth analysis of the potential of UCT in domain-independent settings, in cases where heuristic values are available, and the effect of enhancing random playouts to more informed playouts between two weak minimax players. To provide further insights, we develop synthetic game tree instances and discuss interesting properties of UCT, both empirically and analytically. Successive elimination of candidates is often a route to making manipulation intractable to compute. We prove that eliminating candidates does not necessarily increase the computational complexity of manipulation. However, for many voting rules used in practice, the computational complexity increases. For example, it is already known that it is NP-hard to compute how a single voter can manipulate the result of single transferable voting (the elimination version of plurality voting). We show here that it is NP-hard to compute how a single voter can manipulate the result of the elimination version of veto voting, of the closely related Coombs' rule, and of the elimination versions of a general class of scoring rules. We identify the presence of typically quantum effects, namely 'superposition' and 'interference', in what happens when human concepts are combined, and provide a quantum model in complex Hilbert space that represents faithfully experimental data measuring the situation of combining concepts. Our model shows how 'interference of concepts' explains the effects of underextension and overextension when two concepts combine to the disjunction of these two concepts. This result supports our earlier hypothesis that human thought has a superposed two-layered structure, one layer consisting of 'classical logical thought' and a superposed layer consisting of 'quantum conceptual thought'. Possible connections with recent findings of a 'grid-structure' for the brain are analyzed, and influences on the mind/brain relation, and consequences on applied disciplines, such as artificial intelligence and quantum computation, are considered. We propose an analysis of the codified Law of France as a structured system. Fifty two legal codes are selected on the basis of explicit legal criteria and considered as vertices with their mutual quotations forming the edges in a network which properties are analyzed relying on graph theory. We find that a group of 10 codes are simultaneously the most citing and the most cited by other codes, and are also strongly connected together so forming a "rich club" sub-graph. Three other code communities are also found that somewhat partition the legal field is distinct thematic sub-domains. The legal interpretation of this partition is opening new untraditional lines of research. We also conjecture that many legal systems are forming such new kind of networks that share some properties in common with small worlds but are far denser. We propose to call "concentrated world". Most Relevant Explanation (MRE) is a method for finding multivariate explanations for given evidence in Bayesian networks [12]. This paper studies the theoretical properties of MRE and develops an algorithm for finding multiple top MRE solutions. Our study shows that MRE relies on an implicit soft relevance measure in automatically identifying the most relevant target variables and pruning less relevant variables from an explanation. The soft measure also enables MRE to capture the intuitive phenomenon of explaining away encoded in Bayesian networks. Furthermore, our study shows that the solution space of MRE has a special lattice structure which yields interesting dominance relations among the solutions. A K-MRE algorithm based on these dominance relations is developed for generating a set of top solutions that are more representative. Our empirical results show that MRE methods are promising approaches for explanation in Bayesian networks. This paper presents a new algorithm for online linear regression whose efficiency guarantees satisfy the requirements of the KWIK (Knows What It Knows) framework. The algorithm improves on the complexity bounds of the current state-of-the-art procedure in this setting. We explore several applications of this algorithm for learning compact reinforcement-learning representations. We show that KWIK linear regression can be used to learn the reward function of a factored MDP and the probabilities of action outcomes in Stochastic STRIPS and Object Oriented MDPs, none of which have been proven to be efficiently learnable in the RL setting before. We also combine KWIK linear regression with other KWIK learners to learn larger portions of these models, including experiments on learning factored MDP transition and reward functions together. This paper develops an inconsistency measure on conditional probabilistic knowledge bases. The measure is based on fundamental principles for inconsistency measures and thus provides a solid theoretical framework for the treatment of inconsistencies in probabilistic expert systems. We illustrate its usefulness and immediate application on several examples and present some formal results. Building on this measure we use the Shapley value-a well-known solution for coalition games-to define a sophisticated indicator that is not only able to measure inconsistencies but to reveal the causes of inconsistencies in the knowledge base. Altogether these tools guide the knowledge engineer in his aim to restore consistency and therefore enable him to build a consistent and usable knowledge base that can be employed in probabilistic expert systems. Many applications of causal analysis call for assessing, retrospectively, the effect of withholding an action that has in fact been implemented. This counterfactual quantity, sometimes called "effect of treatment on the treated," (ETT) have been used to to evaluate educational programs, critic public policies, and justify individual decision making. In this paper we explore the conditions under which ETT can be estimated from (i.e., identified in) experimental and/or observational studies. We show that, when the action invokes a singleton variable, the conditions for ETT identification have simple characterizations in terms of causal diagrams. We further give a graphical characterization of the conditions under which the effects of multiple treatments on the treated can be identified, as well as ways in which the ETT estimand can be constructed from both interventional and observational distributions. There has been a great deal of recent interest in methods for performing lifted inference; however, most of this work assumes that the first-order model is given as input to the system. Here, we describe lifted inference algorithms that determine symmetries and automatically lift the probabilistic model to speedup inference. In particular, we describe approximate lifted inference techniques that allow the user to trade off inference accuracy for computational efficiency by using a handful of tunable parameters, while keeping the error bounded. Our algorithms are closely related to the graph-theoretic concept of bisimulation. We report experiments on both synthetic and real data to show that in the presence of symmetries, run-times for inference can be improved significantly, with approximate lifted inference providing orders of magnitude speedup over ground inference. The specification of aMarkov decision process (MDP) can be difficult. Reward function specification is especially problematic; in practice, it is often cognitively complex and time-consuming for users to precisely specify rewards. This work casts the problem of specifying rewards as one of preference elicitation and aims to minimize the degree of precision with which a reward function must be specified while still allowing optimal or near-optimal policies to be produced. We first discuss how robust policies can be computed for MDPs given only partial reward information using the minimax regret criterion. We then demonstrate how regret can be reduced by efficiently eliciting reward information using bound queries, using regret-reduction as a means for choosing suitable queries. Empirical results demonstrate that regret-based reward elicitation offers an effective way to produce near-optimal policies without resorting to the precise specification of the entire reward function. The fastest known exact algorithms for scorebased structure discovery in Bayesian networks on n nodes run in time and space 2nnO(1). The usage of these algorithms is limited to networks on at most around 25 nodes mainly due to the space requirement. Here, we study space-time tradeoffs for finding an optimal network structure. When little space is available, we apply the Gurevich-Shelah recurrence-originally proposed for the Hamiltonian path problem-and obtain time 22n-snO(1) in space 2snO(1) for any s = n/2, n/4, n/8, . . .; we assume the indegree of each node is bounded by a constant. For the more practical setting with moderate amounts of space, we present a novel scheme. It yields running time 2n(3/2)pnO(1) in space 2n(3/4)pnO(1) for any p = 0, 1, . . ., n/2; these bounds hold as long as the indegrees are at most 0.238n. Furthermore, the latter scheme allows easy and efficient parallelization beyond previous algorithms. We also explore empirically the potential of the presented techniques. Logical inference algorithms for conditional independence (CI) statements have important applications from testing consistency during knowledge elicitation to constraintbased structure learning of graphical models. We prove that the implication problem for CI statements is decidable, given that the size of the domains of the random variables is known and fixed. We will present an approximate logical inference algorithm which combines a falsification and a novel validation algorithm. The validation algorithm represents each set of CI statements as a sparse 0-1 matrix A and validates instances of the implication problem by solving specific linear programs with constraint matrix A. We will show experimentally that the algorithm is both effective and efficient in validating and falsifying instances of the probabilistic CI implication problem. The introduction of loopy belief propagation (LBP) revitalized the application of graphical models in many domains. Many recent works present improvements on the basic LBP algorithm in an attempt to overcome convergence and local optima problems. Notable among these are convexified free energy approximations that lead to inference procedures with provable convergence and quality properties. However, empirically LBP still outperforms most of its convex variants in a variety of settings, as we also demonstrate here. Motivated by this fact we seek convexified free energies that directly approximate the Bethe free energy. We show that the proposed approximations compare favorably with state-of-the art convex free energy approximations. Message-passing algorithms have emerged as powerful techniques for approximate inference in graphical models. When these algorithms converge, they can be shown to find local (or sometimes even global) optima of variational formulations to the inference problem. But many of the most popular algorithms are not guaranteed to converge. This has lead to recent interest in convergent message-passing algorithms. In this paper, we present a unified view of convergent message-passing algorithms. We present a simple derivation of an abstract algorithm, tree-consistency bound optimization (TCBO) that is provably convergent in both its sum and max product forms. We then show that many of the existing convergent algorithms are instances of our TCBO algorithm, and obtain novel convergent algorithms "for free" by exchanging maximizations and summations in existing algorithms. In particular, we show that Wainwright's non-convergent sum-product algorithm for tree based variational bounds, is actually convergent with the right update order for the case where trees are monotonic chains. We consider the task of obtaining the maximum a posteriori estimate of discrete pairwise random fields with arbitrary unary potentials and semimetric pairwise potentials. For this problem, we propose an accurate hierarchical move making strategy where each move is computed efficiently by solving an st-MINCUT problem. Unlike previous move making approaches, e.g. the widely used a-expansion algorithm, our method obtains the guarantees of the standard linear programming (LP) relaxation for the important special case of metric labeling. Unlike the existing LP relaxation solvers, e.g. interior-point algorithms or tree-reweighted message passing, our method is significantly faster as it uses only the efficient st-MINCUT algorithm in its design. Using both synthetic and real data experiments, we show that our technique outperforms several commonly used algorithms. Probabilistic programming languages and modeling toolkits are two modular ways to build and reuse stochastic models and inference procedures. Combining strengths of both, we express models and inference as generalized coroutines in the same general-purpose language. We use existing facilities of the language, such as rich libraries, optimizing compilers, and types, to develop concise, declarative, and realistic models with competitive performance on exact and approximate inference. In particular, a wide range of models can be expressed using memoization. Because deterministic parts of models run at full speed, custom inference procedures are trivial to incorporate, and inference procedures can reason about themselves without interpretive overhead. Within this framework, we introduce a new, general algorithm for importance sampling with look-ahead. A major benefit of graphical models is that most knowledge is captured in the model structure. Many models, however, produce inference problems with a lot of symmetries not reflected in the graphical structure and hence not exploitable by efficient inference techniques such as belief propagation (BP). In this paper, we present a new and simple BP algorithm, called counting BP, that exploits such additional symmetries. Starting from a given factor graph, counting BP first constructs a compressed factor graph of clusternodes and clusterfactors, corresponding to sets of nodes and factors that are indistinguishable given the evidence. Then it runs a modified BP algorithm on the compressed graph that is equivalent to running BP on the original factor graph. Our experiments show that counting BP is applicable to a variety of important AI tasks such as (dynamic) relational models and boolean model counting, and that significant efficiency gains are obtainable, often by orders of magnitude. In this paper we introduce temporal action graph games (TAGGs), a novel graphical representation of imperfect-information extensive form games. We show that when a game involves anonymity or context-specific utility independencies, its encoding as a TAGG can be much more compact than its direct encoding as a multiagent influence diagram (MAID).We also show that TAGGs can be understood as indirect MAID encodings in which many deterministic chance nodes are introduced. We provide an algorithm for computing with TAGGs, and show both theoretically and empirically that our approach improves significantly on the previous state of the art. Efficiently finding the maximum a posteriori (MAP) configuration of a graphical model is an important problem which is often implemented using message passing algorithms. The optimality of such algorithms is only well established for singly-connected graphs and other limited settings. This article extends the set of graphs where MAP estimation is in P and where message passing recovers the exact solution to so-called perfect graphs. This result leverages recent progress in defining perfect graphs (the strong perfect graph theorem), linear programming relaxations of MAP estimation and recent convergent message passing schemes. The article converts graphical models into nand Markov random fields which are straightforward to relax into linear programs. Therein, integrality can be established in general by testing for graph perfection. This perfection test is performed efficiently using a polynomial time algorithm. Alternatively, known decomposition tools from perfect graph theory may be used to prove perfection for certain families of graphs. Thus, a general graph framework is provided for determining when MAP estimation in any graphical model is in P, has an integral linear programming relaxation and is exactly recoverable by message passing. A Bayesian belief network models a joint distribution with an directed acyclic graph representing dependencies among variables and network parameters characterizing conditional distributions. The parameters are viewed as random variables to quantify uncertainty about their values. Belief nets are used to compute responses to queries; i.e., conditional probabilities of interest. A query is a function of the parameters, hence a random variable. Van Allen et al. (2001, 2008) showed how to quantify uncertainty about a query via a delta method approximation of its variance. We develop more accurate approximations for both query mean and variance. The key idea is to extend the query mean approximation to a "doubled network" involving two independent replicates. Our method assumes complete data and can be applied to discrete, continuous, and hybrid networks (provided discrete variables have only discrete parents). We analyze several improvements, and provide empirical studies to demonstrate their effectiveness. Mixed integer linear programming (MILP) is a powerful representation often used to formulate decision-making problems under uncertainty. However, it lacks a natural mechanism to reason about objects, classes of objects, and relations. First-order logic (FOL), on the other hand, excels at reasoning about classes of objects, but lacks a rich representation of uncertainty. While representing propositional logic in MILP has been extensively explored, no theory exists yet for fully combining FOL with MILP. We propose a new representation, called first-order programming or FOP, which subsumes both FOL and MILP. We establish formal methods for reasoning about first order programs, including a sound and complete lifted inference procedure for integer first order programs. Since FOP can offer exponential savings in representation and proof size compared to FOL, and since representations and proofs are never significantly longer in FOP than in FOL, we anticipate that inference in FOP will be more tractable than inference in FOL for corresponding problems. As computer clusters become more common and the size of the problems encountered in the field of AI grows, there is an increasing demand for efficient parallel inference algorithms. We consider the problem of parallel inference on large factor graphs in the distributed memory setting of computer clusters. We develop a new efficient parallel inference algorithm, DBRSplash, which incorporates over-segmented graph partitioning, belief residual scheduling, and uniform work Splash operations. We empirically evaluate the DBRSplash algorithm on a 120 processor cluster and demonstrate linear to super-linear performance gains on large factor graph models. Generating optimal plans in highly dynamic environments is challenging. Plans are predicated on an assumed initial state, but this state can change unexpectedly during plan generation, potentially invalidating the planning effort. In this paper we make three contributions: (1) We propose a novel algorithm for generating optimal plans in settings where frequent, unexpected events interfere with planning. It is able to quickly distinguish relevant from irrelevant state changes, and to update the existing planning search tree if necessary. (2) We argue for a new criterion for evaluating plan adaptation techniques: the relative running time compared to the "size" of changes. This is significant since during recovery more changes may occur that need to be recovered from subsequently, and in order for this process of repeated recovery to terminate, recovery time has to converge. (3) We show empirically that our approach can converge and find optimal plans in environments that would ordinarily defy planning due to their high dynamics. Continuous-time Bayesian networks is a natural structured representation language for multicomponent stochastic processes that evolve continuously over time. Despite the compact representation, inference in such models is intractable even in relatively simple structured networks. Here we introduce a mean field variational approximation in which we use a product of inhomogeneous Markov processes to approximate a distribution over trajectories. This variational approach leads to a globally consistent distribution, which can be efficiently queried. Additionally, it provides a lower bound on the probability of observations, thus making it attractive for learning tasks. We provide the theoretical foundations for the approximation, an efficient implementation that exploits the wide range of highly optimized ordinary differential equations (ODE) solvers, experimentally explore characterizations of processes for which this approximation is suitable, and show applications to a large-scale realworld inference problem. We present a new method to propagate lower bounds on conditional probability distributions in conventional Bayesian networks. Our method guarantees to provide outer approximations of the exact lower bounds. A key advantage is that we can use any available algorithms and tools for Bayesian networks in order to represent and infer lower bounds. This new method yields results that are provable exact for trees with binary variables, and results which are competitive to existing approximations in credal networks for all other network structures. Our method is not limited to a specific kind of network structure. Basically, it is also not restricted to a specific kind of inference, but we restrict our analysis to prognostic inference in this article. The computational complexity is superior to that of other existing approaches. The approach described here allows using membership function to represent imprecise and uncertain knowledge by learning in Fuzzy Semantic Networks. This representation has a great practical interest due to the possibility to realize on the one hand, the construction of this membership function from a simple value expressing the degree of interpretation of an Object or a Goal as compared to an other and on the other hand, the adjustment of the membership function during the apprenticeship. We show, how to use these membership functions to represent the interpretation of an Object (respectively of a Goal) user as compared to an system Object (respectively to a Goal). We also show the possibility to make decision for each representation of an user Object compared to a system Object. This decision is taken by determining decision coefficient calculates according to the nucleus of the membership function of the user Object. In this paper, we consider planning in stochastic shortest path (SSP) problems, a subclass of Markov Decision Problems (MDP). We focus on medium-size problems whose state space can be fully enumerated. This problem has numerous important applications, such as navigation and planning under uncertainty. We propose a new approach for constructing a multi-level hierarchy of progressively simpler abstractions of the original problem. Once computed, the hierarchy can be used to speed up planning by first finding a policy for the most abstract level and then recursively refining it into a solution to the original problem. This approach is fully automated and delivers a speed-up of two orders of magnitude over a state-of-the-art MDP solver on sample problems while returning near-optimal solutions. We also prove theoretical bounds on the loss of solution optimality resulting from the use of abstractions. Many algorithms and applications involve repeatedly solving variations of the same inference problem; for example we may want to introduce new evidence to the model or perform updates to conditional dependencies. The goal of adaptive inference is to take advantage of what is preserved in the model and perform inference more rapidly than from scratch. In this paper, we describe techniques for adaptive inference on general graphs that support marginal computation and updates to the conditional probabilities and dependencies in logarithmic time. We give experimental results for an implementation of our algorithm, and demonstrate its potential performance benefit in the study of protein structure. Assume that cause-effect relationships between variables can be described as a directed acyclic graph and the corresponding linear structural equation model.We consider the identification problem of total effects in the presence of latent variables and selection bias between a treatment variable and a response variable. Pearl and his colleagues provided the back door criterion, the front door criterion (Pearl, 2000) and the conditional instrumental variable method (Brito and Pearl, 2002) as identifiability criteria for total effects in the presence of latent variables, but not in the presence of selection bias. In order to solve this problem, we propose new graphical identifiability criteria for total effects based on the identifiable factor models. The results of this paper are useful to identify total effects in observational studies and provide a new viewpoint to the identification conditions of factor models. The problem of learning discrete Bayesian networks from data is encoded as a weighted MAX-SAT problem and the MaxWalkSat local search algorithm is used to address it. For each dataset, the per-variable summands of the (BDeu) marginal likelihood for different choices of parents ('family scores') are computed prior to applying MaxWalkSat. Each permissible choice of parents for each variable is encoded as a distinct propositional atom and the associated family score encoded as a 'soft' weighted single-literal clause. Two approaches to enforcing acyclicity are considered: either by encoding the ancestor relation or by attaching a total order to each graph and encoding that. The latter approach gives better results. Learning experiments have been conducted on 21 synthetic datasets sampled from 7 BNs. The largest dataset has 10,000 datapoints and 60 variables producing (for the 'ancestor' encoding) a weighted CNF input file with 19,932 atoms and 269,367 clauses. For most datasets, MaxWalkSat quickly finds BNs with higher BDeu score than the 'true' BN. The effect of adding prior information is assessed. It is further shown that Bayesian model averaging can be effected by collecting BNs generated during the search. This paper describes a new algorithm to solve the decision making problem in Influence Diagrams based on algorithms for credal networks. Decision nodes are associated to imprecise probability distributions and a reformulation is introduced that finds the global maximum strategy with respect to the expected utility. We work with Limited Memory Influence Diagrams, which generalize most Influence Diagram proposals and handle simultaneous decisions. Besides the global optimum method, we explore an anytime approximate solution with a guaranteed maximum error and show that imprecise probabilities are handled in a straightforward way. Complexity issues and experiments with random diagrams and an effects-based military planning problem are discussed. A graphical multiagent model (GMM) represents a joint distribution over the behavior of a set of agents. One source of knowledge about agents' behavior may come from gametheoretic analysis, as captured by several graphical game representations developed in recent years. GMMs generalize this approach to express arbitrary distributions, based on game descriptions or other sources of knowledge bearing on beliefs about agent behavior. To illustrate the flexibility of GMMs, we exhibit game-derived models that allow probabilistic deviation from equilibrium, as well as models based on heuristic action choice. We investigate three different methods of integrating these models into a single model representing the combined knowledge sources. To evaluate the predictive performance of the combined model, we treat as actual outcome the behavior produced by a reinforcement learning process. We find that combining the two knowledge sources, using any of the methods, provides better predictions than either source alone. Among the combination methods, mixing data outperforms the opinion pool and direct update methods investigated in this empirical trial. We conjecture that the worst case number of experiments necessary and sufficient to discover a causal graph uniquely given its observational Markov equivalence class can be specified as a function of the largest clique in the Markov equivalence class. We provide an algorithm that computes intervention sets that we believe are optimal for the above task. The algorithm builds on insights gained from the worst case analysis in Eberhardt et al. (2005) for sequences of experiments when all possible directed acyclic graphs over N variables are considered. A simulation suggests that our conjecture is correct. We also show that a generalization of our conjecture to other classes of possible graph hypotheses cannot be given easily, and in what sense the algorithm is then no longer optimal. A central task in many applications is reasoning about processes that change over continuous time. Continuous-Time Bayesian Networks is a general compact representation language for multi-component continuous-time processes. However, exact inference in such processes is exponential in the number of components, and thus infeasible for most models of interest. Here we develop a novel Gibbs sampling procedure for multi-component processes. This procedure iteratively samples a trajectory for one of the components given the remaining ones. We show how to perform exact sampling that adapts to the natural time scale of the sampled process. Moreover, we show that this sampling procedure naturally exploits the structure of the network to reduce the computational cost of each step. This procedure is the first that can provide asymptotically unbiased approximation in such processes. In addressing the challenge of exponential scaling with the number of agents we adopt a cluster-based representation to approximately solve asymmetric games of very many players. A cluster groups together agents with a similar "strategic view" of the game. We learn the clustered approximation from data consisting of strategy profiles and payoffs, which may be obtained from observations of play or access to a simulator. Using our clustering we construct a reduced "twins" game in which each cluster is associated with two players of the reduced game. This allows our representation to be individually- responsive because we align the interests of every individual agent with the strategy of its cluster. Our approach provides agents with higher payoffs and lower regret on average than model-free methods as well as previous cluster-based methods, and requires only few observations for learning to be successful. The "twins" approach is shown to be an important component of providing these low regret approximations. An important task in data analysis is the discovery of causal relationships between observed variables. For continuous-valued data, linear acyclic causal models are commonly used to model the data-generating process, and the inference of such models is a well-studied problem. However, existing methods have significant limitations. Methods based on conditional independencies (Spirtes et al. 1993; Pearl 2000) cannot distinguish between independence-equivalent models, whereas approaches purely based on Independent Component Analysis (Shimizu et al. 2006) are inapplicable to data which is partially Gaussian. In this paper, we generalize and combine the two approaches, to yield a method able to learn the model structure in many cases for which the previous methods provide answers that are either incorrect or are not as informative as possible. We give exact graphical conditions for when two distinct models represent the same family of distributions, and empirically demonstrate the power of our method through thorough simulations. We study a multiagent learning problem where agents can either learn via repeated interactions, or can follow the advice of a mediator who suggests possible actions to take. We present an algorithmthat each agent can use so that, with high probability, they can verify whether or not the mediator's advice is useful. In particular, if the mediator's advice is useful then agents will reach a correlated equilibrium, but if the mediator's advice is not useful, then agents are not harmed by using our test, and can fall back to their original learning algorithm. We then generalize our algorithm and show that in the limit it always correctly verifies the mediator's advice. Bounded policy iteration is an approach to solving infinite-horizon POMDPs that represents policies as stochastic finite-state controllers and iteratively improves a controller by adjusting the parameters of each node using linear programming. In the original algorithm, the size of the linear programs, and thus the complexity of policy improvement, depends on the number of parameters of each node, which grows with the size of the controller. But in practice, the number of parameters of a node with non-zero values is often very small, and does not grow with the size of the controller. Based on this observation, we develop a version of bounded policy iteration that leverages the sparse structure of a stochastic finite-state controller. In each iteration, it improves a policy by the same amount as the original algorithm, but with much better scalability. While known algorithms for sensitivity analysis and parameter tuning in probabilistic networks have a running time that is exponential in the size of the network, the exact computational complexity of these problems has not been established as yet. In this paper we study several variants of the tuning problem and show that these problems are NPPP-complete in general. We further show that the problems remain NP-complete or PP-complete, for a number of restricted variants. These complexity results provide insight in whether or not recent achievements in sensitivity analysis and tuning can be extended to more general, practicable methods. Approximate linear programming (ALP) is an efficient approach to solving large factored Markov decision processes (MDPs). The main idea of the method is to approximate the optimal value function by a set of basis functions and optimize their weights by linear programming (LP). This paper proposes a new ALP approximation. Comparing to the standard ALP formulation, we decompose the constraint space into a set of low-dimensional spaces. This structure allows for solving the new LP efficiently. In particular, the constraints of the LP can be satisfied in a compact form without an exponential dependence on the treewidth of ALP constraints. We study both practical and theoretical aspects of the proposed approach. Moreover, we demonstrate its scale-up potential on an MDP with more than 2^100 states. This paper deals with the problem of evaluating the causal effect using observational data in the presence of an unobserved exposure/ outcome variable, when cause-effect relationships between variables can be described as a directed acyclic graph and the corresponding recursive factorization of a joint distribution. First, we propose identifiability criteria for causal effects when an unobserved exposure/outcome variable is considered to contain more than two categories. Next, when unmeasured variables exist between an unobserved outcome variable and its proxy variables, we provide the tightest bounds based on the potential outcome approach. The results of this paper are helpful to evaluate causal effects in the case where it is difficult or expensive to observe an exposure/ outcome variable in many practical fields. Graphical models are usually learned without regard to the cost of doing inference with them. As a result, even if a good model is learned, it may perform poorly at prediction, because it requires approximate inference. We propose an alternative: learning models with a score function that directly penalizes the cost of inference. Specifically, we learn arithmetic circuits with a penalty on the number of edges in the circuit (in which the cost of inference is linear). Our algorithm is equivalent to learning a Bayesian network with context-specific independence by greedily splitting conditional distributions, at each step scoring the candidates by compiling the resulting network into an arithmetic circuit, and using its size as the penalty. We show how this can be done efficiently, without compiling a circuit from scratch for each candidate. Experiments on several real-world domains show that our algorithm is able to learn tractable models with very large treewidth, and yields more accurate predictions than a standard context-specific Bayesian network learner, in far less time. We generalize Shimizu et al's (2006) ICA-based approach for discovering linear non-Gaussian acyclic (LiNGAM) Structural Equation Models (SEMs) from causally sufficient, continuous-valued observational data. By relaxing the assumption that the generating SEM's graph is acyclic, we solve the more general problem of linear non-Gaussian (LiNG) SEM discovery. LiNG discovery algorithms output the distribution equivalence class of SEMs which, in the large sample limit, represents the population distribution. We apply a LiNG discovery algorithm to simulated data. Finally, we give sufficient conditions under which only one of the SEMs in the output class is 'stable'. We present a generative model for representing and reasoning about the relationships among events in continuous time. We apply the model to the domain of networked and distributed computing environments where we fit the parameters of the model from timestamp observations, and then use hypothesis testing to discover dependencies between the events and changes in behavior for monitoring and diagnosis. After introducing the model, we present an EM algorithm for fitting the parameters and then present the hypothesis testing approach for both dependence discovery and change-point detection. We validate the approach for both tasks using real data from a trace of network events at Microsoft Research Cambridge. Finally, we formalize the relationship between the proposed model and the noisy-or gate for cases when time can be discretized. In this work we present Cutting Plane Inference (CPI), a Maximum A Posteriori (MAP) inference method for Statistical Relational Learning. Framed in terms of Markov Logic and inspired by the Cutting Plane Method, it can be seen as a meta algorithm that instantiates small parts of a large and complex Markov Network and then solves these using a conventional MAP method. We evaluate CPI on two tasks, Semantic Role Labelling and Joint Entity Resolution, while plugging in two different MAP inference methods: the current method of choice for MAP inference in Markov Logic, MaxWalkSAT, and Integer Linear Programming. We observe that when used with CPI both methods are significantly faster than when used alone. In addition, CPI improves the accuracy of MaxWalkSAT and maintains the exactness of Integer Linear Programming. Deciding what to sense is a crucial task, made harder by dependencies and by a nonadditive utility function. We develop approximation algorithms for selecting an optimal set of measurements, under a dependency structure modeled by a tree-shaped Bayesian network (BN). Our approach is a generalization of composing anytime algorithm represented by conditional performance profiles. This is done by relaxing the input monotonicity assumption, and extending the local compilation technique to more general classes of performance profiles (PPs). We apply the extended scheme to selecting a subset of measurements for choosing a maximum expectation variable in a binary valued BN, and for minimizing the worst variance in a Gaussian BN. We consider the problem of efficiently learning optimal control policies and value functions over large state spaces in an online setting in which estimates must be available after each interaction with the world. This paper develops an explicitly model-based approach extending the Dyna architecture to linear function approximation. Dynastyle planning proceeds by generating imaginary experience from the world model and then applying model-free reinforcement learning algorithms to the imagined state transitions. Our main results are to prove that linear Dyna-style planning converges to a unique solution independent of the generating distribution, under natural conditions. In the policy evaluation setting, we prove that the limit point is the least-squares (LSTD) solution. An implication of our results is that prioritized-sweeping can be soundly extended to the linear approximation case, backing up to preceding features rather than to preceding states. We introduce two versions of prioritized sweeping with linear Dyna and briefly illustrate their performance empirically on the Mountain Car and Boyan Chain problems. Linear Programming (LP) relaxations have become powerful tools for finding the most probable (MAP) configuration in graphical models. These relaxations can be solved efficiently using message-passing algorithms such as belief propagation and, when the relaxation is tight, provably find the MAP configuration. The standard LP relaxation is not tight enough in many real-world problems, however, and this has lead to the use of higher order cluster-based LP relaxations. The computational cost increases exponentially with the size of the clusters and limits the number and type of clusters we can use. We propose to solve the cluster selection problem monotonically in the dual LP, iteratively selecting clusters with guaranteed improvement, and quickly re-solving with the added clusters by reusing the existing solution. Our dual message-passing algorithm finds the MAP configuration in protein sidechain placement, protein design, and stereo problems, in cases where the standard LP relaxation fails. Numerous temporal inference tasks such as fault monitoring and anomaly detection exhibit a persistence property: for example, if something breaks, it stays broken until an intervention. When modeled as a Dynamic Bayesian Network, persistence adds dependencies between adjacent time slices, often making exact inference over time intractable using standard inference algorithms. However, we show that persistence implies a regular structure that can be exploited for efficient inference. We present three successively more general classes of models: persistent causal chains (PCCs), persistent causal trees (PCTs) and persistent polytrees (PPTs), and the corresponding exact inference algorithms that exploit persistence. We show that analytic asymptotic bounds for our algorithms compare favorably to junction tree inference; and we demonstrate empirically that we can perform exact smoothing on the order of 100 times faster than the approximate Boyen-Koller method on randomly generated instances of persistent tree models. We also show how to handle non-persistent variables and how persistence can be exploited effectively for approximate filtering. Planning can often be simpli ed by decomposing the task into smaller tasks arranged hierarchically. Charlin et al. [4] recently showed that the hierarchy discovery problem can be framed as a non-convex optimization problem. However, the inherent computational di culty of solving such an optimization problem makes it hard to scale to realworld problems. In another line of research, Toussaint et al. [18] developed a method to solve planning problems by maximumlikelihood estimation. In this paper, we show how the hierarchy discovery problem in partially observable domains can be tackled using a similar maximum likelihood approach. Our technique rst transforms the problem into a dynamic Bayesian network through which a hierarchical structure can naturally be discovered while optimizing the policy. Experimental results demonstrate that this approach scales better than previous techniques based on non-convex optimization. A Chain Event Graph (CEG) is a graphial model which designed to embody conditional independencies in problems whose state spaces are highly asymmetric and do not admit a natural product structure. In this paer we present a probability propagation algorithm which uses the topology of the CEG to build a transporter CEG. Intriungly,the transporter CEG is directly analogous to the triangulated Bayesian Network (BN) in the more conventional junction tree propagation algorithms used with BNs. The propagation method uses factorization formulae also analogous to (but different from) the ones using potentials on cliques and separators of the BN. It appears that the methods will be typically more efficient than the BN algorithms when applied to contexts where there is significant asymmetry present. Decision circuits have been developed to perform efficient evaluation of influence diagrams [Bhattacharjya and Shachter, 2007], building on the advances in arithmetic circuits for belief network inference [Darwiche,2003]. In the process of model building and analysis, we perform sensitivity analysis to understand how the optimal solution changes in response to changes in the model. When sequential decision problems under uncertainty are represented as decision circuits, we can exploit the efficient solution process embodied in the decision circuit and the wealth of derivative information available to compute the value of information for the uncertainties in the problem and the effects of changes to model parameters on the value and the optimal strategy. Computing the probability of evidence even with known error bounds is NP-hard. In this paper we address this hard problem by settling on an easier problem. We propose an approximation which provides high confidence lower bounds on probability of evidence but does not have any guarantees in terms of relative or absolute error. Our proposed approximation is a randomized importance sampling scheme that uses the Markov inequality. However, a straight-forward application of the Markov inequality may lead to poor lower bounds. We therefore propose several heuristic measures to improve its performance in practice. Empirical evaluation of our scheme with state-of- the-art lower bounding schemes reveals the promise of our approach. Choquet expected utility (CEU) is one of the most sophisticated decision criteria used in decision theory under uncertainty. It provides a generalisation of expected utility enhancing both descriptive and prescriptive possibilities. In this paper, we investigate the use of CEU for path-planning under uncertainty with a special focus on robust solutions. We first recall the main features of the CEU model and introduce some examples showing its descriptive potential. Then we focus on the search for Choquet-optimal paths in multivalued implicit graphs where costs depend on different scenarios. After discussing complexity issues, we propose two different heuristic search algorithms to solve the problem. Finally, numerical experiments are reported, showing the practical efficiency of the proposed algorithms. We propose a new method for parameter learning in Bayesian networks with qualitative influences. This method extends our previous work from networks of binary variables to networks of discrete variables with ordered values. The specified qualitative influences correspond to certain order restrictions on the parameters in the network. These parameters may therefore be estimated using constrained maximum likelihood estimation. We propose an alternative method, based on the isotonic regression. The constrained maximum likelihood estimates are fairly complicated to compute, whereas computation of the isotonic regression estimates only requires the repeated application of the Pool Adjacent Violators algorithm for linear orders. Therefore, the isotonic regression estimator is to be preferred from the viewpoint of computational complexity. Through experiments on simulated and real data, we show that the new learning method is competitive in performance to the constrained maximum likelihood estimator, and that both estimators improve on the standard estimator. The ways in which an agent's actions affect the world can often be modeled compactly using a set of relational probabilistic planning rules. This paper addresses the problem of learning such rule sets for multiple related tasks. We take a hierarchical Bayesian approach, in which the system learns a prior distribution over rule sets. We present a class of prior distributions parameterized by a rule set prototype that is stochastically modified to produce a task-specific rule set. We also describe a coordinate ascent algorithm that iteratively optimizes the task-specific rule sets and the prior distribution. Experiments using this algorithm show that transferring information from related tasks significantly reduces the amount of training data required to predict action effects in blocks-world domains. Remote sensors are becoming the standard for observing and recording ecological data in the field. Such sensors can record data at fine temporal resolutions, and they can operate under extreme conditions prohibitive to human access. Unfortunately, sensor data streams exhibit many kinds of errors ranging from corrupt communications to partial or total sensor failures. This means that the raw data stream must be cleaned before it can be used by domain scientists. In our application environment|the H.J. Andrews Experimental Forest|this data cleaning is performed manually. This paper introduces a Dynamic Bayesian Network model for analyzing sensor observations and distinguishing sensor failures from valid data for the case of air temperature measured at 15 minute time resolution. The model combines an accurate distribution of long-term and short-term temperature variations with a single generalized fault model. Experiments with historical data show that the precision and recall of the method is comparable to that of the domain expert. The system is currently being deployed to perform real-time automated data cleaning. We formulate in this paper the mini-bucket algorithm for approximate inference in terms of exact inference on an approximate model produced by splitting nodes in a Bayesian network. The new formulation leads to a number of theoretical and practical implications. First, we show that branchand- bound search algorithms that use minibucket bounds may operate in a drastically reduced search space. Second, we show that the proposed formulation inspires new minibucket heuristics and allows us to analyze existing heuristics from a new perspective. Finally, we show that this new formulation allows mini-bucket approximations to benefit from recent advances in exact inference, allowing one to significantly increase the reach of these approximations. In this paper we introduce a new network reachability problem where the goal is to find the most reliable path between two nodes in a network, represented as a directed acyclic graph. Individual edges within this network may fail according to certain probabilities, and these failure probabilities may depend on the values of one or more hidden variables. This problem may be viewed as a generalization of shortest-path problems for finding minimum cost paths or Viterbi-type problems for finding highest-probability sequences of states, where the addition of the hidden variables introduces correlations that are not handled by previous algorithms. We give theoretical results characterizing this problem including an NP-hardness proof. We also give an exact algorithm and a more efficient approximation algorithm for this problem. Although a number of related algorithms have been developed to evaluate influence diagrams, exploiting the conditional independence in the diagram, the exact solution has remained intractable for many important problems. In this paper we introduce decision circuits as a means to exploit the local structure usually found in decision problems and to improve the performance of influence diagram analysis. This work builds on the probabilistic inference algorithms using arithmetic circuits to represent Bayesian belief networks [Darwiche, 2003]. Once compiled, these arithmetic circuits efficiently evaluate probabilistic queries on the belief network, and methods have been developed to exploit both the global and local structure of the network. We show that decision circuits can be constructed in a similar fashion and promise similar benefits. We present a memory-bounded optimization approach for solving infinite-horizon decentralized POMDPs. Policies for each agent are represented by stochastic finite state controllers. We formulate the problem of optimizing these policies as a nonlinear program, leveraging powerful existing nonlinear optimization techniques for solving the problem. While existing solvers only guarantee locally optimal solutions, we show that our formulation produces higher quality controllers than the state-of-the-art approach. We also incorporate a shared source of randomness in the form of a correlation device to further increase solution quality with only a limited increase in space and time. Our experimental results show that nonlinear optimization can be used to provide high quality, concise solutions to decentralized decision problems under uncertainty. We present the mixture-of-parents maximum entropy Markov model (MoP-MEMM), a class of directed graphical models extending MEMMs. The MoP-MEMM allows tractable incorporation of long-range dependencies between nodes by restricting the conditional distribution of each node to be a mixture of distributions given the parents. We show how to efficiently compute the exact marginal posterior node distributions, regardless of the range of the dependencies. This enables us to model non-sequential correlations present within text documents, as well as between interconnected documents, such as hyperlinked web pages. We apply the MoP-MEMM to a named entity recognition task and a web page classification task. In each, our model shows significant improvement over the basic MEMM, and is competitive with other long-range sequence models that use approximate inference. We analyze the generalized Mallows model, a popular exponential model over rankings. Estimating the central (or consensus) ranking from data is NP-hard. We obtain the following new results: (1) We show that search methods can estimate both the central ranking pi0 and the model parameters theta exactly. The search is n! in the worst case, but is tractable when the true distribution is concentrated around its mode; (2) We show that the generalized Mallows model is jointly exponential in (pi0; theta), and introduce the conjugate prior for this model class; (3) The sufficient statistics are the pairwise marginal probabilities that item i is preferred to item j. Preliminary experiments confirm the theoretical predictions and compare the new algorithm and existing heuristics. Compiling graphical models has recently been under intense investigation, especially for probabilistic modeling and processing. We present here a novel data structure for compiling weighted graphical models (in particular, probabilistic models), called AND/OR Multi-Valued Decision Diagram (AOMDD). This is a generalization of our previous work on constraint networks, to weighted models. The AOMDD is based on the frameworks of AND/OR search spaces for graphical models, and Ordered Binary Decision Diagrams (OBDD). The AOMDD is a canonical representation of a graphical model, and its size and compilation time are bounded exponentially by the treewidth of the graph, rather than pathwidth as is known for OBDDs. We discuss a Variable Elimination schedule for compilation, and present the general APPLY algorithm that combines two weighted AOMDDs, and also present a search based method for compilation method. The preliminary experimental evaluation is quite encouraging, showing the potential of the AOMDD data structure. The paper evaluates the power of best-first search over AND/OR search spaces for solving the Most Probable Explanation (MPE) task in Bayesian networks. The main virtue of the AND/OR representation of the search space is its sensitivity to the structure of the problem, which can translate into significant time savings. In recent years depth-first AND/OR Branch-and- Bound algorithms were shown to be very effective when exploring such search spaces, especially when using caching. Since best-first strategies are known to be superior to depth-first when memory is utilized, exploring the best-first control strategy is called for. The main contribution of this paper is in showing that a recent extension of AND/OR search algorithms from depth-first Branch-and-Bound to best-first is indeed very effective for computing the MPE in Bayesian networks. We demonstrate empirically the superiority of the best-first search approach on various probabilistic networks. Searching the complete space of possible Bayesian networks is intractable for problems of interesting size, so Bayesian network structure learning algorithms, such as the commonly used Sparse Candidate algorithm, employ heuristics. However, these heuristics also restrict the types of relationships that can be learned exclusively from data. They are unable to learn relationships that exhibit "correlation-immunity", such as parity. To learn Bayesian networks in the presence of correlation-immune relationships, we extend the Sparse Candidate algorithm with a technique called "skewing". This technique uses the observation that relationships that are correlation-immune under a specific input distribution may not be correlation-immune under another, sufficiently different distribution. We show that by extending Sparse Candidate with this technique we are able to discover relationships between random variables that are approximately correlation-immune, with a significantly lower computational cost than the alternative of considering multiple parents of a node at a time. When observational data is available from practical studies and a directed cyclic graph for how various variables affect each other is known based on substantive understanding of the process, we consider a problem in which a control plan of a treatment variable is conducted in order to bring a response variable close to a target value with variation reduction. We formulate an optimal control plan concerning a certain treatment variable through path coefficients in the framework of linear nonrecursive structural equation models. Based on the formulation, we clarify the properties of causal effects when conducting a control plan. The results enable us to evaluate the effect of a control plan on the variance from observational data. Survey propagation (SP) is an exciting new technique that has been remarkably successful at solving very large hard combinatorial problems, such as determining the satisfiability of Boolean formulas. In a promising attempt at understanding the success of SP, it was recently shown that SP can be viewed as a form of belief propagation, computing marginal probabilities over certain objects called covers of a formula. This explanation was, however, shortly dismissed by experiments suggesting that non-trivial covers simply do not exist for large formulas. In this paper, we show that these experiments were misleading: not only do covers exist for large hard random formulas, SP is surprisingly accurate at computing marginals over these covers despite the existence of many cycles in the formulas. This re-opens a potentially simpler line of reasoning for understanding SP, in contrast to some alternative lines of explanation that have been proposed assuming covers do not exist. Ranking objects is a simple and natural procedure for organizing data. It is often performed by assigning a quality score to each object according to its relevance to the problem at hand. Ranking is widely used for object selection, when resources are limited and it is necessary to select a subset of most relevant objects for further processing. In real world situations, the object's scores are often calculated from noisy measurements, casting doubt on the ranking reliability. We introduce an analytical method for assessing the influence of noise levels on the ranking reliability. We use two similarity measures for reliability evaluation, Top-K-List overlap and Kendall's tau measure, and show that the former is much more sensitive to noise than the latter. We apply our method to gene selection in a series of microarray experiments of several cancer types. The results indicate that the reliability of the lists obtained from these experiments is very poor, and that experiment sizes which are necessary for attaining reasonably stable Top-K-Lists are much larger than those currently available. Simulations support our analytical results. Preferences play an important role in our everyday lives. CP-networks, or CP-nets in short, are graphical models for representing conditional qualitative preferences under ceteris paribus ("all else being equal") assumptions. Despite their intuitive nature and rich representation, dominance testing with CP-nets is computationally complex, even when the CP-nets are restricted to binary-valued preferences. Tractable algorithms exist for binary CP-nets, but these algorithms are incomplete for multi-valued CPnets. In this paper, we identify a class of multivalued CP-nets, which we call more-or-less CPnets, that have the same computational complexity as binary CP-nets. More-or-less CP-nets exploit the monotonicity of the attribute values and use intervals to aggregate values that induce similar preferences. We then present a search control rule for dominance testing that effectively prunes the search space while preserving completeness. Computing the exact likelihood of data in large Bayesian networks consisting of thousands of vertices is often a difficult task. When these models contain many deterministic conditional probability tables and when the observed values are extremely unlikely even alternative algorithms such as variational methods and stochastic sampling often perform poorly. We present a new importance sampling algorithm for Bayesian networks which is based on variational techniques. We use the updates of the importance function to predict whether the stochastic sampling converged above or below the true likelihood, and change the proposal distribution accordingly. The validity of the method and its contribution to convergence is demonstrated on hard networks of large genetic linkage analysis tasks. Relational Markov Decision Processes are a useful abstraction for complex reinforcement learning problems and stochastic planning problems. Recent work developed representation schemes and algorithms for planning in such problems using the value iteration algorithm. However, exact versions of more complex algorithms, including policy iteration, have not been developed or analyzed. The paper investigates this potential and makes several contributions. First we observe two anomalies for relational representations showing that the value of some policies is not well defined or cannot be calculated for restricted representation schemes used in the literature. On the other hand, we develop a variant of policy iteration that can get around these anomalies. The algorithm includes an aspect of policy improvement in the process of policy evaluation and thus differs from the original algorithm. We show that despite this difference the algorithm converges to the optimal policy. We present a functional framework for automated mechanism design based on a two-stage game model of strategic interaction between the designer and the mechanism participants, and apply it to several classes of two-player infinite games of incomplete information. At the core of our framework is a black-box optimization algorithm which guides the selection process of candidate mechanisms. Our approach yields optimal or nearly optimal mechanisms in several application domains using various objective functions. By comparing our results with known optimal mechanisms, and in some cases improving on the best known mechanisms, we provide evidence that ours is a promising approach to parametric design of indirect mechanisms. Belief propagation and its variants are popular methods for approximate inference, but their running time and even their convergence depend greatly on the schedule used to send the messages. Recently, dynamic update schedules have been shown to converge much faster on hard networks than static schedules, namely the residual BP schedule of Elidan et al. [2006]. But that RBP algorithm wastes message updates: many messages are computed solely to determine their priority, and are never actually performed. In this paper, we show that estimating the residual, rather than calculating it directly, leads to significant decreases in the number of messages required for convergence, and in the total running time. The residual is estimated using an upper bound based on recent work on message errors in BP. On both synthetic and real-world networks, this dramatically decreases the running time of BP, in some cases by a factor of five, without affecting the quality of the solution. Combining first-order logic and probability has long been a goal of AI. Markov logic (Richardson & Domingos, 2006) accomplishes this by attaching weights to first-order formulas and viewing them as templates for features of Markov networks. Unfortunately, it does not have the full power of first-order logic, because it is only defined for finite domains. This paper extends Markov logic to infinite domains, by casting it in the framework of Gibbs measures (Georgii, 1988). We show that a Markov logic network (MLN) admits a Gibbs measure as long as each ground atom has a finite number of neighbors. Many interesting cases fall in this category. We also show that an MLN admits a unique measure if the weights of its non-unit clauses are small enough. We then examine the structure of the set of consistent measures in the non-unique case. Many important phenomena, including systems with phase transitions, are represented by MLNs with non-unique measures. We relate the problem of satisfiability in first-order logic to the properties of MLN measures, and discuss how Markov logic relates to previous infinite models. Multi-fidelity methods combine inexpensive low-fidelity simulations with costly but high-fidelity simulations to produce an accurate model of a system of interest at minimal cost. They have proven useful in modeling physical systems and have been applied to engineering problems such as wing-design optimization. During human-in-the-loop experimentation, it has become increasingly common to use online platforms, like Mechanical Turk, to run low-fidelity experiments to gather human performance data in an efficient manner. One concern with these experiments is that the results obtained from the online environment generalize poorly to the actual domain of interest. To address this limitation, we extend traditional multi-fidelity approaches to allow us to combine fewer data points from high-fidelity human-in-the-loop experiments with plentiful but less accurate data from low-fidelity experiments to produce accurate models of how humans interact. We present both model-based and model-free methods, and summarize the predictive performance of each method under different conditions. Predicting the outcomes of future events is a challenging problem for which a variety of solution methods have been explored and attempted. We present an empirical comparison of a variety of online and offline adaptive algorithms for aggregating experts' predictions of the outcomes of five years of US National Football League games (1319 games) using expert probability elicitations obtained from an Internet contest called ProbabilitySports. We find that it is difficult to improve over simple averaging of the predictions in terms of prediction accuracy, but that there is room for improvement in quadratic loss. Somewhat surprisingly, a Bayesian estimation algorithm which estimates the variance of each expert's prediction exhibits the most consistent superior performance over simple averaging among our collection of algorithms. We describe an expert system, MAIES, developed for analysing forensic identification problems involving DNA mixture traces using quantitative peak area information. Peak area information is represented by conditional Gaussian distributions, and inference based on exact junction tree propagation ascertains whether individuals, whose profiles have been measured, have contributed to the mixture. The system can also be used to predict DNA profiles of unknown contributors by separating the mixture into its individual components. The use of the system is illustrated with an application to a real world example. The system implements a novel MAP (maximum a posteriori) search algorithm that is described in an appendix. We consider in this paper the formulation of approximate inference in Bayesian networks as a problem of exact inference on an approximate network that results from deleting edges (to reduce treewidth). We have shown in earlier work that deleting edges calls for introducing auxiliary network parameters to compensate for lost dependencies, and proposed intuitive conditions for determining these parameters. We have also shown that our method corresponds to IBP when enough edges are deleted to yield a polytree, and corresponds to some generalizations of IBP when fewer edges are deleted. In this paper, we propose a different criteria for determining auxiliary parameters based on optimizing the KL-divergence between the original and approximate networks. We discuss the relationship between the two methods for selecting parameters, shedding new light on IBP and its generalizations. We also discuss the application of our new method to approximating inference problems which are exponential in constrained treewidth, including MAP and nonmyopic value of information. The effect of inaccuracies in the parameters of a dynamic Bayesian network can be investigated by subjecting the network to a sensitivity analysis. Having detailed the resulting sensitivity functions in our previous work, we now study the effect of parameter inaccuracies on a recommended decision in view of a threshold decision-making model. We detail the effect of varying a single and multiple parameters from a conditional probability table and present a computational procedure for establishing bounds between which assessments for these parameters can be varied without inducing a change in the recommended decision. We illustrate the various concepts involved by means of a real-life dynamic network in the field of infectious disease. Consider a multi-agent system in a dynamic and uncertain environment. Each agent's local decision problem is modeled as a Markov decision process (MDP) and agents must coordinate on a joint action in each period, which provides a reward to each agent and causes local state transitions. A social planner knows the model of every agent's MDP and wants to implement the optimal joint policy, but agents are self-interested and have private local state. We provide an incentive-compatible mechanism for eliciting state information that achieves the optimal joint plan in a Markov perfect equilibrium of the induced stochastic game. In the special case in which local problems are Markov chains and agents compete to take a single action in each period, we leverage Gittins allocation indices to provide an efficient factored algorithm and distribute computation of the optimal policy among the agents. Distributed, optimal coordinated learning in a multi-agent variant of the multi-armed bandit problem is obtained as a special case. The paper concerns the problem of predicting the effect of actions or interventions on a system from a combination of (i) statistical data on a set of observed variables, and (ii) qualitative causal knowledge encoded in the form of a directed acyclic graph (DAG). The DAG represents a set of linear equations called Structural Equations Model (SEM), whose coefficients are parameters representing direct causal effects. Reliable quantitative conclusions can only be obtained from the model if the causal effects are uniquely determined by the data. That is, if there exists a unique parametrization for the model that makes it compatible with the data. If this is the case, the model is called identified. The main result of the paper is a general sufficient condition for identification of recursive SEM models. The paper analyzes theoretically and empirically the performance of likelihood weighting (LW) on a subset of nodes in Bayesian networks. The proposed scheme requires fewer samples to converge due to reduction in sampling variance. The method exploits the structure of the network to bound the complexity of exact inference used to compute sampling distributions, similar to Gibbs cutset sampling. Yet, the extension of the previosly proposed cutset sampling principles to likelihood weighting is non-trivial due to differences in the sampling processes of Gibbs sampler and LW. We demonstrate empirically that likelihood weighting on a cutset (LWLC) is effective time-wise and has a lower rejection rate than LW when applied to networks with many deterministic probabilities. Finally, we show that the performance of likelihood weighting on a cutset can be improved further by caching computed sampling distributions and, consequently, learning 'zeros' of the target distribution. Linear-time computational techniques have been developed for combining evidence which is available on a number of contending hypotheses. They offer a means of making the computation-intensive calculations involved more efficient in certain circumstances. Unfortunately, they restrict the orthogonal sum of evidential functions to the dichotomous structure applies only to elements and their complements. In this paper, we present a novel evidence structure in terms of a triplet and a set of algorithms for evidential reasoning. The merit of this structure is that it divides a set of evidence into three subsets, distinguishing trivial evidential elements from important ones focusing some particular elements. It avoids the deficits of the dichotomous structure in representing the preference of evidence and estimating the basic probability assignment of evidence. We have established a formalism for this structure and the general formulae for combining pieces of evidence in the form of the triplet, which have been theoretically justified. We observe that certain large-clique graph triangulations can be useful to reduce both computational and space requirements when making queries on mixed stochastic/deterministic graphical models. We demonstrate that many of these large-clique triangulations are non-minimal and are thus unattainable via the variable elimination algorithm. We introduce ancestral pairs as the basis for novel triangulation heuristics and prove that no more than the addition of edges between ancestral pairs need be considered when searching for state space optimal triangulations in such graphs. Empirical results on random and real world graphs show that the resulting triangulations that yield significant speedups are almost always non-minimal. We also give an algorithm and correctness proof for determining if a triangulation can be obtained via elimination, and we show that the decision problem associated with finding optimal state space triangulations in this mixed stochastic/deterministic setting is NP-complete. Separable Bayesian Networks, or the Influence Model, are dynamic Bayesian Networks in which the conditional probability distribution can be separated into a function of only the marginal distribution of a node's neighbors, instead of the joint distributions. In terms of modeling, separable networks has rendered possible siginificant reduction in complexity, as the state space is only linear in the number of variables on the network, in contrast to a typical state space which is exponential. In this work, We describe the connection between an arbitrary Conditional Probability Table (CPT) and separable systems using linear algebra. We give an alternate proof on the equivalence of sufficiency and separability. We present a computational method for testing whether a given CPT is separable. We consider a Bayesian method for learning the Bayesian network structure from complete data. Recently, Koivisto and Sood (2004) presented an algorithm that for any single edge computes its marginal posterior probability in O(n 2^n) time, where n is the number of attributes; the number of parents per attribute is bounded by a constant. In this paper we show that the posterior probabilities for all the n (n - 1) potential edges can be computed in O(n 2^n) total time. This result is achieved by a forward-backward technique and fast Moebius transform algorithms, which are of independent interest. The resulting speedup by a factor of about n^2 allows us to experimentally study the statistical power of learning moderate-size networks. We report results from a simulation study that covers data sets with 20 to 10,000 records over 5 to 25 discrete attributes We investigate methods for parameter learning from incomplete data that is not missing at random. Likelihood-based methods then require the optimization of a profile likelihood that takes all possible missingness mechanisms into account. Optimzing this profile likelihood poses two main difficulties: multiple (local) maxima, and its very high-dimensional parameter space. In this paper a new method is presented for optimizing the profile likelihood that addresses the second difficulty: in the proposed AI&M (adjusting imputation and mazimization) procedure the optimization is performed by operations in the space of data completions, rather than directly in the parameter space of the profile likelihood. We apply the AI&M method to learning parameters for Bayesian networks. The method is compared against conservative inference, which takes into account each possible data completion, and against EM. The results indicate that likelihood-based inference is still feasible in the case of unknown missingness mechanisms, and that conservative inference is unnecessarily weak. On the other hand, our results also provide evidence that the EM algorithm is still quite effective when the data is not missing at random. This paper is concerned with graphical criteria that can be used to solve the problem of identifying casual effects from nonexperimental data in a causal Bayesian network structure, i.e., a directed acyclic graph that represents causal relationships. We first review Pearl's work on this topic [Pearl, 1995], in which several useful graphical criteria are presented. Then we present a complete algorithm [Huang and Valtorta, 2006b] for the identifiability problem. By exploiting the completeness of this algorithm, we prove that the three basic do-calculus rules that Pearl presents are complete, in the sense that, if a causal effect is identifiable, there exists a sequence of applications of the rules of the do-calculus that transforms the causal effect formula into a formula that only includes observational quantities. Continuous-time Bayesian networks (CTBNs) are graphical representations of multi-component continuous-time Markov processes as directed graphs. The edges in the network represent direct influences among components. The joint rate matrix of the multi-component process is specified by means of conditional rate matrices for each component separately. This paper addresses the situation where some of the components evolve on a time scale that is much shorter compared to the time scale of the other components. In this paper, we prove that in the limit where the separation of scales is infinite, the Markov process converges (in distribution, or weakly) to a reduced, or effective Markov process that only involves the slow components. We also demonstrate that for reasonable separation of scale (an order of magnitude) the reduced process is a good approximation of the marginal process over the slow components. We provide a simple procedure for building a reduced CTBN for this effective process, with conditional rate matrices that can be directly calculated from the original CTBN, and discuss the implications for approximate reasoning in large systems. A popular approach to solving large probabilistic systems relies on aggregating states based on a measure of similarity. Many approaches in the literature are heuristic. A number of recent methods rely instead on metrics based on the notion of bisimulation, or behavioral equivalence between states (Givan et al, 2001, 2003; Ferns et al, 2004). An integral component of such metrics is the Kantorovich metric between probability distributions. However, while this metric enables many satisfying theoretical properties, it is costly to compute in practice. In this paper, we use techniques from network optimization and statistical sampling to overcome this problem. We obtain in this manner a variety of distance functions for MDP state aggregation, which differ in the tradeoff between time and space complexity, as well as the quality of the aggregation. We provide an empirical evaluation of these trade-offs. Directed possibly cyclic graphs have been proposed by Didelez (2000) and Nodelmann et al. (2002) in order to represent the dynamic dependencies among stochastic processes. These dependencies are based on a generalization of Granger-causality to continuous time, first developed by Schweder (1970) for Markov processes, who called them local dependencies. They deserve special attention as they are asymmetric unlike stochastic (in)dependence. In this paper we focus on their graphical representation and develop a suitable, i.e. asymmetric notion of separation, called delta-separation. The properties of this graph separation as well as of local independence are investigated in detail within a framework of asymmetric (semi)graphoids allowing a deeper insight into what information can be read off these graphs. Tasks such as record linkage and multi-target tracking, which involve reconstructing the set of objects that underlie some observed data, are particularly challenging for probabilistic inference. Recent work has achieved efficient and accurate inference on such problems using Markov chain Monte Carlo (MCMC) techniques with customized proposal distributions. Currently, implementing such a system requires coding MCMC state representations and acceptance probability calculations that are specific to a particular application. An alternative approach, which we pursue in this paper, is to use a general-purpose probabilistic modeling language (such as BLOG) and a generic Metropolis-Hastings MCMC algorithm that supports user-supplied proposal distributions. Our algorithm gains flexibility by using MCMC states that are only partial descriptions of possible worlds; we provide conditions under which MCMC over partial worlds yields correct answers to queries. We also show how to use a context-specific Bayes net to identify the factors in the acceptance probability that need to be computed for a given proposed move. Experimental results on a citation matching task show that our general-purpose MCMC engine compares favorably with an application-specific system. Collaborative data consist of ratings relating two distinct sets of objects: users and items. Much of the work with such data focuses on filtering: predicting unknown ratings for pairs of users and items. In this paper we focus on the problem of visualizing the information. Given all of the ratings, our task is to embed all of the users and items as points in the same Euclidean space. We would like to place users near items that they have rated (or would rate) high, and far away from those they would give a low rating. We pose this problem as a real-valued non-linear Bayesian network and employ Markov chain Monte Carlo and expectation maximization to find an embedding. We present a metric by which to judge the quality of a visualization and compare our results to local linear embedding and Eigentaste on three real-world datasets. Previous work in hierarchical reinforcement learning has faced a dilemma: either ignore the values of different possible exit states from a subroutine, thereby risking suboptimal behavior, or represent those values explicitly thereby incurring a possibly large representation cost because exit values refer to nonlocal aspects of the world (i.e., all subsequent rewards). This paper shows that, in many cases, one can avoid both of these problems. The solution is based on recursively decomposing the exit value function in terms of Q-functions at higher levels of the hierarchy. This leads to an intuitively appealing runtime architecture in which a parent subroutine passes to its child a value function on the exit states and the child reasons about how its choices affect the exit value. We also identify structural conditions on the value function and transition distributions that allow much more concise representations of exit state distributions, leading to further state abstraction. In essence, the only variables whose exit values need be considered are those that the parent cares about and the child affects. We demonstrate the utility of our algorithms on a series of increasingly complex environments. Traditional approaches to Bayes net structure learning typically assume little regularity in graph structure other than sparseness. However, in many cases, we expect more systematicity: variables in real-world systems often group into classes that predict the kinds of probabilistic dependencies they participate in. Here we capture this form of prior knowledge in a hierarchical Bayesian framework, and exploit it to enable structure learning and type discovery from small datasets. Specifically, we present a nonparametric generative model for directed acyclic graphs as a prior for Bayes net structure learning. Our model assumes that variables come in one or more classes and that the prior probability of an edge existing between two variables is a function only of their classes. We derive an MCMC algorithm for simultaneous inference of the number of classes, the class assignments of variables, and the Bayes net structure over variables. For several realistic, sparse datasets, we show that the bias towards systematicity of connections provided by our model yields more accurate learned networks than a traditional, uniform prior approach, and that the classes found by our model are appropriate. There are several existing algorithms that under appropriate assumptions can reliably identify a subset of the underlying causal relationships from observational data. This paper introduces the first computationally feasible score-based algorithm that can reliably identify causal relationships in the large sample limit for discrete models, while allowing for the possibility that there are unobserved common causes. In doing so, the algorithm does not ever need to assign scores to causal structures with unobserved common causes. The algorithm is based on the identification of so called Y substructures within Bayesian network structures that can be learned from observational data. An example of a Y substructure is A -> C, B -> C, C -> D. After providing background on causal discovery, the paper proves the conditions under which the algorithm is reliable in the large sample limit. Bayesian Networks (BNs) are useful tools giving a natural and compact representation of joint probability distributions. In many applications one needs to learn a Bayesian Network (BN) from data. In this context, it is important to understand the number of samples needed in order to guarantee a successful learning. Previous work have studied BNs sample complexity, yet it mainly focused on the requirement that the learned distribution will be close to the original distribution which generated the data. In this work, we study a different aspect of the learning, namely the number of samples needed in order to learn the correct structure of the network. We give both asymptotic results, valid in the large sample limit, and experimental results, demonstrating the learning behavior for feasible sample sizes. We show that structure learning is a more difficult task, compared to approximating the correct distribution, in the sense that it requires a much larger number of samples, regardless of the computational power available for the learner. We present a non-parametric Bayesian approach to structure learning with hidden causes. Previous Bayesian treatments of this problem define a prior over the number of hidden causes and use algorithms such as reversible jump Markov chain Monte Carlo to move between solutions. In contrast, we assume that the number of hidden causes is unbounded, but only a finite number influence observable variables. This makes it possible to use a Gibbs sampler to approximate the distribution over causal structures. We evaluate the performance of both approaches in discovering hidden causes in simulated data, and use our non-parametric approach to discover hidden causes in a real medical dataset. Expected Utility: Algebraic Expected Utility In this paper, we provide two axiomatizations of algebraic expected utility, which is a particular generalized expected utility, in a von Neumann-Morgenstern setting, i.e. uncertainty representation is supposed to be given and here to be described by a plausibility measure valued on a semiring, which could be partially ordered. We show that axioms identical to those for expected utility entail that preferences are represented by an algebraic expected utility. This algebraic approach allows many previous propositions (expected utility, binary possibilistic utility,...) to be unified in a same general framework and proves that the obtained utility enjoys the same nice features as expected utility: linearity, dynamic consistency, autoduality of the underlying uncertainty measure, autoduality of the decision criterion and possibility of modeling decision maker's attitude toward uncertainty. We introduce a new dynamic model with the capability of recognizing both activities that an individual is performing as well as where that ndividual is located. Our model is novel in that it utilizes a dynamic graphical model to jointly estimate both activity and spatial context over time based on the simultaneous use of asynchronous observations consisting of GPS measurements, and measurements from a small mountable sensor board. Joint inference is quite desirable as it has the ability to improve accuracy of the model. A key goal, however, in designing our overall system is to be able to perform accurate inference decisions while minimizing the amount of hardware an individual must wear. This minimization leads to greater comfort and flexibility, decreased power requirements and therefore increased battery life, and reduced cost. We show results indicating that our joint measurement model outperforms measurements from either the sensor board or GPS alone, using two types of probabilistic inference procedures, namely particle filtering and pruned exact inference. Model-based learning algorithms have been shown to use experience efficiently when learning to solve Markov Decision Processes (MDPs) with finite state and action spaces. However, their high computational cost due to repeatedly solving an internal model inhibits their use in large-scale problems. We propose a method based on real-time dynamic programming (RTDP) to speed up two model-based algorithms, RMAX and MBIE (model-based interval estimation), resulting in computationally much faster algorithms with little loss compared to existing bounds. Specifically, our two new learning algorithms, RTDP-RMAX and RTDP-IE, have considerably smaller computational demands than RMAX and MBIE. We develop a general theoretical framework that allows us to prove that both are efficient learners in a PAC (probably approximately correct) sense. We also present an experimental evaluation of these new algorithms that helps quantify the tradeoff between computational and experience demands. We study the problem of learning the best Bayesian network structure with respect to a decomposable score such as BDe, BIC or AIC. This problem is known to be NP-hard, which means that solving it becomes quickly infeasible as the number of variables increases. Nevertheless, in this paper we show that it is possible to learn the best Bayesian network structure with over 30 variables, which covers many practically interesting cases. Our algorithm is less complicated and more efficient than the techniques presented earlier. It can be easily parallelized, and offers a possibility for efficient exploration of the best networks consistent with different variable orderings. In the experimental part of the paper we compare the performance of the algorithm to the previous state-of-the-art algorithm. Free source-code and an online-demo can be found at http://b-course.hiit.fi/bene. The main goal of this paper is to describe a method for exact inference in general hybrid Bayesian networks (BNs) (with a mixture of discrete and continuous chance variables). Our method consists of approximating general hybrid Bayesian networks by a mixture of Gaussians (MoG) BNs. There exists a fast algorithm by Lauritzen-Jensen (LJ) for making exact inferences in MoG Bayesian networks, and there exists a commercial implementation of this algorithm. However, this algorithm can only be used for MoG BNs. Some limitations of such networks are as follows. All continuous chance variables must have conditional linear Gaussian distributions, and discrete chance nodes cannot have continuous parents. The methods described in this paper will enable us to use the LJ algorithm for a bigger class of hybrid Bayesian networks. This includes networks with continuous chance nodes with non-Gaussian distributions, networks with no restrictions on the topology of discrete and continuous variables, networks with conditionally deterministic variables that are a nonlinear function of their continuous parents, and networks with continuous chance variables whose variances are functions of their parents. Recent work on approximate linear programming (ALP) techniques for first-order Markov Decision Processes (FOMDPs) represents the value function linearly w.r.t. a set of first-order basis functions and uses linear programming techniques to determine suitable weights. This approach offers the advantage that it does not require simplification of the first-order value function, and allows one to solve FOMDPs independent of a specific domain instantiation. In this paper, we address several questions to enhance the applicability of this work: (1) Can we extend the first-order ALP framework to approximate policy iteration to address performance deficiencies of previous approaches? (2) Can we automatically generate basis functions and evaluate their impact on value function quality? (3) How can we decompose intractable problems with universally quantified rewards into tractable subproblems? We propose answers to these questions along with a number of novel optimizations and provide a comparative empirical evaluation on logistics problems from the ICAPS 2004 Probabilistic Planning Competition. With the aid of the concept of stable independence we can construct, in an efficient way, a compact representation of a semi-graphoid independence relation. We show that this representation provides a new necessary condition for the existence of a directed perfect map for the relation. The test for this condition is based to a large extent on the transitivity property of a special form of d-separation. The complexity of the test is linear in the size of the representation. The test, moreover, brings the additional benefit that it can be used to guide the early stages of network construction. Many real world sequences such as protein secondary structures or shell logs exhibit a rich internal structures. Traditional probabilistic models of sequences, however, consider sequences of flat symbols only. Logical hidden Markov models have been proposed as one solution. They deal with logical sequences, i.e., sequences over an alphabet of logical atoms. This comes at the expense of a more complex model selection problem. Indeed, different abstraction levels have to be explored. In this paper, we propose a novel method for selecting logical hidden Markov models from data called SAGEM. SAGEM combines generalized expectation maximization, which optimizes parameters, with structure search for model selection using inductive logic programming refinement operators. We provide convergence and experimental results that show SAGEM's effectiveness. In this paper we present a differential semantics of Lazy AR Propagation (LARP) in discrete Bayesian networks. We describe how both single and multi dimensional partial derivatives of the evidence may easily be calculated from a junction tree in LARP equilibrium. We show that the simplicity of the calculations stems from the nature of LARP. Based on the differential semantics we describe how variable propagation in the LARP architecture may give access to additional partial derivatives. The cautious LARP (cLARP) scheme is derived to produce a flexible cLARP equilibrium that offers additional opportunities for calculating single and multidimensional partial derivatives of the evidence and subsets of the evidence from a single propagation. The results of an empirical evaluation illustrates how the access to a largely increased number of partial derivatives comes at a low computational cost. This paper deals with the following problem: modify a Bayesian network to satisfy a given set of probability constraints by only change its conditional probability tables, and the probability distribution of the resulting network should be as close as possible to that of the original network. We propose to solve this problem by extending IPFP (iterative proportional fitting procedure) to probability distributions represented by Bayesian networks. The resulting algorithm E-IPFP is further developed to D-IPFP, which reduces the computational cost by decomposing a global EIPFP into a set of smaller local E-IPFP problems. Limited analysis is provided, including the convergence proofs of the two algorithms. Computer experiments were conducted to validate the algorithms. The results are consistent with the theoretical analysis. Studying the effects of one-way variation of any number of parameters on any number of output probabilities quickly becomes infeasible in practice, especially if various evidence profiles are to be taken into consideration. To provide for identifying the parameters that have a potentially large effect prior to actually performing the analysis, we need properties of sensitivity functions that are independent of the network under study, of the available evidence, or of both. In this paper, we study properties that depend upon just the probability of the entered evidence. We demonstrate that these properties provide for establishing an upper bound on the sensitivity value for a parameter; they further provide for establishing the region in which the vertex of the sensitivity function resides, thereby serving to identify parameters with a low sensitivity value that may still have a large impact on the probability of interest for relatively small parameter variations. We present multi-agent A* (MAA*), the first complete and optimal heuristic search algorithm for solving decentralized partially-observable Markov decision problems (DEC-POMDPs) with finite horizon. The algorithm is suitable for computing optimal plans for a cooperative group of agents that operate in a stochastic environment such as multirobot coordination, network traffic control, `or distributed resource allocation. Solving such problems efiectively is a major challenge in the area of planning under uncertainty. Our solution is based on a synthesis of classical heuristic search and decentralized control theory. Experimental results show that MAA* has significant advantages. We introduce an anytime variant of MAA* and conclude with a discussion of promising extensions such as an approach to solving infinite horizon problems. Structured utility models are essential for the effective representation and elicitation of complex multiattribute utility functions. Generalized additive independence (GAI) models provide an attractive structural model of user preferences, offering a balanced tradeoff between simplicity and applicability. While representation and inference with such models is reasonably well understood, elicitation of the parameters of such models has been studied less from a practical perspective. We propose a procedure to elicit GAI model parameters using only "local" utility queries rather than "global" queries over full outcomes. Our local queries take full advantage of GAI structure and provide a sound framework for extending the elicitation procedure to settings where the uncertainty over utility parameters is represented probabilistically. We describe experiments using a myopic value-of-information approach to elicitation in a large GAI model. It is well known that there may be many causal explanations that are consistent with a given set of data. Recent work has been done to represent the common aspects of these explanations into one representation. In this paper, we address what is less well known: how do the relationships common to every causal explanation among the observed variables of some DAG process change in the presence of latent variables? Ancestral graphs provide a class of graphs that can encode conditional independence relations that arise in DAG models with latent and selection variables. In this paper we present a set of orientation rules that construct the Markov equivalence class representative for ancestral graphs, given a member of the equivalence class. These rules are sound and complete. We also show that when the equivalence class includes a DAG, the equivalence class representative is the essential graph for the said DAG When a hybrid Bayesian network has conditionally deterministic variables with continuous parents, the joint density function for the continuous variables does not exist. Conditional linear Gaussian distributions can handle such cases when the continuous variables have a multi-variate normal distribution and the discrete variables do not have continuous parents. In this paper, operations required for performing inference with conditionally deterministic variables in hybrid Bayesian networks are developed. These methods allow inference in networks with deterministic variables where continuous variables may be non-Gaussian, and their density functions can be approximated by mixtures of truncated exponentials. There are no constraints on the placement of continuous and discrete nodes in the network. Planning in adversarial and uncertain environments can be modeled as the problem of devising strategies in stochastic perfect information games. These games are generalizations of Markov decision processes (MDPs): there are two (adversarial) players, and a source of randomness. The main practical obstacle to computing winning strategies in such games is the size of the state space. In practice therefore, one typically works with abstractions of the model. The diffculty is to come up with an abstraction that is neither too coarse to remove all winning strategies (plans), nor too fine to be intractable. In verification, the paradigm of counterexample-guided abstraction refinement has been successful to construct useful but parsimonious abstractions automatically. We extend this paradigm to probabilistic models (namely, perfect information games and, as a special case, MDPs). This allows us to apply the counterexample-guided abstraction paradigm to the AI planning problem. As special cases, we get planning algorithms for MDPs and deterministic systems that automatically construct system abstractions. The Bayesian Logic (BLOG) language was recently developed for defining first-order probability models over worlds with unknown numbers of objects. It handles important problems in AI, including data association and population estimation. This paper extends BLOG by adopting generative processes over function spaces - known as nonparametrics in the Bayesian literature. We introduce syntax for reasoning about arbitrary collections of objects, and their properties, in an intuitive manner. By exploiting exchangeability, distributions over unknown objects and their attributes are cast as Dirichlet processes, which resolve difficulties in model selection and inference caused by varying numbers of objects. We demonstrate these concepts with application to citation matching. Consider the case where causal relations among variables can be described as a Gaussian linear structural equation model. This paper deals with the problem of clarifying how the variance of a response variable would have changed if a treatment variable were assigned to some value (counterfactually), given that a set of variables is observed (actually). In order to achieve this aim, we reformulate the formulas of the counterfactual distribution proposed by Balke and Pearl (1995) through both the total effects and a covariance matrix of observed variables. We further extend the framework of Balke and Pearl (1995) from point observations to interval observations, and from an unconditional plan to a conditional plan. The results of this paper enable us to clarify the properties of counterfactual distribution and establish an optimal plan. We propose an efficient algorithm for estimation of possibility based qualitative expected utility. It is useful for decision making mechanisms where each possible decision is assigned a multi-attribute possibility distribution. The computational complexity of ordinary methods calculating the expected utility based on discretization is growing exponentially with the number of attributes, and may become infeasible with a high number of these attributes. We present series of theorems and lemmas proving the correctness of our algorithm that exibits a linear computational complexity. Our algorithm has been applied in the context of selecting the most prospective partners in multi-party multi-attribute negotiation, and can also be used in making decisions about potential offers during the negotiation as other similar problems. We present a framework to discover and characterize different classes of everyday activities from event-streams. We begin by representing activities as bags of event n-grams. This allows us to analyze the global structural information of activities, using their local event statistics. We demonstrate how maximal cliques in an undirected edge-weighted graph of activities, can be used for activity-class discovery in an unsupervised manner. We show how modeling an activity as a variable length Markov process, can be used to discover recurrent event-motifs to characterize the discovered activity-classes. We present results over extensive data-sets, collected from multiple active environments, to show the competence and generalizability of our proposed framework. This paper describes a general framework called Hybrid Dynamic Mixed Networks (HDMNs) which are Hybrid Dynamic Bayesian Networks that allow representation of discrete deterministic information in the form of constraints. We propose approximate inference algorithms that integrate and adjust well known algorithmic principles such as Generalized Belief Propagation, Rao-Blackwellised Particle Filtering and Constraint Propagation to address the complexity of modeling and reasoning in HDMNs. We use this framework to model a person's travel activity over time and to predict destination and routes given the current location. We present a preliminary empirical evaluation demonstrating the effectiveness of our modeling framework and algorithms using several variants of the activity model. We present a method for learning the parameters of a Bayesian network with prior knowledge about the signs of influences between variables. Our method accommodates not just the standard signs, but provides for context-specific signs as well. We show how the various signs translate into order constraints on the network parameters and how isotonic regression can be used to compute order-constrained estimates from the available data. Our experimental results show that taking prior knowledge about the signs of influences into account leads to an improved fit of the true distribution, especially when only a small sample of data is available. Moreover, the computed estimates are guaranteed to be consistent with the specified signs, thereby resulting in a network that is more likely to be accepted by experts in its domain of application. Planning and learning in Partially Observable MDPs (POMDPs) are among the most challenging tasks in both the AI and Operation Research communities. Although solutions to these problems are intractable in general, there might be special cases, such as structured POMDPs, which can be solved efficiently. A natural and possibly efficient way to represent a POMDP is through the predictive state representation (PSR) - a representation which recently has been receiving increasing attention. In this work, we relate POMDPs to multiplicity automata- showing that POMDPs can be represented by multiplicity automata with no increase in the representation size. Furthermore, we show that the size of the multiplicity automaton is equal to the rank of the predictive state representation. Therefore, we relate both the predictive state representation and POMDPs to the well-founded multiplicity automata literature. Based on the multiplicity automata representation, we provide a planning algorithm which is exponential only in the multiplicity automata rank rather than the number of states of the POMDP. As a result, whenever the predictive state representation is logarithmic in the standard POMDP representation, our planning algorithm is efficient. We show that if any number of variables are allowed to be simultaneously and independently randomized in any one experiment, log2(N) + 1 experiments are sufficient and in the worst case necessary to determine the causal relations among N >= 2 variables when no latent variables, no sample selection bias and no feedback cycles are present. For all K, 0 < K < 1/(2N) we provide an upper bound on the number experiments required to determine causal structure when each experiment simultaneously randomizes K variables. For large N, these bounds are significantly lower than the N - 1 bound required when each experiment randomizes at most one variable. For kmax < N/2, we show that (N/kmax-1)+N/(2kmax)log2(kmax) experiments aresufficient and in the worst case necessary. We over a conjecture as to the minimal number of experiments that are in the worst case sufficient to identify all causal relations among N observed variables that are a subset of the vertices of a DAG. A fundamental issue in real-world systems, such as sensor networks, is the selection of observations which most effectively reduce uncertainty. More specifically, we address the long standing problem of nonmyopically selecting the most informative subset of variables in a graphical model. We present the first efficient randomized algorithm providing a constant factor (1-1/e-epsilon) approximation guarantee for any epsilon > 0 with high confidence. The algorithm leverages the theory of submodular functions, in combination with a polynomial bound on sample complexity. We furthermore prove that no polynomial time algorithm can provide a constant factor approximation better than (1 - 1/e) unless P = NP. Finally, we provide extensive evidence of the effectiveness of our method on two complex real-world datasets. Tree-reweighted max-product (TRW) message passing is a modified form of the ordinary max-product algorithm for attempting to find minimal energy configurations in Markov random field with cycles. For a TRW fixed point satisfying the strong tree agreement condition, the algorithm outputs a configuration that is provably optimal. In this paper, we focus on the case of binary variables with pairwise couplings, and establish stronger properties of TRW fixed points that satisfy only the milder condition of weak tree agreement (WTA). First, we demonstrate how it is possible to identify part of the optimal solution|i.e., a provably optimal solution for a subset of nodes| without knowing a complete solution. Second, we show that for submodular functions, a WTA fixed point always yields a globally optimal solution. We establish that for binary variables, any WTA fixed point always achieves the global maximum of the linear programming relaxation underlying the TRW method. In this paper, we propose a revision-based approach for conflict resolution by generalizing the Disjunctive Maxi-Adjustment (DMA) approach (Benferhat et al. 2004). Revision operators can be classified into two different families: the model-based ones and the formula-based ones. So the revision-based approach has two different versions according to which family of revision operators is chosen. Two particular revision operators are considered, one is the Dalal's revision operator, which is a model-based revision operator, and the other is the cardinality-maximal based revision operator, which is a formulabased revision operator. When the Dalal's revision operator is chosen, the revision-based approach is independent of the syntactic form in each stratum and it captures some notion of minimal change. When the cardinalitymaximal based revision operator is chosen, the revision-based approach is equivalent to the DMA approach. We also show that both approaches are computationally easier than the DMA approach. Systems such as sensor networks and teams of autonomous robots consist of multiple autonomous entities that interact with each other in a distributed, asynchronous manner. These entities need to keep track of the state of the system as it evolves. Asynchronous systems lead to special challenges for monitoring, as nodes must update their beliefs independently of each other and no central coordination is possible. Furthermore, the state of the system continues to change as beliefs are being updated. Previous approaches to developing distributed asynchronous probabilistic reasoning systems have used static models. We present an approach using dynamic models, that take into account the way the system changes state over time. Our approach, which is based on belief propagation, is fully distributed and asynchronous, and allows the world to keep on changing as messages are being sent around. Experimental results show that our approach compares favorably to the factored frontier algorithm. Two types of probabilistic maps are popular in the mobile robotics literature: occupancy grids and geometric maps. Occupancy grids have the advantages of simplicity and speed, but they represent only a restricted class of maps and they make incorrect independence assumptions. On the other hand, current geometric approaches, which characterize the environment by features such as line segments, can represent complex environments compactly. However, they do not reason explicitly about occupancy, a necessity for motion planning; and, they lack a complete probability model over environmental structures. In this paper we present a probabilistic mapping technique based on polygonal random fields (PRF), which combines the advantages of both approaches. Our approach explicitly represents occupancy using a geometric representation, and it is based upon a consistent probability distribution over environments which avoids the incorrect independence assumptions made by occupancy grids. We show how sampling techniques for PRFs can be applied to localized laser and sonar data, and we demonstrate significant improvements in mapping performance over occupancy grids. Continuous time Bayesian networks (CTBNs) describe structured stochastic processes with finitely many states that evolve over continuous time. A CTBN is a directed (possibly cyclic) dependency graph over a set of variables, each of which represents a finite state continuous time Markov process whose transition model is a function of its parents. As shown previously, exact inference in CTBNs is intractable. We address the problem of approximate inference, allowing for general queries conditioned on evidence over continuous time intervals and at discrete time points. We show how CTBNs can be parameterized within the exponential family, and use that insight to develop a message passing scheme in cluster graphs and allows us to apply expectation propagation to CTBNs. The clusters in our cluster graph do not contain distributions over the cluster variables at individual time points, but distributions over trajectories of the variables throughout a duration. Thus, unlike discrete time temporal models such as dynamic Bayesian networks, we can adapt the time granularity at which we reason for different variables and in different conditions. The need to measure sequence similarity arises in information extraction, object identity, data mining, biological sequence analysis, and other domains. This paper presents discriminative string-edit CRFs, a finitestate conditional random field model for edit sequences between strings. Conditional random fields have advantages over generative approaches to this problem, such as pair HMMs or the work of Ristad and Yianilos, because as conditionally-trained methods, they enable the use of complex, arbitrary actions and features of the input strings. As in generative models, the training data does not have to specify the edit sequences between the given string pairs. Unlike generative models, however, our model is trained on both positive and negative instances of string pairs. We present positive experimental results on several data sets. This paper addresses a fundamental issue central to approximation methods for solving large Markov decision processes (MDPs): how to automatically learn the underlying representation for value function approximation? A novel theoretically rigorous framework is proposed that automatically generates geometrically customized orthonormal sets of basis functions, which can be used with any approximate MDP solver like least squares policy iteration (LSPI). The key innovation is a coordinate-free representation of value functions, using the theory of smooth functions on a Riemannian manifold. Hodge theory yields a constructive method for generating basis functions for approximating value functions based on the eigenfunctions of the self-adjoint (Laplace-Beltrami) operator on manifolds. In effect, this approach performs a global Fourier analysis on the state space graph to approximate value functions, where the basis functions reflect the largescale topology of the underlying state space. A new class of algorithms called Representation Policy Iteration (RPI) are presented that automatically learn both basis functions and approximately optimal policies. Illustrative experiments compare the performance of RPI with that of LSPI using two handcoded basis functions (RBF and polynomial state encodings). We introduce a new approximate solution technique for first-order Markov decision processes (FOMDPs). Representing the value function linearly w.r.t. a set of first-order basis functions, we compute suitable weights by casting the corresponding optimization as a first-order linear program and show how off-the-shelf theorem prover and LP software can be effectively used. This technique allows one to solve FOMDPs independent of a specific domain instantiation; furthermore, it allows one to determine bounds on approximation error that apply equally to all domain instantiations. We apply this solution technique to the task of elevator scheduling with a rich feature space and multi-criteria additive reward, and demonstrate that it outperforms a number of intuitive, heuristicallyguided policies. Models of dynamical systems based on predictive state representations (PSRs) are defined strictly in terms of observable quantities, in contrast with traditional models (such as Hidden Markov Models) that use latent variables or statespace representations. In addition, PSRs have an effectively infinite memory, allowing them to model some systems that finite memory-based models cannot. Thus far, PSR models have primarily been developed for domains with discrete observations. Here, we develop the Predictive Linear-Gaussian (PLG) model, a class of PSR models for domains with continuous observations. We show that PLG models subsume Linear Dynamical System models (also called Kalman filter models or state-space models) while using fewer parameters. We also introduce an algorithm to estimate PLG parameters from data, and contrast it with standard Expectation Maximization (EM) algorithms used to estimate Kalman filter parameters. We show that our algorithm is a consistent estimation procedure and present preliminary empirical results suggesting that our algorithm outperforms EM, particularly as the model dimension increases. We consider the problem of diagnosing faults in a system represented by a Bayesian network, where diagnosis corresponds to recovering the most likely state of unobserved nodes given the outcomes of tests (observed nodes). Finding an optimal subset of tests in this setting is intractable in general. We show that it is difficult even to compute the next most-informative test using greedy test selection, as it involves several entropy terms whose exact computation is intractable. We propose an approximate approach that utilizes the loopy belief propagation infrastructure to simultaneously compute approximations of marginal and conditional entropies on multiple subsets of nodes. We apply our method to fault diagnosis in computer networks, and show the algorithm to be very effective on realistic Internet-like topologies. We also provide theoretical justification for the greedy test selection approach, along with some performance guarantees. Different directed acyclic graphs (DAGs) may be Markov equivalent in the sense that they entail the same conditional independence relations among the observed variables. Chickering (1995) provided a transformational characterization of Markov equivalence for DAGs (with no latent variables), which is useful in deriving properties shared by Markov equivalent DAGs, and, with certain generalization, is needed to prove the asymptotic correctness of a search procedure over Markov equivalence classes, known as the GES algorithm. For DAG models with latent variables, maximal ancestral graphs (MAGs) provide a neat representation that facilitates model search. However, no transformational characterization -- analogous to Chickering's -- of Markov equivalent MAGs is yet available. This paper establishes such a characterization for directed MAGs, which we expect will have similar uses as it does for DAGs. One of the main problems of importance sampling in Bayesian networks is representation of the importance function, which should ideally be as close as possible to the posterior joint distribution. Typically, we represent an importance function as a factorization, i.e., product of conditional probability tables (CPTs). Given diagnostic evidence, we do not have explicit forms for the CPTs in the networks. We first derive the exact form for the CPTs of the optimal importance function. Since the calculation is hard, we usually only use their approximations. We review several popular strategies and point out their limitations. Based on an analysis of the influence of evidence, we propose a method for approximating the exact form of importance function by explicitly modeling the most important additional dependence relations introduced by evidence. Our experimental results show that the new approximation strategy offers an immediate improvement in the quality of the importance function. GBP and EP are two successful algorithms for approximate probabilistic inference, which are based on different approximation strategies. An open problem in both algorithms has been how to choose an appropriate approximation structure. We introduce 'structured region graphs', a formalism which marries these two strategies, reveals a deep connection between them, and suggests how to choose good approximation structures. In this formalism, each region has an internal structure which defines an exponential family, whose sufficient statistics must be matched by the parent region. Reduction operators on these structures allow conversion between EP and GBP free energies. Thus it is revealed that all EP approximations on discrete variables are special cases of GBP, and conversely that some wellknown GBP approximations, such as overlapping squares, are special cases of EP. Furthermore, region graphs derived from EP have a number of good structural properties, including maxent-normality and overall counting number of one. The result is a convenient framework for producing high-quality approximations with a user-adjustable level of complexity This article introduces the benefits of using computer as a tool for foreign language teaching and learning. It describes the effect of using Natural Language Processing (NLP) tools for learning Arabic. The technique explored in this particular case is the employment of pedagogically indexed corpora. This text-based method provides the teacher the advantage of building activities based on texts adapted to a particular pedagogical situation. This paper also presents ARAC: a Platform dedicated to language educators allowing them to create activities within their own pedagogical area of interest. The behavior composition problem involves automatically building a controller that is able to realize a desired, but unavailable, target system (e.g., a house surveillance) by suitably coordinating a set of available components (e.g., video cameras, blinds, lamps, a vacuum cleaner, phones, etc.) Previous work has almost exclusively aimed at bringing about the desired component in its totality, which is highly unsatisfactory for unsolvable problems. In this work, we develop an approach for approximate behavior composition without departing from the classical setting, thus making the problem applicable to a much wider range of cases. Based on the notion of simulation, we characterize what a maximal controller and the "closest" implementable target module (optimal approximation) are, and show how these can be computed using ATL model checking technology for a special case. We show the uniqueness of optimal approximations, and prove their soundness and completeness with respect to their imported controllers. We consider the problem of computing optimal generalised policies for relational Markov decision processes. We describe an approach combining some of the benefits of purely inductive techniques with those of symbolic dynamic programming methods. The latter reason about the optimal value function using first-order decision theoretic regression and formula rewriting, while the former, when provided with a suitable hypotheses language, are capable of generalising value functions or policies for small instances. Our idea is to use reasoning and in particular classical first-order regression to automatically generate a hypotheses language dedicated to the domain at hand, which is then used as input by an inductive solver. This approach avoids the more complex reasoning of symbolic dynamic programming while focusing the inductive solver's attention on concepts that are specifically relevant to the optimal value function for the domain considered. In this paper, we present a Branch and Bound algorithm called QuickBB for computing the treewidth of an undirected graph. This algorithm performs a search in the space of perfect elimination ordering of vertices of the graph. The algorithm uses novel pruning and propagation techniques which are derived from the theory of graph minors and graph isomorphism. We present a new algorithm called minor-min-width for computing a lower bound on treewidth that is used within the branch and bound algorithm and which improves over earlier available lower bounds. Empirical evaluation of QuickBB on randomly generated graphs and benchmarks in Graph Coloring and Bayesian Networks shows that it is consistently better than complete algorithms like QuickTree [Shoikhet and Geiger, 1997] in terms of cpu time. QuickBB also has good anytime performance, being able to generate a better upper bound on treewidth of some graphs whose optimal treewidth could not be computed up to now. This paper proposes a decision theory for a symbolic generalization of probability theory (SP). Darwiche and Ginsberg [2,3] proposed SP to relax the requirement of using numbers for uncertainty while preserving desirable patterns of Bayesian reasoning. SP represents uncertainty by symbolic supports that are ordered partially rather than completely as in the case of standard probability. We show that a preference relation on acts that satisfies a number of intuitive postulates is represented by a utility function whose domain is a set of pairs of supports. We argue that a subjective interpretation is as useful and appropriate for SP as it is for numerical probability. It is useful because the subjective interpretation provides a basis for uncertainty elicitation. It is appropriate because we can provide a decision theory that explains how preference on acts is based on support comparison. The representation of independence relations generally builds upon the well-known semigraphoid axioms of independence. Recently, a representation has been proposed that captures a set of dominant statements of an independence relation from which any other statement can be generated by means of the axioms; the cardinality of this set is taken to indicate the complexity of the relation. Building upon the idea of dominance, we introduce the concept of stability to provide for a more compact representation of independence. We give an associated algorithm for establishing such a representation.We show that, with our concept of stability, many independence relations are found to be of lower complexity than with existing representations. This paper investigates a representation language with flexibility inspired by probabilistic logic and compactness inspired by relational Bayesian networks. The goal is to handle propositional and first-order constructs together with precise, imprecise, indeterminate and qualitative probabilistic assessments. The paper shows how this can be achieved through the theory of credal networks. New exact and approximate inference algorithms based on multilinear programming and iterated/loopy propagation of interval probabilities are presented; their superior performance, compared to existing ones, is shown empirically. Early, reliable detection of disease outbreaks is a critical problem today. This paper reports an investigation of the use of causal Bayesian networks to model spatio-temporal patterns of a non-contagious disease (respiratory anthrax infection) in a population of people. The number of parameters in such a network can become enormous, if not carefully managed. Also, inference needs to be performed in real time as population data stream in. We describe techniques we have applied to address both the modeling and inference challenges. A key contribution of this paper is the explication of assumptions and techniques that are sufficient to allow the scaling of Bayesian network modeling and inference to millions of nodes for real-time surveillance applications. The results reported here provide a proof-of-concept that Bayesian networks can serve as the foundation of a system that effectively performs Bayesian biosurveillance of disease outbreaks. Defeasible argumentation frameworks have evolved to become a sound setting to formalize commonsense, qualitative reasoning from incomplete and potentially inconsistent knowledge. Defeasible Logic Programming (DeLP) is a defeasible argumentation formalism based on an extension of logic programming. Although DeLP has been successfully integrated in a number of different real-world applications, DeLP cannot deal with explicit uncertainty, nor with vague knowledge, as defeasibility is directly encoded in the object language. This paper introduces P-DeLP, a new logic programming language that extends original DeLP capabilities for qualitative reasoning by incorporating the treatment of possibilistic uncertainty and fuzzy knowledge. Such features will be formalized on the basis of PGL, a possibilistic logic based on Godel fuzzy logic. Previous work on sensitivity analysis in Bayesian networks has focused on single parameters, where the goal is to understand the sensitivity of queries to single parameter changes, and to identify single parameter changes that would enforce a certain query constraint. In this paper, we expand the work to multiple parameters which may be in the CPT of a single variable, or the CPTs of multiple variables. Not only do we identify the solution space of multiple parameter changes that would be needed to enforce a query constraint, but we also show how to find the optimal solution, that is, the one which disturbs the current probability distribution the least (with respect to a specific measure of disturbance). We characterize the computational complexity of our new techniques and discuss their applications to developing and debugging Bayesian networks, and to the problem of reasoning about the value (reliability) of new information. The complexity of a reasoning task over a graphical model is tied to the induced width of the underlying graph. It is well-known that the conditioning (assigning values) on a subset of variables yields a subproblem of the reduced complexity where instantiated variables are removed. If the assigned variables constitute a cycle-cutset, the rest of the network is singly-connected and therefore can be solved by linear propagation algorithms. A w-cutset is a generalization of a cycle-cutset defined as a subset of nodes such that the subgraph with cutset nodes removed has induced-width of w or less. In this paper we address the problem of finding a minimal w-cutset in a graph. We relate the problem to that of finding the minimal w-cutset of a treedecomposition. The latter can be mapped to the well-known set multi-cover problem. This relationship yields a proof of NP-completeness on one hand and a greedy algorithm for finding a w-cutset of a tree decomposition on the other. Empirical evaluation of the algorithms is presented. Humans currently use arguments for explaining choices which are already made, or for evaluating potential choices. Each potential choice has usually pros and cons of various strengths. In spite of the usefulness of arguments in a decision making process, there have been few formal proposals handling this idea if we except works by Fox and Parsons and by Bonet and Geffner. In this paper we propose a possibilistic logic framework where arguments are built from an uncertain knowledge base and a set of prioritized goals. The proposed approach can compute two kinds of decisions by distinguishing between pessimistic and optimistic attitudes. When the available, maybe uncertain, knowledge is consistent, as well as the set of prioritized goals (which have to be fulfilled as far as possible), the method for evaluating decisions on the basis of arguments agrees with the possibility theory-based approach to decision-making under uncertainty. Taking advantage of its relation with formal approaches to defeasible argumentation, the proposed framework can be generalized in case of partially inconsistent knowledge, or goal bases. We introduce a probabilistic formalism subsuming Markov random fields of bounded tree width and probabilistic context free grammars. Our models are based on a representation of Boolean formulas that we call case-factor diagrams (CFDs). CFDs are similar to binary decision diagrams (BDDs) but are concise for circuits of bounded tree width (unlike BDDs) and can concisely represent the set of parse trees over a given string undera given context free grammar (also unlike BDDs). A probabilistic model consists of aCFD defining a feasible set of Boolean assignments and a weight (or cost) for each individual Boolean variable. We give an insideoutside algorithm for simultaneously computing the marginal of each Boolean variable, and a Viterbi algorithm for finding the mininum cost variable assignment. Both algorithms run in time proportional to the size of the CFD. Based on a recent development in the area of error control coding, we introduce the notion of convolutional factor graphs (CFGs) as a new class of probabilistic graphical models. In this context, the conventional factor graphs are referred to as multiplicative factor graphs (MFGs). This paper shows that CFGs are natural models for probability functions when summation of independent latent random variables is involved. In particular, CFGs capture a large class of linear models, where the linearity is in the sense that the observed variables are obtained as a linear ransformation of the latent variables taking arbitrary distributions. We use Gaussian models and independent factor models as examples to emonstrate the use of CFGs. The requirement of a linear transformation between latent variables (with certain independence restriction) and the bserved variables, to an extent, limits the modelling flexibility of CFGs. This structural restriction however provides a powerful analytic tool to the framework of CFGs; that is, upon taking the Fourier transform of the function represented by the CFG, the resulting function is represented by a FG with identical structure. This Fourier transform duality allows inference problems on a CFG to be solved on the corresponding dual MFG. As real-world Bayesian networks continue to grow larger and more complex, it is important to investigate the possibilities for improving the performance of existing algorithms of probabilistic inference. Motivated by examples, we investigate the dependency of the performance of Lazy propagation on the message computation algorithm. We show how Symbolic Probabilistic Inference (SPI) and Arc-Reversal (AR) can be used for computation of clique to clique messages in the addition to the traditional use of Variable Elimination (VE). In addition, the paper resents the results of an empirical evaluation of the performance of Lazy propagation using VE, SPI, and AR as the message computation algorithm. The results of the empirical evaluation show that for most networks, the performance of inference did not depend on the choice of message computation algorithm, but for some randomly generated networks the choice had an impact on both space and time performance. In the cases where the choice had an impact, AR produced the best results. Suppose that the only available information in a multi-class problem are expert estimates of the conditional probabilities of occurrence for a set of binary features. The aim is to select a subset of features to be measured in subsequent data collection experiments. In the lack of any information about the dependencies between the features, we assume that all features are conditionally independent and hence choose the Naive Bayes classifier as the optimal classifier for the problem. Even in this (seemingly trivial) case of complete knowledge of the distributions, choosing an optimal feature subset is not straightforward. We discuss the properties and implementation details of Sequential Forward Selection (SFS) as a feature selection procedure for the current problem. A sensitivity analysis was carried out to investigate whether the same features are selected when the probabilities vary around the estimated values. The procedure is illustrated with a set of probability estimates for Scrapie in sheep. Although many real-world stochastic planning problems are more naturally formulated by hybrid models with both discrete and continuous variables, current state-of-the-art methods cannot adequately address these problems. We present the first framework that can exploit problem structure for modeling and solving hybrid problems efficiently. We formulate these problems as hybrid Markov decision processes (MDPs with continuous and discrete state and action variables), which we assume can be represented in a factored way using a hybrid dynamic Bayesian network (hybrid DBN). This formulation also allows us to apply our methods to collaborative multiagent settings. We present a new linear program approximation method that exploits the structure of the hybrid MDP and lets us compute approximate value functions more efficiently. In particular, we describe a new factored discretization of continuous variables that avoids the exponential blow-up of traditional approaches. We provide theoretical bounds on the quality of such an approximation and on its scale-up potential. We support our theoretical arguments with experiments on a set of control problems with up to 28-dimensional continuous state space and 22-dimensional action space. Maximum a Posteriori assignment (MAP) is the problem of finding the most probable instantiation of a set of variables given the partial evidence on the other variables in a Bayesian network. MAP has been shown to be a NP-hard problem [22], even for constrained networks, such as polytrees [18]. Hence, previous approaches often fail to yield any results for MAP problems in large complex Bayesian networks. To address this problem, we propose AnnealedMAP algorithm, a simulated annealing-based MAP algorithm. The AnnealedMAP algorithm simulates a non-homogeneous Markov chain whose invariant function is a probability density that concentrates itself on the modes of the target density. We tested this algorithm on several real Bayesian networks. The results show that, while maintaining good quality of the MAP solutions, the AnnealedMAP algorithm is also able to solve many problems that are beyond the reach of previous approaches. In this paper, we propose a new lower approximation scheme for POMDP with discounted and average cost criterion. The approximating functions are determined by their values at a finite number of belief points, and can be computed efficiently using value iteration algorithms for finite-state MDP. While for discounted problems several lower approximation schemes have been proposed earlier, ours seems the first of its kind for average cost problems. We focus primarily on the average cost case, and we show that the corresponding approximation can be computed efficiently using multi-chain algorithms for finite-state MDP. We give a preliminary analysis showing that regardless of the existence of the optimal average cost J in the POMDP, the approximation obtained is a lower bound of the liminf optimal average cost function, and can also be used to calculate an upper bound on the limsup optimal average cost function, as well as bounds on the cost of executing the stationary policy associated with the approximation. Weshow the convergence of the cost approximation, when the optimal average cost is constant and the optimal differential cost is continuous. Generalized belief propagation (GBP) has proven to be a promising technique for approximate inference tasks in AI and machine learning. However, the choice of a good set of clusters to be used in GBP has remained more of an art then a science until this day. This paper proposes a sequential approach to adding new clusters of nodes and their interactions (i.e. "regions") to the approximation. We first review and analyze the recently introduced region graphs and find that three kinds of operations ("split", "merge" and "death") leave the free energy and (under some conditions) the fixed points of GBP invariant. This leads to the notion of "weakly irreducible" regions as the natural candidates to be added to the approximation. Computational complexity of the GBP algorithm is controlled by restricting attention to regions with small "region-width". Combining the above with an efficient (i.e. local in the graph) measure to predict the improved accuracy of GBP leads to the sequential "region pursuit" algorithm for adding new regions bottom-up to the region graph. Experiments show that this algorithm can indeed perform close to optimally. For many real-life Bayesian networks, common knowledge dictates that the output established for the main variable of interest increases with higher values for the observable variables. We define two concepts of monotonicity to capture this type of knowledge. We say that a network is isotone in distribution if the probability distribution computed for the output variable given specific observations is stochastically dominated by any such distribution given higher-ordered observations; a network is isotone in mode if a probability distribution given higher observations has a higher mode. We show that establishing whether a network exhibits any of these properties of monotonicity is coNPPP-complete in general, and remains coNP-complete for polytrees. We present an approximate algorithm for deciding whether a network is monotone in distribution and illustrate its application to a real-life network in oncology. Modeling dynamical systems, both for control purposes and to make predictions about their behavior, is ubiquitous in science and engineering. Predictive state representations (PSRs) are a recently introduced class of models for discrete-time dynamical systems. The key idea behind PSRs and the closely related OOMs (Jaeger's observable operator models) is to represent the state of the system as a set of predictions of observable outcomes of experiments one can do in the system. This makes PSRs rather different from history-based models such as nth-order Markov models and hidden-state-based models such as HMMs and POMDPs. We introduce an interesting construct, the systemdynamics matrix, and show how PSRs can be derived simply from it. We also use this construct to show formally that PSRs are more general than both nth-order Markov models and HMMs/POMDPs. Finally, we discuss the main difference between PSRs and OOMs and conclude with directions for future work. We characterize probabilities in Bayesian networks in terms of algebraic expressions called quasi-probabilities. These are arrived at by casting Bayesian networks as noisy AND-OR-NOT networks, and viewing the subnetworks that lead to a node as arguments for or against a node. Quasi-probabilities are in a sense the "natural" algebra of Bayesian networks: we can easily compute the marginal quasi-probability of any node recursively, in a compact form; and we can obtain the joint quasi-probability of any set of nodes by multiplying their marginals (using an idempotent product operator). Quasi-probabilities are easily manipulated to improve the efficiency of probabilistic inference. They also turn out to be representable as square-wave pulse trains, and joint and marginal distributions can be computed by multiplication and complementation of pulse trains. The sensitivities revealed by a sensitivity analysis of a probabilistic network typically depend on the entered evidence. For a real-life network therefore, the analysis is performed a number of times, with different evidence. Although efficient algorithms for sensitivity analysis exist, a complete analysis is often infeasible because of the large range of possible combinations of observations. In this paper we present a method for studying sensitivities that are invariant to the evidence entered. Our method builds upon the idea of establishing bounds between which a parameter can be varied without ever inducing a change in the most likely value of a variable of interest. Probabilistic inference problems arise naturally in distributed systems such as sensor networks and teams of mobile robots. Inference algorithms that use message passing are a natural fit for distributed systems, but they must be robust to the failure situations that arise in real-world settings, such as unreliable communication and node failures. Unfortunately, the popular sum-product algorithm can yield very poor estimates in these settings because the nodes' beliefs before convergence can be arbitrarily different from the correct posteriors. In this paper, we present a new message passing algorithm for probabilistic inference which provides several crucial guarantees that the standard sum-product algorithm does not. Not only does it converge to the correct posteriors, but it is also guaranteed to yield a principled approximation at any point before convergence. In addition, the computational complexity of the message passing updates depends only upon the model, and is dependent of the network topology of the distributed system. We demonstrate the approach with detailed experimental results on a distributed sensor calibration task using data from an actual sensor network deployment. We consider the problem of estimating the distribution underlying an observed sample of data. Instead of maximum likelihood, which maximizes the probability of the ob served values, we propose a different estimate, the high-profile distribution, which maximizes the probability of the observed profile the number of symbols appearing any given number of times. We determine the high-profile distribution of several data samples, establish some of its general properties, and show that when the number of distinct symbols observed is small compared to the data size, the high-profile and maximum-likelihood distributions are roughly the same, but when the number of symbols is large, the distributions differ, and high-profile better explains the data. A diagnostic policy specifies what test to perform next, based on the results of previous tests, and when to stop and make a diagnosis. Cost-sensitive diagnostic policies perform tradeoffs between (a) the cost of tests and (b) the cost of misdiagnoses. An optimal diagnostic policy minimizes the expected total cost. We formalize this diagnosis process as a Markov Decision Process (MDP). We investigate two types of algorithms for solving this MDP: systematic search based on AO* algorithm and greedy search (particularly the Value of Information method). We investigate the issue of learning the MDP probabilities from examples, but only as they are relevant to the search for good policies. We do not learn nor assume a Bayesian network for the diagnosis process. Regularizers are developed to control overfitting and speed up the search. This research is the first that integrates overfitting prevention into systematic search. The paper has two contributions: it discusses the factors that make systematic search feasible for diagnosis, and it shows experimentally, on benchmark data sets, that systematic search methods produce better diagnostic policies than greedy methods. Mixtures of truncated exponentials (MTE) potentials are an alternative to discretization for representing continuous chance variables in influence diagrams. Also, MTE potentials can be used to approximate utility functions. This paper introduces MTE influence diagrams, which can represent decision problems without restrictions on the relationships between continuous and discrete chance variables, without limitations on the distributions of continuous chance variables, and without limitations on the nature of the utility functions. In MTE influence diagrams, all probability distributions and the joint utility function (or its multiplicative factors) are represented by MTE potentials and decision nodes are assumed to have discrete state spaces. MTE influence diagrams are solved by variable elimination using a fusion algorithm. In this article we introduce the Arcade Learning Environment (ALE): both a challenge problem and a platform and methodology for evaluating the development of general, domain-independent AI technology. ALE provides an interface to hundreds of Atari 2600 game environments, each one different, interesting, and designed to be a challenge for human players. ALE presents significant research challenges for reinforcement learning, model learning, model-based planning, imitation learning, transfer learning, and intrinsic motivation. Most importantly, it provides a rigorous testbed for evaluating and comparing approaches to these problems. We illustrate the promise of ALE by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning. In doing so, we also propose an evaluation methodology made possible by ALE, reporting empirical results on over 55 different games. All of the software, including the benchmark agents, is publicly available. Optimizing the cost of evaluating a polynomial is a classic problem in computer science. For polynomials in one variable, Horner's method provides a scheme for producing a computationally efficient form. For multivariate polynomials it is possible to generalize Horner's method, but this leaves freedom in the order of the variables. Traditionally, greedy schemes like most-occurring variable first are used. This simple textbook algorithm has given remarkably efficient results. Finding better algorithms has proved difficult. In trying to improve upon the greedy scheme we have implemented Monte Carlo tree search, a recent search method from the field of artificial intelligence. This results in better Horner schemes and reduces the cost of evaluating polynomials, sometimes by factors up to two. This paper presents a novel fuzzy logic based Adaptive Super-twisting Sliding Mode Controller for the control of dynamic uncertain systems. The proposed controller combines the advantages of Second order Sliding Mode Control, Fuzzy Logic Control and Adaptive Control. The reaching conditions, stability and robustness of the system with the proposed controller are guaranteed. In addition, the proposed controller is well suited for simple design and implementation. The effectiveness of the proposed controller over the first order Sliding Mode Fuzzy Logic controller is illustrated by Matlab based simulations performed on a DC-DC Buck converter. Based on this comparison, the proposed controller is shown to obtain the desired transient response without causing chattering and error under steady-state conditions. The proposed controller is able to give robust performance in terms of rejection to input voltage variations and load variations. This paper presents a predictive control strategy based on neural network model of the plant is applied to Continuous Stirred Tank Reactor (CSTR). This system is a highly nonlinear process; therefore, a nonlinear predictive method, e.g., neural network predictive control, can be a better match to govern the system dynamics. In the paper, the NN model and the way in which it can be used to predict the behavior of the CSTR process over a certain prediction horizon are described, and some comments about the optimization procedure are made. Predictive control algorithm is applied to control the concentration in a continuous stirred tank reactor (CSTR), whose parameters are optimally determined by solving quadratic performance index using the optimization algorithm. An efficient control of the product concentration in cstr can be achieved only through accurate model. Here an attempt is made to alleviate the modeling difficulties using Artificial Intelligent technique such as Neural Network. Simulation results demonstrate the feasibility and effectiveness of the NNMPC technique. The evolution of the human society raises more and more difficult endeavors. For some of the real-life problems, the computing time-restriction enhances their complexity. The Matrix Bandwidth Minimization Problem (MBMP) seeks for a simultaneous permutation of the rows and the columns of a square matrix in order to keep its nonzero entries close to the main diagonal. The MBMP is a highly investigated P-complete problem, as it has broad applications in industry, logistics, artificial intelligence or information recovery. This paper describes a new attempt to use the Ant Colony Optimization framework in tackling MBMP. The introduced model is based on the hybridization of the Ant Colony System technique with new local search mechanisms. Computational experiments confirm a good performance of the proposed algorithm for the considered set of MBMP instances. The Matrix Bandwidth Minimization Problem (MBMP) seeks for a simultaneous reordering of the rows and the columns of a square matrix such that the nonzero entries are collected within a band of small width close to the main diagonal. The MBMP is a NP-complete problem, with applications in many scientific domains, linear systems, artificial intelligence, and real-life situations in industry, logistics, information recovery. The complex problems are hard to solve, that is why any attempt to improve their solutions is beneficent. Genetic algorithms and ant-based systems are Soft Computing methods used in this paper in order to solve some MBMP instances. Our approach is based on a learning agent-based model involving a local search procedure. The algorithm is compared with the classical Cuthill-McKee algorithm, and with a hybrid genetic algorithm, using several instances from Matrix Market collection. Computational experiments confirm a good performance of the proposed algorithms for the considered set of MBMP instances. On Soft Computing basis, we also propose a new theoretical Reinforcement Learning model for solving the MBMP problem. Effective search for graph automorphisms allows identifying symmetries in many discrete structures, ranging from chemical molecules to microprocessor circuits. Using this type of structure can enhance visualization as well as speed up computational optimization and verification. Competitive algorithms for the graph automorphism problem are based on efficient partition refinement augmented with group-theoretic pruning techniques. In this paper, we improve prior algorithms for the graph automorphism problem by introducing simultaneous refinement of multiple partitions, which enables the anticipation of future conflicts in search and leads to significant pruning, reducing overall runtimes. Empirically, we observe an exponential speedup for the family of Miyazaki graphs, which have been shown to impede leading graph-automorphism algorithms. The process of sorting marble plates according to their surface texture is an important task in the automated marble plate production. Nowadays some inspection systems in marble industry that automate the classification tasks are too expensive and are compatible only with specific technological equipment in the plant. In this paper a new approach to the design of an Automated Marble Plate Classification System (AMPCS),based on different neural network input training sets is proposed, aiming at high classification accuracy using simple processing and application of only standard devices. It is based on training a classification MLP neural network with three different input training sets: extracted texture histograms, Discrete Cosine and Wavelet Transform over the histograms. The algorithm is implemented in a PLC for real-time operation. The performance of the system is assessed with each one of the input training sets. The experimental test results regarding classification accuracy and quick operation are represented and discussed. Qualitative modelling is a technique integrating the fields of theoretical computer science, artificial intelligence and the physical and biological sciences. The aim is to be able to model the behaviour of systems without estimating parameter values and fixing the exact quantitative dynamics. Traditional applications are the study of the dynamics of physical and biological systems at a higher level of abstraction than that obtained by estimation of numerical parameter values for a fixed quantitative model. Qualitative modelling has been studied and implemented to varying degrees of sophistication in Petri nets, process calculi and constraint programming. In this paper we reflect on the strengths and weaknesses of existing frameworks, we demonstrate how recent advances in constraint programming can be leveraged to produce high quality qualitative models, and we describe the advances in theory and technology that would be needed to make constraint programming the best option for scientific investigation in the broadest sense. A significant theoretical advantage of search-and-score methods for learning Bayesian Networks is that they can accept informative prior beliefs for each possible network, thus complementing the data. In this paper, a method is presented for assigning priors based on beliefs on the presence or absence of certain paths in the true network. Such beliefs correspond to knowledge about the possible causal and associative relations between pairs of variables. This type of knowledge naturally arises from prior experimental and observational data, among others. In addition, a novel search-operator is proposed to take advantage of such prior knowledge. Experiments show that, using path beliefs improves the learning of the skeleton, as well as the edge directions in the network. We propose an approach to lifted approximate inference for first-order probabilistic models, such as Markov logic networks. It is based on performing exact lifted inference in a simplified first-order model, which is found by relaxing first-order constraints, and then compensating for the relaxation. These simplified models can be incrementally improved by carefully recovering constraints that have been relaxed, also at the first-order level. This leads to a spectrum of approximations, with lifted belief propagation on one end, and exact lifted inference on the other. We discuss how relaxation, compensation, and recovery can be performed, all at the firstorder level, and show empirically that our approach substantially improves on the approximations of both propositional solvers and lifted belief propagation. Much effort has been directed at algorithms for obtaining the highest probability configuration in a probabilistic random field model known as the maximum a posteriori (MAP) inference problem. In many situations, one could benefit from having not just a single solution, but the top M most probable solutions known as the M-Best MAP problem. In this paper, we propose an efficient message-passing based algorithm for solving the M-Best MAP problem. Specifically, our algorithm solves the recently proposed Linear Programming (LP) formulation of M-Best MAP [7], while being orders of magnitude faster than a generic LP-solver. Our approach relies on studying a particular partial Lagrangian relaxation of the M-Best MAP LP which exposes a natural combinatorial structure of the problem that we exploit. We address the problem of estimating the effect of intervening on a set of variables X from experiments on a different set, Z, that is more accessible to manipulation. This problem, which we call z-identifiability, reduces to ordinary identifiability when Z = empty and, like the latter, can be given syntactic characterization using the do-calculus [Pearl, 1995; 2000]. We provide a graphical necessary and sufficient condition for z-identifiability for arbitrary sets X,Z, and Y (the outcomes). We further develop a complete algorithm for computing the causal effect of X on Y using information provided by experiments on Z. Finally, we use our results to prove completeness of do-calculus relative to z-identifiability, a result that does not follow from completeness relative to ordinary identifiability. The MPE (Most Probable Explanation) query plays an important role in probabilistic inference. MPE solution algorithms for probabilistic relational models essentially adapt existing belief assessment method, replacing summation with maximization. But the rich structure and symmetries captured by relational models together with the properties of the maximization operator offer an opportunity for additional simplification with potentially significant computational ramifications. Specifically, these models often have groups of variables that define symmetric distributions over some population of formulas. The maximizing choice for different elements of this group is the same. If we can realize this ahead of time, we can significantly reduce the size of the model by eliminating a potentially significant portion of random variables. This paper defines the notion of uniformly assigned and partially uniformly assigned sets of variables, shows how one can recognize these sets efficiently, and how the model can be greatly simplified once we recognize them, with little computational effort. We demonstrate the effectiveness of these ideas empirically on a number of models. The do-calculus was developed in 1995 to facilitate the identification of causal effects in non-parametric models. The completeness proofs of [Huang and Valtorta, 2006] and [Shpitser and Pearl, 2006] and the graphical criteria of [Tian and Shpitser, 2010] have laid this identification problem to rest. Recent explorations unveil the usefulness of the do-calculus in three additional areas: mediation analysis [Pearl, 2012], transportability [Pearl and Bareinboim, 2011] and metasynthesis. Meta-synthesis (freshly coined) is the task of fusing empirical results from several diverse studies, conducted on heterogeneous populations and under different conditions, so as to synthesize an estimate of a causal relation in some target environment, potentially different from those under study. The talk surveys these results with emphasis on the challenges posed by meta-synthesis. For background material, see http://bayes.cs.ucla.edu/csl_papers.html We consider a setting where an agent's uncertainty is represented by a set of probability measures, rather than a single measure. Measure-bymeasure updating of such a set of measures upon acquiring new information is well-known to suffer from problems; agents are not always able to learn appropriately. To deal with these problems, we propose using weighted sets of probabilities: a representation where each measure is associated with a weight, which denotes its significance. We describe a natural approach to updating in such a situation and a natural approach to determining the weights. We then show how this representation can be used in decision-making, by modifying a standard approach to decision making-minimizing expected regret-to obtain minimax weighted expected regret (MWER).We provide an axiomatization that characterizes preferences induced by MWER both in the static and dynamic case. This paper presents a novel approach to the problem of semantic parsing via learning the correspondences between complex sentences and rich sets of events. Our main intuition is that correct correspondences tend to occur more frequently. Our model benefits from a discriminative notion of similarity to learn the correspondence between sentence and an event and a ranking machinery that scores the popularity of each correspondence. Our method can discover a group of events (called macro-events) that best describes a sentence. We evaluate our method on our novel dataset of professional soccer commentaries. The empirical results show that our method significantly outperforms the state-of-theart. This paper provides some new guidance in the construction of region graphs for Generalized Belief Propagation (GBP). We connect the problem of choosing the outer regions of a LoopStructured Region Graph (SRG) to that of finding a fundamental cycle basis of the corresponding Markov network. We also define a new class of tree-robust Loop-SRG for which GBP on any induced (spanning) tree of the Markov network, obtained by setting to zero the off-tree interactions, is exact. This class of SRG is then mapped to an equivalent class of tree-robust cycle bases on the Markov network. We show that a treerobust cycle basis can be identified by proving that for every subset of cycles, the graph obtained from the edges that participate in a single cycle only, is multiply connected. Using this we identify two classes of tree-robust cycle bases: planar cycle bases and "star" cycle bases. In experiments we show that tree-robustness can be successfully exploited as a design principle to improve the accuracy and convergence of GBP. We consider the problem of sampling from solutions defined by a set of hard constraints on a combinatorial space. We propose a new sampling technique that, while enforcing a uniform exploration of the search space, leverages the reasoning power of a systematic constraint solver in a black-box scheme. We present a series of challenging domains, such as energy barriers and highly asymmetric spaces, that reveal the difficulties introduced by hard constraints. We demonstrate that standard approaches such as Simulated Annealing and Gibbs Sampling are greatly affected, while our new technique can overcome many of these difficulties. Finally, we show that our sampling scheme naturally defines a new approximate model counting technique, which we empirically show to be very accurate on a range of benchmark problems. Decentralized partially observable Markov decision processes (Dec-POMDPs) are rich models for cooperative decision-making under uncertainty, but are often intractable to solve optimally (NEXP-complete). The transition and observation independent Dec-MDP is a general subclass that has been shown to have complexity in NP, but optimal algorithms for this subclass are still inefficient in practice. In this paper, we first provide an updated proof that an optimal policy does not depend on the histories of the agents, but only the local observations. We then present a new algorithm based on heuristic search that is able to expand search nodes by using constraint optimization. We show experimental results comparing our approach with the state-of-the-art DecMDP and Dec-POMDP solvers. These results show a reduction in computation time and an increase in scalability by multiple orders of magnitude in a number of benchmarks. We target the problem of accuracy and robustness in causal inference from finite data sets. Some state-of-the-art algorithms produce clear output complete with solid theoretical guarantees but are susceptible to propagating erroneous decisions, while others are very adept at handling and representing uncertainty, but need to rely on undesirable assumptions. Our aim is to combine the inherent robustness of the Bayesian approach with the theoretical strength and clarity of constraint-based methods. We use a Bayesian score to obtain probability estimates on the input statements used in a constraint-based procedure. These are subsequently processed in decreasing order of reliability, letting more reliable decisions take precedence in case of con icts, until a single output model is obtained. Tests show that a basic implementation of the resulting Bayesian Constraint-based Causal Discovery (BCCD) algorithm already outperforms established procedures such as FCI and Conservative PC. It can also indicate which causal decisions in the output have high reliability and which do not. Orienteering problems (OPs) are a variant of the well-known prize-collecting traveling salesman problem, where the salesman needs to choose a subset of cities to visit within a given deadline. OPs and their extensions with stochastic travel times (SOPs) have been used to model vehicle routing problems and tourist trip design problems. However, they suffer from two limitations travel times between cities are assumed to be time independent and the route provided is independent of the risk preference (with respect to violating the deadline) of the user. To address these issues, we make the following contributions: We introduce (1) a dynamic SOP (DSOP) model, which is an extension of SOPs with dynamic (time-dependent) travel times; (2) a risk-sensitive criterion to allow for different risk preferences; and (3) a local search algorithm to solve DSOPs with this risk-sensitive criterion. We evaluated our algorithms on a real-world dataset for a theme park navigation problem as well as synthetic datasets employed in the literature. Stochastic Shortest Path (SSP) MDPs is a problem class widely studied in AI, especially in probabilistic planning. They describe a wide range of scenarios but make the restrictive assumption that the goal is reachable from any state, i.e., that dead-end states do not exist. Because of this, SSPs are unable to model various scenarios that may have catastrophic events (e.g., an airplane possibly crashing if it flies into a storm). Even though MDP algorithms have been used for solving problems with dead ends, a principled theory of SSP extensions that would allow dead ends, including theoretically sound algorithms for solving such MDPs, has been lacking. In this paper, we propose three new MDP classes that admit dead ends under increasingly weaker assumptions. We present Value Iteration-based as well as the more efficient heuristic search algorithms for optimally solving each class, and explore theoretical relationships between these classes. We also conduct a preliminary empirical study comparing the performance of our algorithms on different MDP classes, especially on scenarios with unavoidable dead ends. Much of scientific data is collected as randomized experiments intervening on some and observing other variables of interest. Quite often, a given phenomenon is investigated in several studies, and different sets of variables are involved in each study. In this article we consider the problem of integrating such knowledge, inferring as much as possible concerning the underlying causal structure with respect to the union of observed variables from such experimental or passive observational overlapping data sets. We do not assume acyclicity or joint causal sufficiency of the underlying data generating model, but we do restrict the causal relationships to be linear and use only second order statistics of the data. We derive conditions for full model identifiability in the most generic case, and provide novel techniques for incorporating an assumption of faithfulness to aid in inference. In each case we seek to establish what is and what is not determined by the data at hand. In typical real-time strategy (RTS) games, enemy units are visible only when they are within sight range of a friendly unit. Knowledge of an opponent's disposition is limited to what can be observed through scouting. Information is costly, since units dedicated to scouting are unavailable for other purposes, and the enemy will resist scouting attempts. It is important to infer as much as possible about the opponent's current and future strategy from the available observations. We present a dynamic Bayes net model of strategies in the RTS game Starcraft that combines a generative model of how strategies relate to observable quantities with a principled framework for incorporating evidence gained via scouting. We demonstrate the model's ability to infer unobserved aspects of the game from realistic observations. Cooperative Bayesian games (BGs) can model decision-making problems for teams of agents under imperfect information, but require space and computation time that is exponential in the number of agents. While agent independence has been used to mitigate these problems in perfect information settings, we propose a novel approach for BGs based on the observation that BGs additionally possess a different types of structure, which we call type independence. We propose a factor graph representation that captures both forms of independence and present a theoretical analysis showing that non-serial dynamic programming cannot effectively exploit type independence, while Max-Sum can. Experimental results demonstrate that our approach can tackle cooperative Bayesian games of unprecedented size. A nonparametric approach for policy learning for POMDPs is proposed. The approach represents distributions over the states, observations, and actions as embeddings in feature spaces, which are reproducing kernel Hilbert spaces. Distributions over states given the observations are obtained by applying the kernel Bayes' rule to these distribution embeddings. Policies and value functions are defined on the feature space over states, which leads to a feature space expression for the Bellman equation. Value iteration may then be used to estimate the optimal value function and associated policy. Experimental results confirm that the correct policy is learned using the feature space representation. Learning a Bayesian network structure from data is an NP-hard problem and thus exact algorithms are feasible only for small data sets. Therefore, network structures for larger networks are usually learned with various heuristics. Another approach to scaling up the structure learning is local learning. In local learning, the modeler has one or more target variables that are of special interest; he wants to learn the structure near the target variables and is not interested in the rest of the variables. In this paper, we present a score-based local learning algorithm called SLL. We conjecture that our algorithm is theoretically sound in the sense that it is optimal in the limit of large sample size. Empirical results suggest that SLL is competitive when compared to the constraint-based HITON algorithm. We also study the prospects of constructing the network structure for the whole node set based on local results by presenting two algorithms and comparing them to several heuristics. Agents learning to act autonomously in real-world domains must acquire a model of the dynamics of the domain in which they operate. Learning domain dynamics can be challenging, especially where an agent only has partial access to the world state, and/or noisy external sensors. Even in standard STRIPS domains, existing approaches cannot learn from noisy, incomplete observations typical of real-world domains. We propose a method which learns STRIPS action models in such domains, by decomposing the problem into first learning a transition function between states in the form of a set of classifiers, and then deriving explicit STRIPS rules from the classifiers' parameters. We evaluate our approach on simulated standard planning domains from the International Planning Competition, and show that it learns useful domain descriptions from noisy, incomplete observations. Markov networks (MNs) are a powerful way to compactly represent a joint probability distribution, but most MN structure learning methods are very slow, due to the high cost of evaluating candidates structures. Dependency networks (DNs) represent a probability distribution as a set of conditional probability distributions. DNs are very fast to learn, but the conditional distributions may be inconsistent with each other and few inference algorithms support DNs. In this paper, we present a closed-form method for converting a DN into an MN, allowing us to enjoy both the efficiency of DN learning and the convenience of the MN representation. When the DN is consistent, this conversion is exact. For inconsistent DNs, we present averaging methods that significantly improve the approximation. In experiments on 12 standard datasets, our methods are orders of magnitude faster than and often more accurate than combining conditional distributions using weight learning. We consider the linear programming relaxation of an energy minimization problem for Markov Random Fields. The dual objective of this problem can be treated as a concave and unconstrained, but non-smooth function. The idea of smoothing the objective prior to optimization was recently proposed in a series of papers. Some of them suggested the idea to decrease the amount of smoothing (so called temperature) while getting closer to the optimum. However, no theoretical substantiation was provided. We propose an adaptive smoothing diminishing algorithm based on the duality gap between relaxed primal and dual objectives and demonstrate the efficiency of our approach with a smoothed version of Sequential Tree-Reweighted Message Passing (TRW-S) algorithm. The strategy is applicable to other algorithms as well, avoids adhoc tuning of the smoothing during iterations, and provably guarantees convergence to the optimum. EDML is a recently proposed algorithm for learning MAP parameters in Bayesian networks. In this paper, we present a number of new advances and insights on the EDML algorithm. First, we provide the multivalued extension of EDML, originally proposed for Bayesian networks over binary variables. Next, we identify a simplified characterization of EDML that further implies a simple fixed-point algorithm for the convex optimization problem that underlies it. This characterization further reveals a connection between EDML and EM: a fixed point of EDML is a fixed point of EM, and vice versa. We thus identify also a new characterization of EM fixed points, but in the semantics of EDML. Finally, we propose a hybrid EDML/EM algorithm that takes advantage of the improved empirical convergence behavior of EDML, while maintaining the monotonic improvement property of EM. Recently two search algorithms, A* and breadth-first branch and bound (BFBnB), were developed based on a simple admissible heuristic for learning Bayesian network structures that optimize a scoring function. The heuristic represents a relaxation of the learning problem such that each variable chooses optimal parents independently. As a result, the heuristic may contain many directed cycles and result in a loose bound. This paper introduces an improved admissible heuristic that tries to avoid directed cycles within small groups of variables. A sparse representation is also introduced to store only the unique optimal parent choices. Empirical results show that the new techniques significantly improved the efficiency and scalability of A* and BFBnB on most of datasets tested in this paper. We describe theoretical bounds and a practical algorithm for teaching a model by demonstration in a sequential decision making environment. Unlike previous efforts that have optimized learners that watch a teacher demonstrate a static policy, we focus on the teacher as a decision maker who can dynamically choose different policies to teach different parts of the environment. We develop several teaching frameworks based on previously defined supervised protocols, such as Teaching Dimension, extending them to handle noise and sequences of inputs encountered in an MDP.We provide theoretical bounds on the learnability of several important model classes in this setting and suggest a practical algorithm for dynamic teaching. Easy access and vast amount of data, especially from long period of time, allows to divide social network into timeframes and create temporal social network. Such network enables to analyse its dynamics. One aspect of the dynamics is analysis of social communities evolution, i.e., how particular group changes over time. To do so, the complete group evolution history is needed. That is why in this paper the new method for group evolution extraction called GED is presented. Object detection is a fundamental task in computer vision and has many applications in image processing. This paper proposes a new approach for object detection by applying scale invariant feature transform (SIFT) in an automatic segmentation algorithm. SIFT is an invariant algorithm respect to scale, translation and rotation. The features are very distinct and provide stable keypoints that can be used for matching an object in different images. At first, an object is trained with different aspects for finding best keypoints. The object can be recognized in the other images by using achieved keypoints. Then, a robust segmentation algorithm is used to detect the object with full boundary based on SIFT keypoints. In segmentation algorithm, a merging role is defined to merge the regions in image with the assistance of keypoints. The results show that the proposed approach is reliable for object detection and can extract object boundary well. Requirements engineers should strive to get a better insight into decision making processes. During elicitation of requirements, decision making influences how stakeholders communicate with engineers, thereby affecting the engineers' understanding of requirements for the future information system. Empirical studies issued from Artificial Intelligence offer an adequate groundwork to understand how decision making is influenced by some particular contextual factors. However, no research has gone into the validation of such empirical studies in the process of collecting needs of the future system's users. As an answer, the paper empirically studies factors, initially identified by AI literature, that influence decision making and communication during requirements elicitation. We argue that the context's structure of the decision should be considered as a cornerstone to adequately study how stakeholders decide to communicate or not a requirement. The paper proposes a context framework to categorize former factors into specific families, and support the engineers during the elicitation process. A wide range of engineering design problems have been solved by the algorithms that simulates collective intelligence in swarms of birds or insects. The Artificial Bee Colony or ABC is one of the recent additions to the class of swarm intelligence based algorithms that mimics the foraging behavior of honey bees. ABC consists of three groups of bees namely employed, onlooker and scout bees. In ABC, the food locations represent the potential candidate solution. In the present study an attempt is made to generate the population of food sources (Colony Size) adaptively and the variant is named as A-ABC. A-ABC is further enhanced to improve convergence speed and exploitation capability, by employing the concept of elitism, which guides the bees towards the best food source. This enhanced variant is called E-ABC. The proposed algorithms are validated on a set of standard benchmark problems with varying dimensions taken from literature and on five engineering design problems. The numerical results are compared with the basic ABC and three recent variant of ABC. Numerically and statistically simulated results illustrate that the proposed method is very efficient and competitive. The improvement of medical care quality is a significant interest for the future years. The fight against nosocomial infections (NI) in the intensive care units (ICU) is a good example. We will focus on a set of observations which reflect the dynamic aspect of the decision, result of the application of a Medical Decision Support System (MDSS). This system has to make dynamic decision on temporal data. We use dynamic Bayesian network (DBN) to model this dynamic process. It is a temporal reasoning within a real-time environment; we are interested in the Dynamic Decision Support Systems in healthcare domain (MDDSS). Traffic regulation must be respected by all vehicles, either human- or computer- driven. However, extreme traffic situations might exhibit practical cases in which a vehicle should safely and reasonably relax traffic regulation, e.g., in order not to be indefinitely blocked and to keep circulating. In this paper, we propose a high-level representation of an automated vehicle, other vehicles and their environment, which can assist drivers in taking such "illegal" but practical relaxation decisions. This high-level representation (an ontology) includes topological knowledge and inference rules, in order to compute the next high-level motion an automated vehicle should take, as assistance to a driver. Results on practical cases are presented. We look at the problem of revising fuzzy belief bases, i.e., belief base revision in which both formulas in the base as well as revision-input formulas can come attached with varying truth-degrees. Working within a very general framework for fuzzy logic which is able to capture a variety of types of inference under uncertainty, such as truth-functional fuzzy logics and certain types of probabilistic inference, we show how the idea of rational change from 'crisp' base revision, as embodied by the idea of partial meet revision, can be faithfully extended to revising fuzzy belief bases. We present and axiomatise an operation of partial meet fuzzy revision and illustrate how the operation works in several important special instances of the framework. WA qualitative probabilistic network models the probabilistic relationships between its variables by means of signs. Non-monotonic influences have associated an ambiguous sign. These ambiguous signs typically lead to uninformative results upon inference. A non-monotonic influence can, however, be associated with a, more informative, sign that indicates its effect in the current state of the network. To capture this effect, we introduce the concept of situational sign. Furthermore, if the network converts to a state in which all variables that provoke the non-monotonicity have been observed, a non-monotonic influence reduces to a monotonic influence. We study the persistence and propagation of situational signs upon inference and give a method to establish the sign of a reduced influence. The paper studies empirically the time-space trade-off between sampling and inference in a sl cutset sampling algorithm. The algorithm samples over a subset of nodes in a Bayesian network and applies exact inference over the rest. Consequently, while the size of the sampling space decreases, requiring less samples for convergence, the time for generating each single sample increases. The w-cutset sampling selects a sampling set such that the induced-width of the network when the sampling set is observed is bounded by w, thus requiring inference whose complexity is exponential in w. In this paper, we investigate performance of w-cutset sampling over a range of w values and measure the accuracy of w-cutset sampling as a function of w. Our experiments demonstrate that the cutset sampling idea is quite powerful showing that an optimal balance between inference and sampling benefits substantially from restricting the cutset size, even at the cost of more complex inference. Backtracking search is a powerful algorithmic paradigm that can be used to solve many problems. It is in a certain sense the dual of variable elimination; but on many problems, e.g., SAT, it is vastly superior to variable elimination in practice. Motivated by this we investigate the application of backtracking search to the problem of Bayesian inference (Bayes). We show that natural generalizations of known techniques allow backtracking search to achieve performance guarantees similar to standard algorithms for Bayes, and that there exist problems on which backtracking can in fact do much better. We also demonstrate that these ideas can be applied to implement a Bayesian inference engine whose performance is competitive with standard algorithms. Since backtracking search can very naturally take advantage of context specific structure, the potential exists for performance superior to standard algorithms on many problems. Recursive Conditioning (RC) was introduced recently as the first any-space algorithm for inference in Bayesian networks which can trade time for space by varying the size of its cache at the increment needed to store a floating point number. Under full caching, RC has an asymptotic time and space complexity which is comparable to mainstream algorithms based on variable elimination and clustering (exponential in the network treewidth and linear in its size). We show two main results about RC in this paper. First, we show that its actual space requirements under full caching are much more modest than those needed by mainstream methods and study the implications of this finding. Second, we show that RC can effectively deal with determinism in Bayesian networks by employing standard logical techniques, such as unit resolution, allowing a significant reduction in its time requirements in certain cases. We illustrate our results using a number of benchmark networks, including the very challenging ones that arise in genetic linkage analysis. Symbolic representations have been used successfully in off-line planning algorithms for Markov decision processes. We show that they can also improve the performance of on-line planners. In addition to reducing computation time, symbolic generalization can reduce the amount of costly real-world interactions required for convergence. We introduce Symbolic Real-Time Dynamic Programming (or sRTDP), an extension of RTDP. After each step of on-line interaction with an environment, sRTDP uses symbolic model-checking techniques to generalizes its experience by updating a group of states rather than a single state. We examine two heuristic approaches to dynamic grouping of states and show that they accelerate the planning process significantly in terms of both CPU time and the number of steps of interaction with the environment. In Non - ergodic belief networks the posterior belief OF many queries given evidence may become zero.The paper shows that WHEN belief propagation IS applied iteratively OVER arbitrary networks(the so called, iterative OR loopy belief propagation(IBP)) it IS identical TO an arc - consistency algorithm relative TO zero - belief queries(namely assessing zero posterior probabilities). This implies that zero - belief conclusions derived BY belief propagation converge AND are sound.More importantly it suggests that the inference power OF IBP IS AS strong AND AS weak, AS that OF arc - consistency.This allows the synthesis OF belief networks FOR which belief propagation IS useless ON one hand, AND focuses the investigation OF classes OF belief network FOR which belief propagation may be zero - complete.Finally, ALL the above conclusions apply also TO Generalized belief propagation algorithms that extend loopy belief propagation AND allow a crisper understanding OF their power. Constraint-based (CB) learning is a formalism for learning a causal network with a database D by performing a series of conditional-independence tests to infer structural information. This paper considers a new test of independence that combines ideas from Bayesian learning, Bayesian network inference, and classical hypothesis testing to produce a more reliable and robust test. The new test can be calculated in the same asymptotic time and space required for the standard tests such as the chi-squared test, but it allows the specification of a prior distribution over parameters and can be used when the database is incomplete. We prove that the test is correct, and we demonstrate empirically that, when used with a CB causal discovery algorithm with noninformative priors, it recovers structural features more reliably and it produces networks with smaller KL-Divergence, especially as the number of nodes increases or the number of records decreases. Another benefit is the dramatic reduction in the probability that a CB algorithm will stall during the search, providing a remedy for an annoying problem plaguing CB learning when the database is small. In this paper, we provide new complexity results for algorithms that learn discrete-variable Bayesian networks from data. Our results apply whenever the learning algorithm uses a scoring criterion that favors the simplest model able to represent the generative distribution exactly. Our results therefore hold whenever the learning algorithm uses a consistent scoring criterion and is applied to a sufficiently large dataset. We show that identifying high-scoring structures is hard, even when we are given an independence oracle, an inference oracle, and/or an information oracle. Our negative results also apply to the learning of discrete-variable Bayesian networks in which each node has at most k parents, for all k > 3. Pearls concept OF a d - connecting path IS one OF the foundations OF the modern theory OF graphical models : the absence OF a d - connecting path IN a DAG indicates that conditional independence will hold IN ANY distribution factorising according TO that graph. IN this paper we show that IN singly - connected Gaussian DAGs it IS possible TO USE the form OF a d - connection TO obtain qualitative information about the strength OF conditional dependence.More precisely, the squared partial correlations BETWEEN two given variables, conditioned ON different subsets may be partially ordered BY examining the relationship BETWEEN the d - connecting path AND the SET OF variables conditioned upon. Bayesian network classifiers are used in many fields, and one common class of classifiers are naive Bayes classifiers. In this paper, we introduce an approach for reasoning about Bayesian network classifiers in which we explicitly convert them into Ordered Decision Diagrams (ODDs), which are then used to reason about the properties of these classifiers. Specifically, we present an algorithm for converting any naive Bayes classifier into an ODD, and we show theoretically and experimentally that this algorithm can give us an ODD that is tractable in size even given an intractable number of instances. Since ODDs are tractable representations of classifiers, our algorithm allows us to efficiently test the equivalence of two naive Bayes classifiers and characterize discrepancies between them. We also show a number of additional results including a count of distinct classifiers that can be induced by changing some CPT in a naive Bayes classifier, and the range of allowable changes to a CPT which keeps the current classifier unchanged. In 1950, Forsythe and Leibler (1950) introduced a statistical technique for finding the inverse of a matrix by characterizing the elements of the matrix inverse as expected values of a sequence of random walks. Barto and Duff (1994) subsequently showed relations between this technique and standard dynamic programming and temporal differencing methods. The advantage of the Monte Carlo matrix inversion (MCMI) approach is that it scales better with respect to state-space size than alternative techniques. In this paper, we introduce an algorithm for performing reinforcement learning policy evaluation using MCMI. We demonstrate that MCMI improves on runtime over a maximum likelihood model-based policy evaluation approach and on both runtime and accuracy over the temporal differencing (TD) policy evaluation approach. We further improve on MCMI policy evaluation by adding an importance sampling technique to our algorithm to reduce the variance of our estimator. Lastly, we illustrate techniques for scaling up MCMI to large state spaces in order to perform policy improvement. In this paper, we introduce a method for approximating the solution to inference and optimization tasks in uncertain and deterministic reasoning. Such tasks are in general intractable for exact algorithms because of the large number of dependency relationships in their structure. Our method effectively maps such a dense problem to a sparser one which is in some sense "closest". Exact methods can be run on the sparser problem to derive bounds on the original answer, which can be quite sharp. We present empirical results demonstrating that our method works well on the tasks of belief inference and finding the probability of the most probable explanation in belief networks, and finding the cost of the solution that violates the smallest number of constraints in constraint satisfaction problems. On one large CPCS network, for example, we were able to calculate upper and lower bounds on the conditional probability of a variable, given evidence, that were almost identical in the average case. We analyze a new property of directed acyclic graphs (DAGs), called layerwidth, arising from a class of DAGs proposed by Eiter and Lukasiewicz. This class of DAGs permits certain problems of structural model-based causality and explanation to be tractably solved. In this paper, we first address an open question raised by Eiter and Lukasiewicz - the computational complexity of deciding whether a given graph has a bounded layerwidth. After proving that this problem is NP-complete, we proceed by proving numerous important properties of layerwidth that are helpful in efficiently computing the optimal layerwidth. Finally, we compare this new DAG property to two other important DAG properties: treewidth and bandwidth. Loopy and generalized belief propagation are popular algorithms for approximate inference in Markov random fields and Bayesian networks. Fixed points of these algorithms correspond to extrema of the Bethe and Kikuchi free energy. However, belief propagation does not always converge, which explains the need for approaches that explicitly minimize the Kikuchi/Bethe free energy, such as CCCP and UPS. Here we describe a class of algorithms that solves this typically nonconvex constrained minimization of the Kikuchi free energy through a sequence of convex constrained minimizations of upper bounds on the Kikuchi free energy. Intuitively one would expect tighter bounds to lead to faster algorithms, which is indeed convincingly demonstrated in our simulations. Several ideas are applied to obtain tight convex bounds that yield dramatic speed-ups over CCCP. Real-world distributed systems and networks are often unreliable and subject to random failures of its components. Such a stochastic behavior affects adversely the complexity of optimization tasks performed routinely upon such systems, in particular, various resource allocation tasks. In this work we investigate and develop Monte Carlo solutions for a class of two-stage optimization problems in stochastic networks in which the expected value of resource allocations before and after stochastic failures needs to be optimized. The limitation of these problems is that their exact solutions are exponential in the number of unreliable network components: thus, exact methods do not scale-up well to large networks often seen in practice. We first prove that Monte Carlo optimization methods can overcome the exponential bottleneck of exact methods. Next we support our theoretical findings on resource allocation experiments and show a very good scale-up potential of the new methods to large stochastic networks. This paper examines a number of solution methods for decision processes with non-Markovian rewards (NMRDPs). They all exploit a temporal logic specification of the reward function to automatically translate the NMRDP into an equivalent Markov decision process (MDP) amenable to well-known MDP solution methods. They differ however in the representation of the target MDP and the class of MDP solution methods to which they are suited. As a result, they adopt different temporal logics and different translations. Unfortunately, no implementation of these methods nor experimental let alone comparative results have ever been reported. This paper is the first step towards filling this gap. We describe an integrated system for solving NMRDPs which implements these methods and several variants under a common interface; we use it to compare the various approaches and identify the problem features favoring one over the other. This paper studies decision making for Walley's partially consonant belief functions (pcb). In a pcb, the set of foci are partitioned. Within each partition, the foci are nested. The pcb class includes probability functions and possibility functions as extreme cases. Unlike earlier proposals for a decision theory with belief functions, we employ an axiomatic approach. We adopt an axiom system similar in spirit to von Neumann - Morgenstern's linear utility theory for a preference relation on pcb lotteries. We prove a representation theorem for this relation. Utility for a pcb lottery is a combination of linear utility for probabilistic lottery and binary utility for possibilistic lottery. There has been great interest in identifying tractable subclasses of NP complete problems and designing efficient algorithms for these tractable classes. Constraint satisfaction and Bayesian network inference are two examples of such problems that are of great importance in AI and algorithms. In this paper we study, under the frameworks of random constraint satisfaction problems and random Bayesian networks, a typical tractable subclass characterized by the treewidth of the problems. We show that the property of having a bounded treewidth for CSPs and Bayesian network inference problem has a phase transition that occurs while the underlying structures of problems are still sparse. This implies that algorithms making use of treewidth based structural knowledge only work efficiently in a limited range of random instance. This paper presents a scalable Bayesian technique for decentralized state estimation from multiple platforms in dynamic environments. As has long been recognized, centralized architectures impose severe scaling limitations for distributed systems due to the enormous communication overheads. We propose a strictly decentralized approach in which only nearby platforms exchange information. They do so through an interactive communication protocol aimed at maximizing information flow. Our approach is evaluated in the context of a distributed surveillance scenario that arises in a robotic system for playing the game of laser tag. Our results, both from simulation and using physical robots, illustrate an unprecedented scaling capability to large teams of vehicles. MAP is the problem of finding a most probable instantiation of a set of variables in a Bayesian network given some evidence. Unlike computing posterior probabilities, or MPE (a special case of MAP), the time and space complexity of structural solutions for MAP are not only exponential in the network treewidth, but in a larger parameter known as the "constrained" treewidth. In practice, this means that computing MAP can be orders of magnitude more expensive than computing posterior probabilities or MPE. This paper introduces a new, simple upper bound on the probability of a MAP solution, which admits a tradeoff between the bound quality and the time needed to compute it. The bound is shown to be generally much tighter than those of other methods of comparable complexity. We use this proposed upper bound to develop a branch-and-bound search algorithm for solving MAP exactly. Experimental results demonstrate that the search algorithm is able to solve many problems that are far beyond the reach of any structure-based method for MAP. For example, we show that the proposed algorithm can compute MAP exactly and efficiently for some networks whose constrained treewidth is more than 40. Group elevator scheduling is an NP-hard sequential decision-making problem with unbounded state spaces and substantial uncertainty. Decision-theoretic reasoning plays a surprisingly limited role in fielded systems. A new opportunity for probabilistic methods has opened with the recent discovery of a tractable solution for the expected waiting times of all passengers in the building, marginalized over all possible passenger itineraries. Though commercially competitive, this solution does not contemplate future passengers. Yet in up-peak traffic, the effects of future passengers arriving at the lobby and entering elevator cars can dominate all waiting times. We develop a probabilistic model of how these arrivals affect the behavior of elevator cars at the lobby, and demonstrate how this model can be used to very significantly reduce the average waiting time of all passengers. This paper proposes and evaluates the k-greedy equivalence search algorithm (KES) for learning Bayesian networks (BNs) from complete data. The main characteristic of KES is that it allows a trade-off between greediness and randomness, thus exploring different good local optima. When greediness is set at maximum, KES corresponds to the greedy equivalence search algorithm (GES). When greediness is kept at minimum, we prove that under mild assumptions KES asymptotically returns any inclusion optimal BN with nonzero probability. Experimental results for both synthetic and real data are reported showing that KES often finds a better local optima than GES. Moreover, we use KES to experimentally confirm that the number of different local optima is often huge. For a given problem, the optimal Markov policy can be considerred as a conditional or contingent plan containing a (potentially large) number of branches. Unfortunately, there are applications where it is desirable to strictly limit the number of decision points and branches in a plan. For example, it may be that plans must later undergo more detailed simulation to verify correctness and safety, or that they must be simple enough to be understood and analyzed by humans. As a result, it may be necessary to limit consideration to plans with only a small number of branches. This raises the question of how one goes about finding optimal plans containing only a limited number of branches. In this paper, we present an any-time algorithm for optimal k-contingency planning (OKP). It is the first optimal algorithm for limited contingency planning that is not an explicit enumeration of possible contingent plans. By modelling the problem as a Partially Observable Markov Decision Process, it implements the Bellman optimality principle and prunes the solution space. We present experimental results of applying this algorithm to some simple test cases. The property of perfectness plays an important role in the theory of Bayesian networks. First, the existence of perfect distributions for arbitrary sets of variables and directed acyclic graphs implies that various methods for reading independence from the structure of the graph (e.g., Pearl, 1988; Lauritzen, Dawid, Larsen & Leimer, 1990) are complete. Second, the asymptotic reliability of various search methods is guaranteed under the assumption that the generating distribution is perfect (e.g., Spirtes, Glymour & Scheines, 2000; Chickering & Meek, 2002). We provide a lower-bound on the probability of sampling a non-perfect distribution when using a fixed number of bits to represent the parameters of the Bayesian network. This bound approaches zero exponentially fast as one increases the number of bits used to represent the parameters. This result implies that perfect distributions with fixed-length representations exist. We also provide a lower-bound on the number of bits needed to guarantee that a distribution sampled from a uniform Dirichlet distribution is perfect with probability greater than 1/2. This result is useful for constructing randomized reductions for hardness proofs. The paper continues the study of partitioning based inference of heuristics for search in the context of solving the Most Probable Explanation task in Bayesian Networks. We compare two systematic Branch and Bound search algorithms, BBBT (for which the heuristic information is constructed during search and allows dynamic variable/value ordering) and its predecessor BBMB (for which the heuristic information is pre-compiled), against a number of popular local search algorithms for the MPE problem. We show empirically that, when viewed as approximation schemes, BBBT/BBMB are superior to all of these best known SLS algorithms, especially when the domain sizes increase beyond 2. This is in contrast with the performance of SLS vs. systematic search on CSP/SAT problems, where SLS often significantly outperforms systematic algorithms. As far as we know, BBBT/BBMB are currently the best performing algorithms for solving the MPE task. Precision achieved by stochastic sampling algorithms for Bayesian networks typically deteriorates in face of extremely unlikely evidence. To address this problem, we propose the Evidence Pre-propagation Importance Sampling algorithm (EPIS-BN), an importance sampling algorithm that computes an approximate importance function by the heuristic methods: loopy belief Propagation and e-cutoff. We tested the performance of e-cutoff on three large real Bayesian networks: ANDES, CPCS, and PATHFINDER. We observed that on each of these networks the EPIS-BN algorithm gives us a considerable improvement over the current state of the art algorithm, the AIS-BN algorithm. In addition, it avoids the costly learning stage of the AIS-BN algorithm. Published experiments on spidering the Web suggest that, given training data in the form of a (relatively small) subgraph of the Web containing a subset of a selected class of target pages, it is possible to conduct a directed search and find additional target pages significantly faster (with fewer page retrievals) than by performing a blind or uninformed random or systematic search, e.g., breadth-first search. If true, this claim motivates a number of practical applications. Unfortunately, these experiments were carried out in specialized domains or under conditions that are difficult to replicate. We present and apply an experimental framework designed to reexamine and resolve the basic claims of the earlier work, so that the supporting experiments can be replicated and built upon. We provide high-performance tools for building experimental spiders, make use of the ground truth and static nature of the WT10g TREC Web corpus, and rely on simple well understand machine learning techniques to conduct our experiments. In this paper, we describe the basic framework, motivate the experimental design, and report on our findings supporting and qualifying the conclusions of the earlier research. We present an application of hierarchical Bayesian estimation to robot map building. The revisiting problem occurs when a robot has to decide whether it is seeing a previously-built portion of a map, or is exploring new territory. This is a difficult decision problem, requiring the probability of being outside of the current known map. To estimate this probability, we model the structure of a "typical" environment as a hidden Markov model that generates sequences of views observed by a robot navigating through the environment. A Dirichlet prior over structural models is learned from previously explored environments. Whenever a robot explores a new environment, the posterior over the model is estimated by Dirichlet hyperparameters. Our approach is implemented and tested in the context of multi-robot map merging, a particularly difficult instance of the revisiting problem. Experiments with robot data show that the technique yields strong improvements over alternative methods. In this paper we examine the problem of inference in Bayesian Networks with discrete random variables that have very large or even unbounded domains. For example, in a domain where we are trying to identify a person, we may have variables that have as domains, the set of all names, the set of all postal codes, or the set of all credit card numbers. We cannot just have big tables of the conditional probabilities, but need compact representations. We provide an inference algorithm, based on variable elimination, for belief networks containing both large domain and normal discrete random variables. We use intensional (i.e., in terms of procedures) and extensional (in terms of listing the elements) representations of conditional probabilities and of the intermediate factors. Ancestral graphs are a class of graphs that encode conditional independence relations arising in DAG models with latent and selection variables, corresponding to marginalization and conditioning. However, for any ancestral graph, there may be several other graphs to which it is Markov equivalent. We introduce a simple representation of a Markov equivalence class of ancestral graphs, thereby facilitating model search. \ More specifically, we define a join operation on ancestral graphs which will associate a unique graph with a Markov equivalence class. We also extend the separation criterion for ancestral graphs (which is an extension of d-separation) and provide a proof of the pairwise Markov property for joined ancestral graphs. The problem of learning Markov equivalence classes of Bayesian network structures may be solved by searching for the maximum of a scoring metric in a space of these classes. This paper deals with the definition and analysis of one such search space. We use a theoretically motivated neighbourhood, the inclusion boundary, and represent equivalence classes by essential graphs. We show that this search space is connected and that the score of the neighbours can be evaluated incrementally. We devise a practical way of building this neighbourhood for an essential graph that is purely graphical and does not explicitely refer to the underlying independences. We find that its size can be intractable, depending on the complexity of the essential graph of the equivalence class. The emphasis is put on the potential use of this space with greedy hill -climbing search The ability to make decisions and to assess potential courses of action is a corner-stone of many AI applications, and usually this requires explicit information about the decision-maker s preferences. IN many applications, preference elicitation IS a serious bottleneck.The USER either does NOT have the time, the knowledge, OR the expert support required TO specify complex multi - attribute utility functions. IN such cases, a method that IS based ON intuitive, yet expressive, preference statements IS required. IN this paper we suggest the USE OF TCP - nets, an enhancement OF CP - nets, AS a tool FOR representing, AND reasoning about qualitative preference statements.We present AND motivate this framework, define its semantics, AND show how it can be used TO perform constrained optimization. The paper presents an iterative version of join-tree clustering that applies the message passing of join-tree clustering algorithm to join-graphs rather than to join-trees, iteratively. It is inspired by the success of Pearl's belief propagation algorithm as an iterative approximation scheme on one hand, and by a recently introduced mini-clustering i. success as an anytime approximation method, on the other. The proposed Iterative Join-graph Propagation IJGP belongs to the class of generalized belief propagation methods, recently proposed using analogy with algorithms in statistical physics. Empirical evaluation of this approach on a number of problem classes demonstrates that even the most time-efficient variant is almost always superior to IBP and MC i, and is sometimes more accurate by as much as several orders of magnitude. Most reinforcement learning methods operate on propositional representations of the world state. Such representations are often intractably large and generalize poorly. Using a deictic representation is believed to be a viable alternative: they promise generalization while allowing the use of existing reinforcement-learning methods. Yet, there are few experiments on learning with deictic representations reported in the literature. In this paper we explore the effectiveness of two forms of deictic representation and a na\"{i}ve propositional representation in a simple blocks-world domain. We find, empirically, that the deictic representations actually worsen learning performance. We conclude with a discussion of possible causes of these results and strategies for more effective learning in domains with objects. We present a principled and efficient planning algorithm for collaborative multiagent dynamical systems. All computation, during both the planning and the execution phases, is distributed among the agents; each agent only needs to model and plan for a small part of the system. Each of these local subsystems is small, but once they are combined they can represent an exponentially larger problem. The subsystems are connected through a subsystem hierarchy. Coordination and communication between the agents is not imposed, but derived directly from the structure of this hierarchy. A globally consistent plan is achieved by a message passing algorithm, where messages correspond to natural local reward functions and are computed by local linear programs; another message passing algorithm allows us to execute the resulting policy. When two portions of the hierarchy share the same structure, our algorithm can reuse plans and messages to speed up computation. We extend the language of influence diagrams to cope with decision scenarios where the order of decisions and observations is not determined. As the ordering of decisions is dependent on the evidence, a step-strategy of such a scenario is a sequence of dependent choices of the next action. A strategy is a step-strategy together with selection functions for decision actions. The structure of a step-strategy can be represented as a DAG with nodes labeled with action variables. We introduce the concept of GS-DAG: a DAG incorporating an optimal step-strategy for any instantiation. We give a method for constructing GS-DAGs, and we show how to use a GS-DAG for determining an optimal strategy. Finally we discuss how analysis of relevant past can be used to reduce the size of the GS-DAG. We describe CFW, a computationally efficient algorithm for collaborative filtering that uses posteriors over weights of evidence. In experiments on real data, we show that this method predicts as well or better than other methods in situations where the size of the user query is small. The new approach works particularly well when the user s query CONTAINS low frequency(unpopular) items.The approach complements that OF dependency networks which perform well WHEN the size OF the query IS large.Also IN this paper, we argue that the USE OF posteriors OVER weights OF evidence IS a natural way TO recommend similar items collaborative - filtering task. We introduce a new Bayesian network (BN) scoring metric called the Global Uniform (GU) metric. This metric is based on a particular type of default parameter prior. Such priors may be useful when a BN developer is not willing or able to specify domain-specific parameter priors. The GU parameter prior specifies that every prior joint probability distribution P consistent with a BN structure S is considered to be equally likely. Distribution P is consistent with S if P includes just the set of independence relations defined by S. We show that the GU metric addresses some undesirable behavior of the BDeu and K2 Bayesian network scoring metrics, which also use particular forms of default parameter priors. A closed form formula for computing GU for special classes of BNs is derived. Efficiently computing GU for an arbitrary BN remains an open problem. This paper investigates value function approximation in the context of zero-sum Markov games, which can be viewed as a generalization of the Markov decision process (MDP) framework to the two-agent case. We generalize error bounds from MDPs to Markov games and describe generalizations of reinforcement learning algorithms to Markov games. We present a generalization of the optimal stopping problem to a two-player simultaneous move Markov game. For this special problem, we provide stronger bounds and can guarantee convergence for LSTD and temporal difference learning with linear value function approximation. We demonstrate the viability of value function approximation for Markov games by using the Least squares policy iteration (LSPI) algorithm to learn good policies for a soccer domain and a flow control problem. Value iteration is a commonly used and empirically competitive method in solving many Markov decision process problems. However, it is known that value iteration has only pseudo-polynomial complexity in general. We establish a somewhat surprising polynomial bound for value iteration on deterministic Markov decision (DMDP) problems. We show that the basic value iteration procedure converges to the highest average reward cycle on a DMDP problem in heta(n^2) iterations, or heta(mn^2) total time, where n denotes the number of states, and m the number of edges. We give two extensions of value iteration that solve the DMDP in heta(mn) time. We explore the analysis of policy iteration algorithms and report on an empirical study of value iteration showing that its convergence is much faster on random sparse graphs. This paper is about searching the combinatorial space of contingency tables during the inner loop of a nonlinear statistical optimization. Examples of this operation in various data analytic communities include searching for nonlinear combinations of attributes that contribute significantly to a regression (Statistics), searching for items to include in a decision list (machine learning) and association rule hunting (Data Mining). This paper investigates a new, efficient approach to this class of problems, called RADSEARCH (Real-valued All-Dimensions-tree Search). RADSEARCH finds the global optimum, and this gives us the opportunity to empirically evaluate the question: apart from algorithmic elegance what does this attention to optimality buy us? We compare RADSEARCH with other recent successful search algorithms such as CN2, PRIM, APriori, OPUS and DenseMiner. Finally, we introduce RADREG, a new regression algorithm for learning real-valued outputs based on RADSEARCHing for high-order interactions. In this paper we present a language for finite state continuous time Bayesian networks (CTBNs), which describe structured stochastic processes that evolve over continuous time. The state of the system is decomposed into a set of local variables whose values change over time. The dynamics of the system are described by specifying the behavior of each local variable as a function of its parents in a directed (possibly cyclic) graph. The model specifies, at any given point in time, the distribution over two aspects: when a local variable changes its value and the next value it takes. These distributions are determined by the variable s CURRENT value AND the CURRENT VALUES OF its parents IN the graph.More formally, each variable IS modelled AS a finite state continuous time Markov process whose transition intensities are functions OF its parents.We present a probabilistic semantics FOR the language IN terms OF the generative model a CTBN defines OVER sequences OF events.We list types OF queries one might ask OF a CTBN, discuss the conceptual AND computational difficulties associated WITH exact inference, AND provide an algorithm FOR approximate inference which takes advantage OF the structure within the process. We develop a model of how information flows into a market, and derive algorithms for automatically detecting and explaining relevant events. We analyze data from twenty-two "political stock markets" (i.e., betting markets on political outcomes) on the Iowa Electronic Market (IEM). We prove that, under certain efficiency assumptions, prices in such betting markets will on average approach the correct outcomes over time, and show that IEM data conforms closely to the theory. We present a simple model of a betting market where information is revealed over time, and show a qualitative correspondence between the model and real market data. We also present an algorithm for automatically detecting significant events and generating semantic explanations of their origin. The algorithm operates by discovering significant changes in vocabulary on online news sources (using expected entropy loss) that align with major price spikes in related betting markets. Quantification is well known to be a major obstacle in the construction of a probabilistic network, especially when relying on human experts for this purpose. The construction of a qualitative probabilistic network has been proposed as an initial step in a network s quantification, since the qualitative network can be used TO gain preliminary insight IN the projected networks reasoning behaviour. We extend on this idea and present a new type of network in which both signs and numbers are specified; we further present an associated algorithm for probabilistic inference. Building upon these semi-qualitative networks, a probabilistic network can be quantified and studied in a stepwise manner. As a result, modelling inadequacies can be detected and amended at an early stage in the quantification process. Typical Recommender systems adopt a static view of the recommendation process and treat it as a prediction problem. We argue that it is more appropriate to view the problem of generating recommendations as a sequential decision problem and, consequently, that Markov decision processes (MDP) provide a more appropriate model for Recommender systems. MDPs introduce two benefits: they take into account the long-term effects of each recommendation, and they take into account the expected value of each recommendation. To succeed in practice, an MDP-based Recommender system must employ a strong initial model; and the bulk of this paper is concerned with the generation of such a model. In particular, we suggest the use of an n-gram predictive model for generating the initial MDP. Our n-gram model induces a Markov-chain model of user behavior whose predictive accuracy is greater than that of existing predictive models. We describe our predictive model in detail and evaluate its performance on real data. In addition, we show how the model can be used in an MDP-based Recommender system. In many supervised learning tasks, the entities to be labeled are related to each other in complex ways and their labels are not independent. For example, in hypertext classification, the labels of linked pages are highly correlated. A standard approach is to classify each entity independently, ignoring the correlations between them. Recently, Probabilistic Relational Models, a relational version of Bayesian networks, were used to define a joint probabilistic model for a collection of related entities. In this paper, we present an alternative framework that builds on (conditional) Markov networks and addresses two limitations of the previous approach. First, undirected models do not impose the acyclicity constraint that hinders representation of many important relational dependencies in directed models. Second, undirected models are well suited for discriminative training, where we optimize the conditional likelihood of the labels given the features, which generally improves classification accuracy. We show how to train these models effectively, and how to use approximate probabilistic inference over the learned model for collective classification of multiple related entities. We provide experimental results on a webpage classification task, showing that accuracy can be significantly improved by modeling relational dependencies. A popular approach to solving a decision process with non-Markovian rewards (NMRDP) is to exploit a compact representation of the reward function to automatically translate the NMRDP into an equivalent Markov decision process (MDP) amenable to our favorite MDP solution method. The contribution of this paper is a representation of non-Markovian reward functions and a translation into MDP aimed at making the best possible use of state-based anytime algorithms as the solution method. By explicitly constructing and exploring only parts of the state space, these algorithms are able to trade computation time for policy quality, and have proven quite effective in dealing with large MDPs. Our representation extends future linear temporal logic (FLTL) to express rewards. Our translation has the effect of embedding model-checking in the solution method. It results in an MDP of the minimal size achievable without stepping outside the anytime framework, and consequently in better policies by the deadline. We propose an efficient method for Bayesian network inference in models with functional dependence. We generalize the multiplicative factorization method originally designed by Takikawa and D Ambrosio(1999) FOR models WITH independence OF causal influence.Using a hidden variable, we transform a probability potential INTO a product OF two - dimensional potentials.The multiplicative factorization yields more efficient inference. FOR example, IN junction tree propagation it helps TO avoid large cliques. IN ORDER TO keep potentials small, the number OF states OF the hidden variable should be minimized.We transform this problem INTO a combinatorial problem OF minimal base IN a particular space.We present an example OF a computerized adaptive test, IN which the factorization method IS significantly more efficient than previous inference methods. This paper uses decision-theoretic principles to obtain new insights into the assessment and updating of probabilities. First, a new foundation of Bayesianism is given. It does not require infinite atomless uncertainties as did Savage s classical result, AND can therefore be applied TO ANY finite Bayesian network.It neither requires linear utility AS did de Finetti s classical result, AND r ntherefore allows FOR the empirically AND normatively desirable risk r naversion.Finally, BY identifying AND fixing utility IN an elementary r nmanner, our result can readily be applied TO identify methods OF r nprobability updating.Thus, a decision - theoretic foundation IS given r nto the computationally efficient method OF inductive reasoning r ndeveloped BY Rudolf Carnap.Finally, recent empirical findings ON r nprobability assessments are discussed.It leads TO suggestions FOR r ncorrecting biases IN probability assessments, AND FOR an alternative r nto the Dempster - Shafer belief functions that avoids the reduction TO r ndegeneracy after multiple updatings.r n Iterative Proportional Fitting (IPF), combined with EM, is commonly used as an algorithm for likelihood maximization in undirected graphical models. In this paper, we present two iterative algorithms that generalize upon IPF. The first one is for likelihood maximization in discrete chain factor graphs, which we define as a wide class of discrete variable models including undirected graphical models and Bayesian networks, but also chain graphs and sigmoid belief networks. The second one is for conditional likelihood maximization in standard undirected models and Bayesian networks. In both algorithms, the iteration steps are expressed in closed form. Numerical simulations show that the algorithms are competitive with state of the art methods. We select policies for large Markov Decision Processes (MDPs) with compact first-order representations. We find policies that generalize well as the number of objects in the domain grows, potentially without bound. Existing dynamic-programming approaches based on flat, propositional, or first-order representations either are impractical here or do not naturally scale as the number of objects grows without bound. We implement and evaluate an alternative approach that induces first-order policies using training data constructed by solving small problem instances using PGraphplan (Blum & Langford, 1999). Our policies are represented as ensembles of decision lists, using a taxonomic concept language. This approach extends the work of Martin and Geffner (2000) to stochastic domains, ensemble learning, and a wider variety of problems. Empirically, we find "good" policies for several stochastic first-order MDPs that are beyond the scope of previous approaches. We also discuss the application of this work to the relational reinforcement-learning problem. Possibility theory offers either a qualitive, or a numerical framework for representing uncertainty, in terms of dual measures of possibility and necessity. This leads to the existence of two kinds of possibilistic causal graphs where the conditioning is either based on the minimum, or the product operator. Benferhat et al. (1999) have investigated the connections between min-based graphs and possibilistic logic bases (made of classical formulas weighted in terms of certainty). This paper deals with a more difficult issue : the product-based graphical representations of possibilistic bases, which provides an easy structural reading of possibilistic bases. Moreover, this paper also provides another reading of possibilistic bases in terms of comparative preferences of the form "in the context p, q is preferred to not q". This enables us to explicit preferences underlying a set of goals with different levels of priority. The currently most efficient algorithm for inference with a probabilistic network builds upon a triangulation of a network's graph. In this paper, we show that pre-processing can help in finding good triangulations forprobabilistic networks, that is, triangulations with a minimal maximum clique size. We provide a set of rules for stepwise reducing a graph, without losing optimality. This reduction allows us to solve the triangulation problem on a smaller graph. From the smaller graph's triangulation, a triangulation of the original graph is obtained by reversing the reduction steps. Our experimental results show that the graphs of some well-known real-life probabilistic networks can be triangulated optimally just by preprocessing; for other networks, huge reductions in their graph's size are obtained. We present two sampling algorithms for probabilistic confidence inference in Bayesian networks. These two algorithms (we call them AIS-BN-mu and AIS-BN-sigma algorithms) guarantee that estimates of posterior probabilities are with a given probability within a desired precision bound. Our algorithms are based on recent advances in sampling algorithms for (1) estimating the mean of bounded random variables and (2) adaptive importance sampling in Bayesian networks. In addition to a simple stopping rule for sampling that they provide, the AIS-BN-mu and AIS-BN-sigma algorithms are capable of guiding the learning process in the AIS-BN algorithm. An empirical evaluation of the proposed algorithms shows excellent performance, even for very unlikely evidence. In a causal graphical model, an instrument for a variable X and its effect Y is a random variable that is a cause of X and independent of all the causes of Y except X. (Pearl (1995), Spirtes et al (2000)). Instrumental variables can be used to estimate how the distribution of an effect will respond to a manipulation of its causes, even in the presence of unmeasured common causes (confounders). In typical instrumental variable estimation, instruments are chosen based on domain knowledge. There is currently no statistical test for validating a variable as an instrument. In this paper, we introduce the concept of semi-instrument, which generalizes the concept of instrument. We show that in the framework of additive models, under certain conditions, we can test whether a variable is semi-instrumental. Moreover, adding some distribution assumptions, we can test whether two semi-instruments are instrumental. We give algorithms to estimate the p-value that a random variable is semi-instrumental, and the p-value that two semi-instruments are both instrumental. These algorithms can be used to test the experts' choice of instruments, or to identify instruments automatically. On roads showing significant violations of posted speed limits, one measure of the safety effect of speeding is the difference between the road's actual accident count and the count that would have occurred if the posted speed limit had been strictly obeyed. An estimate of this accident reduction can be had by computing the probability that speeding was a necessary condition for each of set of accidents. This is an instance of assessing individual probabilities of causation, which is generally not possible absent prior knowledge of causal structure. For traffic accidents such prior knowledge is often available and this paper illustrates how, for a commonly occurring class of vehicle/pedestrian accidents, approaches to uncertainty and causal analyses appearing in the accident reconstruction literature can be unified using Bayesian networks. Measured skidmarks, pedestrian throw distances, and pedestrian injury severity are treated as evidence, and using the Gibbs Sampling routine BUGS, the posterior probability distribution over exogenous variables, such as the vehicle's initial speed, location, and driver reaction time, is computed. This posterior distribution is then used to compute the "probability of necessity" for speeding. In this paper, we present an efficient way of performing stepwise selection in the class of decomposable models. The main contribution of the paper is a simple characterization of the edges that canbe added to a decomposable model while keeping the resulting model decomposable and an efficient algorithm for enumerating all such edges for a given model in essentially O(1) time per edge. We also discuss how backward selection can be performed efficiently using our data structures.We also analyze the complexity of the complete stepwise selection procedure, including the complexity of choosing which of the eligible dges to add to (or delete from) the current model, with the aim ofminimizing the Kullback-Leibler distance of the resulting model from the saturated model for the data. Global variational approximation methods in graphical models allow efficient approximate inference of complex posterior distributions by using a simpler model. The choice of the approximating model determines a tradeoff between the complexity of the approximation procedure and the quality of the approximation. In this paper, we consider variational approximations based on two classes of models that are richer than standard Bayesian networks, Markov networks or mixture models. As such, these classes allow to find better tradeoffs in the spectrum of approximations. The first class of models are chain graphs, which capture distributions that are partially directed. The second class of models are directed graphs (Bayesian networks) with additional latent variables. Both classes allow representation of multi-variable dependencies that cannot be easily represented within a Bayesian network. A serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. Detecting hidden variables poses two problems: determining the relations to other variables in the model and determining the number of states of the hidden variable. In this paper, we address the latter problem in the context of Bayesian networks. We describe an approach that utilizes a score-based agglomerative state-clustering. As we show, this approach allows us to efficiently evaluate models with a range of cardinalities for the hidden variable. We show how to extend this procedure to deal with multiple interacting hidden variables. We demonstrate the effectiveness of this approach by evaluating it on synthetic and real-life data. We show that our approach learns models with hidden variables that generalize better and have better structure than previous approaches. The Information bottleneck method is an unsupervised non-parametric data organization technique. Given a joint distribution P(A,B), this method constructs a new variable T that extracts partitions, or clusters, over the values of A that are informative about B. The information bottleneck has already been applied to document classification, gene expression, neural code, and spectral analysis. In this paper, we introduce a general principled framework for multivariate extensions of the information bottleneck method. This allows us to consider multiple systems of data partitions that are inter-related. Our approach utilizes Bayesian networks for specifying the systems of clusters and what information each captures. We show that this construction provides insight about bottleneck variations and enables us to characterize solutions of these variations. We also present a general framework for iterative algorithms for constructing solutions, and apply it to several examples. In this paper we analyze two recent axiomatic approaches proposed by Dubois et al and by Giang and Shenoy to qualitative decision making where uncertainty is described by possibility theory. Both axiomtizations are inspired by von Neumann and Morgenstern's system of axioms for the case of probability theory. We show that our approach naturally unifies two axiomatic systems that correspond respectively to pessimistic and optimistic decision criteria proposed by Dubois et al. The simplifying unification is achieved by (i) replacing axioms that are supposed to reflect two informational attitudes (uncertainty aversion and uncertainty attraction) by an axiom that imposes order on set of standard lotteries and (ii) using a binary utility scale in which each utility level is represented by a pair of numbers. Planning problems are hard, motion planning, for example, isPSPACE-hard. Such problems are even more difficult in the presence of uncertainty. Although, Markov Decision Processes (MDPs) provide a formal framework for such problems, finding solutions to high dimensional continuous MDPs is usually difficult, especially when the actions and time measurements are continuous. Fortunately, problem-specific knowledge allows us to design controllers that are good locally, though having no global guarantees. We propose a method of nonparametrically combining local controllers to obtain globally good solutions. We apply this formulation to two types of problems : motion planning (stochastic shortest path) and discounted MDPs. For motion planning, we argue that usual MDP optimality criterion (expected cost) may not be practically relevant. Wepropose an alternative: finding the minimum cost path,subject to the constraint that the robot must reach the goal withhigh probability. For this problem, we prove that a polynomial number of samples is sufficient to obtain a high probability path. For discounted MDPs, we propose a formulation that explicitly deals with model uncertainty, i.e., the problem introduced when transition probabilities are not known exactly. We formulate the problem as a robust linear program which directly incorporates this type of uncertainty. In this work we focus on efficient heuristics for solving a class of stochastic planning problems that arise in a variety of business, investment, and industrial applications. The problem is best described in terms of future buy and sell contracts. By buying less reliable, but less expensive, buy (supply) contracts, a company or a trader can cover a position of more reliable and more expensive sell contracts. The goal is to maximize the expected net gain (profit) by constructing a dose to optimum portfolio out of the available buy and sell contracts. This stochastic planning problem can be formulated as a two-stage stochastic linear programming problem with recourse. However, this formalization leads to solutions that are exponential in the number of possible failure combinations. Thus, this approach is not feasible for large scale problems. In this work we investigate heuristic approximation techniques alleviating the efficiency problem. We primarily focus on the clustering approach and devise heuristics for finding clusterings leading to good approximations. We illustrate the quality and feasibility of the approach through experimental data. We are developing a general framework for using learned Bayesian models for decision-theoretic control of search and reasoningalgorithms. We illustrate the approach on the specific task of controlling both general and domain-specific solvers on a hard class of structured constraint satisfaction problems. A successful strategyfor reducing the high (and even infinite) variance in running time typically exhibited by backtracking search algorithms is to cut off and restart the search if a solution is not found within a certainamount of time. Previous work on restart strategies have employed fixed cut off values. We show how to create a dynamic cut off strategy by learning a Bayesian model that predicts the ultimate length of a trial based on observing the early behavior of the search algorithm. Furthermore, we describe the general conditions under which a dynamic restart strategy can outperform the theoretically optimal fixed strategy. Every directed acyclic graph (DAG) over a finite non-empty set of variables (= nodes) N induces an independence model over N, which is a list of conditional independence statements over N.The inclusion problem is how to characterize (in graphical terms) whether all independence statements in the model induced by a DAG K are in the model induced by a second DAG L. Meek (1997) conjectured that this inclusion holds iff there exists a sequence of DAGs from L to K such that only certain 'legal' arrow reversal and 'legal' arrow adding operations are performed to get the next DAG in the sequence.In this paper we give several characterizations of inclusion of DAG models and verify Meek's conjecture in the case that the DAGs K and L differ in at most one adjacency. As a warming up a rigorous proof of well-known graphical characterizations of equivalence of DAGs, which is a highly related problem, is given. The search space of Bayesian Network structures is usually defined as Acyclic Directed Graphs (DAGs) and the search is done by local transformations of DAGs. But the space of Bayesian Networks is ordered by DAG Markov model inclusion and it is natural to consider that a good search policy should take this into account. First attempt to do this (Chickering 1996) was using equivalence classes of DAGs instead of DAGs itself. This approach produces better results but it is significantly slower. We present a compromise between these two approaches. It uses DAGs to search the space in such a way that the ordering by inclusion is taken into account. This is achieved by repetitive usage of local moves within the equivalence class of DAGs. We show that this new approach produces better results than the original DAGs approach without substantial change in time complexity. We present empirical results, within the framework of heuristic search and Markov Chain Monte Carlo, provided through the Alarm dataset. An important subclass of hybrid Bayesian networks are those that represent Conditional Linear Gaussian (CLG) distributions --- a distribution with a multivariate Gaussian component for each instantiation of the discrete variables. In this paper we explore the problem of inference in CLGs. We show that inference in CLGs can be significantly harder than inference in Bayes Nets. In particular, we prove that even if the CLG is restricted to an extremely simple structure of a polytree in which every continuous node has at most one discrete ancestor, the inference task is NP-hard.To deal with the often prohibitive computational cost of the exact inference algorithm for CLGs, we explore several approximate inference algorithms. These algorithms try to find a small subset of Gaussians which are a good approximation to the full mixture distribution. We consider two Monte Carlo approaches and a novel approach that enumerates mixture components in order of prior probability. We compare these methods on a variety of problems and show that our novel algorithm is very promising for large, hybrid diagnosis problems. In this paper we present a method ofcomputing the posterior probability ofconditional independence of two or morecontinuous variables from data,examined at several resolutions. Ourapproach is motivated by theobservation that the appearance ofcontinuous data varies widely atvarious resolutions, producing verydifferent independence estimatesbetween the variablesinvolved. Therefore, it is difficultto ascertain independence withoutexamining data at several carefullyselected resolutions. In our paper, weaccomplish this using the exactcomputation of the posteriorprobability of independence, calculatedanalytically given a resolution. Ateach examined resolution, we assume amultinomial distribution with Dirichletpriors for the discretized tableparameters, and compute the posteriorusing Bayesian integration. Acrossresolutions, we use a search procedureto approximate the Bayesian integral ofprobability over an exponential numberof possible histograms. Our methodgeneralizes to an arbitrary numbervariables in a straightforward manner.The test is suitable for Bayesiannetwork learning algorithms that useindependence tests to infer the networkstructure, in domains that contain anymix of continuous, ordinal andcategorical variables. We consider the task of aggregating beliefs of severalexperts. We assume that these beliefs are represented as probabilitydistributions. We argue that the evaluation of any aggregationtechnique depends on the semantic context of this task. We propose aframework, in which we assume that nature generates samples from a`true' distribution and different experts form their beliefs based onthe subsets of the data they have a chance to observe. Naturally, theideal aggregate distribution would be the one learned from thecombined sample sets. Such a formulation leads to a natural way tomeasure the accuracy of the aggregation mechanism.We show that the well-known aggregation operator LinOP is ideallysuited for that task. We propose a LinOP-based learning algorithm,inspired by the techniques developed for Bayesian learning, whichaggregates the experts' distributions represented as Bayesiannetworks. Our preliminary experiments show that this algorithmperforms well in practice. This paper presents a new deterministic approximation technique in Bayesian networks. This method, "Expectation Propagation", unifies two previous techniques: assumed-density filtering, an extension of the Kalman filter, and loopy belief propagation, an extension of belief propagation in Bayesian networks. All three algorithms try to recover an approximate distribution which is close in KL divergence to the true distribution. Loopy belief propagation, because it propagates exact belief states, is useful for a limited class of belief networks, such as those which are purely discrete. Expectation Propagation approximates the belief states by only retaining certain expectations, such as mean and variance, and iterates until these expectations are consistent throughout the network. This makes it applicable to hybrid networks with discrete and continuous nodes. Expectation Propagation also extends belief propagation in the opposite direction - it can propagate richer belief states that incorporate correlations between nodes. Experiments with Gaussian mixture models show Expectation Propagation to be convincingly better than methods with similar computational cost: Laplace's method, variational Bayes, and Monte Carlo. Expectation Propagation also provides an efficient algorithm for training Bayes point machine classifiers. The Factored Frontier (FF) algorithm is a simple approximate inferencealgorithm for Dynamic Bayesian Networks (DBNs). It is very similar tothe fully factorized version of the Boyen-Koller (BK) algorithm, butinstead of doing an exact update at every step followed bymarginalisation (projection), it always works with factoreddistributions. Hence it can be applied to models for which the exactupdate step is intractable. We show that FF is equivalent to (oneiteration of) loopy belief propagation (LBP) on the original DBN, andthat BK is equivalent (to one iteration of) LBP on a DBN where wecluster some of the nodes. We then show empirically that byiterating, LBP can improve on the accuracy of both FF and BK. Wecompare these algorithms on two real-world DBNs: the first is a modelof a water treatment plant, and the second is a coupled HMM, used tomodel freeway traffic. A standard approach to approximate inference in state-space models isto apply a particle filter, e.g., the Condensation Algorithm.However, the performance of particle filters often varies significantlydue to their stochastic nature.We present a class of algorithms, called lattice particle filters, thatcircumvent this difficulty by placing the particles deterministicallyaccording to a Quasi-Monte Carlo integration rule.We describe a practical realization of this idea, discuss itstheoretical properties, and its efficiency.Experimental results with a synthetic 2D tracking problem show that thelattice particle filter is equivalent to a conventional particle filterthat has between 10 and 60% more particles, depending ontheir "sparsity" in the state-space.We also present results on inferring 3D human motion frommoving light displays. MAP is the problem of finding a most probable instantiation of a set of variables in a Bayesian network, given evidence. Unlike computing marginals, posteriors, and MPE (a special case of MAP), the time and space complexity of MAP is not only exponential in the network treewidth, but also in a larger parameter known as the "constrained" treewidth. In practice, this means that computing MAP can be orders of magnitude more expensive than computingposteriors or MPE. Thus, practitioners generally avoid MAP computations, resorting instead to approximating them by the most likely value for each MAP variableseparately, or by MPE.We present a method for approximating MAP using local search. This method has space complexity which is exponential onlyin the treewidth, as is the complexity of each search step. We investigate the effectiveness of different local searchmethods and several initialization strategies and compare them to otherapproximation schemes.Experimental results show that local search provides a much more accurate approximation of MAP, while requiring few search steps.Practically, this means that the complexity of local search is often exponential only in treewidth as opposed to the constrained treewidth, making approximating MAP as efficient as other computations. Suppose we are given the conditional probability of one variable given some other variables.Normally the full joint distribution over the conditioning variablesis required to determine the probability of the conditioned variable.Under what circumstances are the marginal distributions over the conditioning variables sufficient to determine the probability ofthe conditioned variable?Sufficiency in this sense is equivalent to additive separability ofthe conditional probability distribution.Such separability structure is natural and can be exploited forefficient inference.Separability has a natural generalization to conditional separability.Separability provides a precise notion of weaklyinteracting subsystems in temporal probabilistic models.Given a system that is decomposed into separable subsystems, exactmarginal probabilities over subsystems at future points in time can becomputed by propagating marginal subsystem probabilities, rather thancomplete system joint probabilities.Thus, separability can make exact prediction tractable.However, observations can break separability,so exact monitoring of dynamic systems remains hard. There is increasing interest within the research community in the design and use of recursive probability models. Although there still remains concern about computational complexity costs and the fact that computing exact solutions can be intractable for many nonrecursive models and impossible in the general case for recursive problems, several research groups are actively developing computational techniques for recursive stochastic languages. We have developed an extension to the traditional lambda-calculus as a framework for families of Turing complete stochastic languages. We have also developed a class of exact inference algorithms based on the traditional reductions of the lambda-calculus. We further propose that using the deBruijn notation (a lambda-calculus notation with nameless dummies) supports effective caching in such systems (caching being an essential component of efficient computation). Finally, our extension to the lambda-calculus offers a foundation and general theory for the construction of recursive stochastic modeling languages as well as promise for effective caching and efficient approximation algorithms for inference. We consider the problem of approximate belief-state monitoring using particle filtering for the purposes of implementing a policy for a partially-observable Markov decision process (POMDP). While particle filtering has become a widely-used tool in AI for monitoring dynamical systems, rather scant attention has been paid to their use in the context of decision making. Assuming the existence of a value function, we derive error bounds on decision quality associated with filtering using importance sampling. We also describe an adaptive procedure that can be used to dynamically determine the number of samples required to meet specific error bounds. Empirical evidence is offered supporting this technique as a profitable means of directing sampling effort where it is needed to distinguish policies. We investigate a model for planning under uncertainty with temporallyextended actions, where multiple actions can be taken concurrently at each decision epoch. Our model is based on the options framework, and combines it with factored state space models,where the set of options can be partitioned into classes that affectdisjoint state variables. We show that the set of decisionepochs for concurrent options defines a semi-Markov decisionprocess, if the underlying temporally extended actions being parallelized arerestricted to Markov options. This property allows us to use SMDPalgorithms for computing the value function over concurrentoptions. The concurrent options model allows overlapping execution ofoptions in order to achieve higher performance or in order to performa complex task. We describe a simple experiment using a navigationtask which illustrates how concurrent options results in a faster planwhen compared to the case when only one option is taken at a time. We present a new method for estimating the expected return of a POMDP from experience. The method does not assume any knowledge of the POMDP and allows the experience to be gathered from an arbitrary sequence of policies. The return is estimated for any new policy of the POMDP. We motivate the estimator from function-approximation and importance sampling points-of-view and derive its theoretical properties. Although the estimator is biased, it has low variance and the bias is often irrelevant when the estimator is used for pair-wise comparisons. We conclude by extending the estimator to policies with memory and compare its performance in a greedy search algorithm to REINFORCE algorithms showing an order of magnitude reduction in the number of trials required. Chow and Liu (1968) studied the problem of learning a maximumlikelihood Markov tree. We generalize their work to more complexMarkov networks by considering the problem of learning a maximumlikelihood Markov network of bounded complexity. We discuss howtree-width is in many ways the appropriate measure of complexity andthus analyze the problem of learning a maximum likelihood Markovnetwork of bounded tree-width.Similar to the work of Chow and Liu, we are able to formalize thelearning problem as a combinatorial optimization problem on graphs. Weshow that learning a maximum likelihood Markov network of boundedtree-width is equivalent to finding a maximum weight hypertree. Thisequivalence gives rise to global, integer-programming based,approximation algorithms with provable performance guarantees, for thelearning problem. This contrasts with heuristic local-searchalgorithms which were previously suggested (e.g. by Malvestuto 1991).The equivalence also allows us to study the computational hardness ofthe learning problem. We show that learning a maximum likelihoodMarkov network of bounded tree-width is NP-hard, and discuss thehardness of approximation. A Bayesian Belief Network (BN) is a model of a joint distribution over a setof n variables, with a DAG structure to represent the immediate dependenciesbetween the variables, and a set of parameters (aka CPTables) to represent thelocal conditional probabilities of a node, given each assignment to itsparents. In many situations, these parameters are themselves random variables - this may reflect the uncertainty of the domain expert, or may come from atraining sample used to estimate the parameter values. The distribution overthese "CPtable variables" induces a distribution over the response the BNwill return to any "What is Pr(H | E)?" query. This paper investigates thevariance of this response, showing first that it is asymptotically normal,then providing its mean and asymptotical variance. We then present aneffective general algorithm for computing this variance, which has the samecomplexity as simply computing the (mean value of) the response itself - ie,O(n 2^w), where n is the number of variables and w is the effective treewidth. Finally, we provide empirical evidence that this algorithm, whichincorporates assumptions and approximations, works effectively in practice,given only small samples. With the advance of efficient analytical methods for sensitivity analysis ofprobabilistic networks, the interest in the sensitivities revealed by real-life networks is rekindled. As the amount of data resulting from a sensitivity analysis of even a moderately-sized network is alreadyoverwhelming, methods for extracting relevant information are called for. One such methodis to study the derivative of the sensitivity functions yielded for a network's parameters. We further propose to build upon the concept of admissible deviation, that is, the extent to which a parameter can deviate from the true value without inducing a change in the most likely outcome. We illustrate these concepts by means of a sensitivity analysis of a real-life probabilistic network in oncology. There exist a number of reinforcement learning algorithms which learnby climbing the gradient of expected reward. Their long-runconvergence has been proved, even in partially observableenvironments with non-deterministic actions, and without the need fora system model. However, the variance of the gradient estimator hasbeen found to be a significant practical problem. Recent approacheshave discounted future rewards, introducing a bias-variance trade-offinto the gradient estimate. We incorporate a reward baseline into thelearning system, and show that it affects variance without introducingfurther bias. In particular, as we approach the zero-bias,high-variance parameterization, the optimal (or variance minimizing)constant reward baseline is equal to the long-term average expectedreward. Modified policy-gradient algorithms are presented, and anumber of experiments demonstrate their improvement over previous work. We present a novel inference algorithm for arbitrary, binary, undirected graphs. Unlike loopy belief propagation, which iterates fixed point equations, we directly descend on the Bethe free energy. The algorithm consists of two phases, first we update the pairwise probabilities, given the marginal probabilities at each unit,using an analytic expression. Next, we update the marginal probabilities, given the pairwise probabilities by following the negative gradient of the Bethe free energy. Both steps are guaranteed to decrease the Bethe free energy, and since it is lower bounded, the algorithm is guaranteed to converge to a local minimum. We also show that the Bethe free energy is equal to the TAP free energy up to second order in the weights. In experiments we confirm that when belief propagation converges it usually finds identical solutions as our belief optimization method. However, in cases where belief propagation fails to converge, belief optimization continues to converge to reasonable beliefs. The stable nature of belief optimization makes it ideally suited for learning graphical models from data. Automatic continuous speech recognition (CSR) is sufficiently mature that a variety of real world applications are now possible including large vocabulary transcription and interactive spoken dialogues. This paper reviews the evolution of the statistical modelling techniques which underlie current-day systems, specifically hidden Markov models (HMMs) and N-grams. Starting from a description of the speech signal and its parameterisation, the various modelling assumptions and their consequences are discussed. It then describes various techniques by which the effects of these assumptions can be mitigated. Despite the progress that has been made, the limitations of current modelling techniques are still evident. The paper therefore concludes with a brief review of some of the more fundamental modelling work now in progress. Uncertainty plays a central role in spoken dialogue systems. Some stochastic models like Markov decision process (MDP) are used to model the dialogue manager. But the partially observable system state and user intention hinder the natural representation of the dialogue state. MDP-based system degrades fast when uncertainty about a user's intention increases. We propose a novel dialogue model based on the partially observable Markov decision process (POMDP). We use hidden system states and user intentions as the state set, parser results and low-level information as the observation set, domain actions and dialogue repair actions as the action set. Here the low-level information is extracted from different input modals, including speech, keyboard, mouse, etc., using Bayesian networks. Because of the limitation of the exact algorithms, we focus on heuristic approximation algorithms and their applicability in POMDP for dialogue management. We also propose two methods for grid point selection in grid-based approximation algorithms. In this paper we present a propositional logic programming language for reasoning under possibilistic uncertainty and representing vague knowledge. Formulas are represented by pairs (A, c), where A is a many-valued proposition and c is value in the unit interval [0,1] which denotes a lower bound on the belief on A in terms of necessity measures. Belief states are modeled by possibility distributions on the set of all many-valued interpretations. In this framework, (i) we define a syntax and a semantics of the general underlying uncertainty logic; (ii) we provide a modus ponens-style calculus for a sublanguage of Horn-rules and we prove that it is complete for determining the maximum degree of possibilistic belief with which a fuzzy propositional variable can be entailed from a set of formulas; and finally, (iii) we show how the computation of a partial matching between fuzzy propositional variables, in terms of necessity measures for fuzzy sets, can be included in our logic programming system. Planning for distributed agents with partial state information is considered from a decision- theoretic perspective. We describe generalizations of both the MDP and POMDP models that allow for decentralized control. For even a small number of agents, the finite-horizon problems corresponding to both of our models are complete for nondeterministic exponential time. These complexity results illustrate a fundamental difference between centralized and decentralized control of Markov processes. In contrast to the MDP and POMDP problems, the problems we consider provably do not admit polynomial-time algorithms and most likely require doubly exponential time to solve in the worst case. We have thus provided mathematical evidence corresponding to the intuition that decentralized planning problems cannot easily be reduced to centralized problems and solved exactly using established techniques. In this work, dynamic Bayesian multinets are introduced where a Markov chain state at time t determines conditional independence patterns between random variables lying within a local time window surrounding t. It is shown how information-theoretic criterion functions can be used to induce sparse, discriminative, and class-conditional network structures that yield an optimal approximation to the class posterior probability, and therefore are useful for the classification task. Using a new structure learning heuristic, the resulting models are tested on a medium-vocabulary isolated-word speech recognition task. It is demonstrated that these discriminatively structured dynamic Bayesian multinets, when trained in a maximum likelihood setting using EM, can outperform both HMMs and other dynamic Bayesian networks with a similar number of parameters. Decision theory does not traditionally include uncertainty over utility functions. We argue that the a person's utility value for a given outcome can be treated as we treat other domain attributes: as a random variable with a density function over its possible values. We show that we can apply statistical density estimation techniques to learn such a density function from a database of partially elicited utility functions. In particular, we define a Bayesian learning framework for this problem, assuming the distribution over utilities is a mixture of Gaussians, where the mixture components represent statistically coherent subpopulations. We can also extend our techniques to the problem of discovering generalized additivity structure in the utility functions in the population. We define a Bayesian model selection criterion for utility function structure and a search procedure over structures. The factorization of the utilities in the learned model, and the generalization obtained from density estimation, allows us to provide robust estimates of utilities using a significantly smaller number of utility elicitation questions. We experiment with our technique on synthetic utility data and on a real database of utility functions in the domain of prenatal diagnosis. Monte Carlo sampling has become a major vehicle for approximate inference in Bayesian networks. In this paper, we investigate a family of related simulation approaches, known collectively as quasi-Monte Carlo methods based on deterministic low-discrepancy sequences. We first outline several theoretical aspects of deterministic low-discrepancy sequences, show three examples of such sequences, and then discuss practical issues related to applying them to belief updating in Bayesian networks. We propose an algorithm for selecting direction numbers for Sobol sequence. Our experimental results show that low-discrepancy sequences (especially Sobol sequence) significantly improve the performance of simulation algorithms in Bayesian networks compared to Monte Carlo sampling. A simple advertising strategy that can be used to help increase sales of a product is to mail out special offers to selected potential customers. Because there is a cost associated with sending each offer, the optimal mailing strategy depends on both the benefit obtained from a purchase and how the offer affects the buying behavior of the customers. In this paper, we describe two methods for partitioning the potential customers into groups, and show how to perform a simple cost-benefit analysis to decide which, if any, of the groups should be targeted. In particular, we consider two decision-tree learning algorithms. The first is an "off the shelf" algorithm used to model the probability that groups of customers will buy the product. The second is a new algorithm that is similar to the first, except that for each group, it explicitly models the probability of purchase under the two mailing scenarios: (1) the mail is sent to members of that group and (2) the mail is not sent to members of that group. Using data from a real-world advertising experiment, we compare the algorithms to each other and to a naive mail-to-all strategy. This paper describes a Bayesian method for learning causal networks using samples that were selected in a non-random manner from a population of interest. Examples of data obtained by non-random sampling include convenience samples and case-control data in which a fixed number of samples with and without some condition is collected; such data are not uncommon. The paper describes a method for combining data under selection with prior beliefs in order to derive a posterior probability for a model of the causal processes that are generating the data in the population of interest. The priors include beliefs about the nature of the non-random sampling procedure. Although exact application of the method would be computationally intractable for most realistic datasets, efficient special-case and approximation methods are discussed. Finally, the paper describes how to combine learning under selection with previous methods for learning from observational and experimental data that are obtained on random samples of the population of interest. The net result is a Bayesian methodology that supports causal modeling and discovery from a rich mixture of different types of data. This paper analyzes independence concepts for sets of probability measures associated with directed acyclic graphs. The paper shows that epistemic independence and the standard Markov condition violate desirable separation properties. The adoption of a contraction condition leads to d-separation but still fails to guarantee a belief separation property. To overcome this unsatisfactory situation, a strong Markov condition is proposed, based on epistemic independence. The main result is that the strong Markov condition leads to strong independence and does enforce separation properties; this result implies that (1) separation properties of Bayesian networks do extend to epistemic independence and sets of probability measures, and (2) strong independence has a clear justification based on epistemic independence and the strong Markov condition. We present a new approach for inference in Bayesian networks, which is mainly based on partial differentiation. According to this approach, one compiles a Bayesian network into a multivariate polynomial and then computes the partial derivatives of this polynomial with respect to each variable. We show that once such derivatives are made available, one can compute in constant-time answers to a large class of probabilistic queries, which are central to classical inference, parameter estimation, model validation and sensitivity analysis. We present a number of complexity results relating to the compilation of such polynomials and to the computation of their partial derivatives. We argue that the combined simplicity, comprehensiveness and computational complexity of the presented framework is unique among existing frameworks for inference in Bayesian networks. We have recently introduced an any-space algorithm for exact inference in Bayesian networks, called Recursive Conditioning, RC, which allows one to trade space with time at increments of X-bytes, where X is the number of bytes needed to cache a floating point number. In this paper, we present three key extensions of RC. First, we modify the algorithm so it applies to more general factorization of probability distributions, including (but not limited to) Bayesian network factorizations. Second, we present a forgetting mechanism which reduces the space requirements of RC considerably and then compare such requirmenets with those of variable elimination on a number of realistic networks, showing orders of magnitude improvements in certain cases. Third, we present a version of RC for computing maximum a posteriori hypotheses (MAP), which turns out to be the first MAP algorithm allowing a smooth time-space tradeoff. A key advantage of presented MAP algorithm is that it does not have to start from scratch each time a new query is presented, but can reuse some of its computations across multiple queries, leading to significant savings in ceratain cases. Recently developed techniques have made it possible to quickly learn accurate probability density functions from data in low-dimensional continuous space. In particular, mixtures of Gaussians can be fitted to data very quickly using an accelerated EM algorithm that employs multiresolution kd-trees (Moore, 1999). In this paper, we propose a kind of Bayesian networks in which low-dimensional mixtures of Gaussians over different subsets of the domain's variables are combined into a coherent joint probability model over the entire domain. The network is also capable of modeling complex dependencies between discrete variables and continuous variables without requiring discretization of the continuous variables. We present efficient heuristic algorithms for automatically learning these networks from data, and perform comparative experiments illustrated how well these networks model real scientific data and synthetic data. We also briefly discuss some possible improvements to the networks, as well as possible applications. Particle filters (PFs) are powerful sampling-based inference/learning algorithms for dynamic Bayesian networks (DBNs). They allow us to treat, in a principled way, any type of probability distribution, nonlinearity and non-stationarity. They have appeared in several fields under such names as "condensation", "sequential Monte Carlo" and "survival of the fittest". In this paper, we show how we can exploit the structure of the DBN to increase the efficiency of particle filtering, using a technique known as Rao-Blackwellisation. Essentially, this samples some of the variables, and marginalizes out the rest exactly, using the Kalman filter, HMM filter, junction tree algorithm, or any other finite dimensional optimal filter. We show that Rao-Blackwellised particle filters (RBPFs) lead to more accurate estimates than standard PFs. We demonstrate RBPFs on two problems, namely non-stationary online regression with radial basis function networks and robot localization and map building. We also discuss other potential application areas and provide references to some finite dimensional optimal filters. In this paper, we use evidence-specific value abstraction for speeding Bayesian networks inference. This is done by grouping variable values and treating the combined values as a single entity. As we show, such abstractions can exploit regularities in conditional probability distributions and also the specific values of observed variables. To formally justify value abstraction, we define the notion of safe value abstraction and devise inference algorithms that use it to reduce the cost of inference. Our procedure is particularly useful for learning complex networks with many hidden variables. In such cases, repeated likelihood computations are required for EM or other parameter optimization techniques. Since these computations are repeated with respect to the same evidence set, our methods can provide significant speedup to the learning procedure. We demonstrate the algorithm on genetic linkage problems where the use of value abstraction sometimes differentiates between a feasible and non-feasible solution. Inference for belief networks using Gibbs sampling produces a distribution for unobserved variables that differs from the correct distribution by a (usually) unknown error, since convergence to the right distribution occurs only asymptotically. The method of "coupling from the past" samples from exactly the correct distribution by (conceptually) running dependent Gibbs sampling simulations from every possible starting state from a time far enough in the past that all runs reach the same state at time t=0. Explicitly considering every possible state is intractable for large networks, however. We propose a method for layered noisy-or networks that uses a compact, but often imprecise, summary of a set of states. This method samples from exactly the correct distribution, and requires only about twice the time per step as ordinary Gibbs sampling, but it may require more simulation steps than would be needed if chains were tracked exactly. We describe a graphical model for probabilistic relationships---an alternative to the Bayesian network---called a dependency network. The graph of a dependency network, unlike a Bayesian network, is potentially cyclic. The probability component of a dependency network, like a Bayesian network, is a set of conditional distributions, one for each node given its parents. We identify several basic properties of this representation and describe a computationally efficient procedure for learning the graph and probability components from data. We describe the application of this representation to probabilistic inference, collaborative filtering (the task of predicting preferences), and the visualization of acausal predictive relationships. There are two main objectives of this paper. The first is to present a statistical framework for models with context specific independence structures, i.e., conditional independences holding only for sepcific values of the conditioning variables. This framework is constituted by the class of split models. Split models are extension of graphical models for contigency tables and allow for a more sophisticiated modelling than graphical models. The treatment of split models include estimation, representation and a Markov property for reading off those independencies holding in a specific context. The second objective is to present a software package named YGGDRASIL which is designed for statistical inference in split models, i.e., for learning such models on the basis of data. Composition of low-dimensional distributions, whose foundations were laid in the papaer published in the Proceeding of UAI'97 (Jirousek 1997), appeared to be an alternative apparatus to describe multidimensional probabilistic models. In contrast to Graphical Markov Models, which define multidomensinoal distributions in a declarative way, this approach is rather procedural. Ordering of low-dimensional distributions into a proper sequence fully defines the resepctive computational procedure; therefore, a stury of different type of generating sequences is one fo the central problems in this field. Thus, it appears that an important role is played by special sequences that are called perfect. Their main characterization theorems are presetned in this paper. However, the main result of this paper is a solution to the problem of margnialization for general sequences. The main theorem describes a way to obtain a generating sequence that defines the model corresponding to the marginal of the distribution defined by an arbitrary genearting sequence. From this theorem the reader can see to what extent these comutations are local; i.e., the sequence consists of marginal distributions whose computation must be made by summing up over the values of the variable eliminated (the paper deals with finite model). Stochastic games generalize Markov decision processes (MDPs) to a multiagent setting by allowing the state transitions to depend jointly on all player actions, and having rewards determined by multiplayer matrix games at each state. We consider the problem of computing Nash equilibria in stochastic games, the analogue of planning in MDPs. We begin by providing a generalization of finite-horizon value iteration that computes a Nash strategy for each player in generalsum stochastic games. The algorithm takes an arbitrary Nash selection function as input, which allows the translation of local choices between multiple Nash equilibria into the selection of a single global Nash equilibrium. Our main technical result is an algorithm for computing near-Nash equilibria in large or infinite state spaces. This algorithm builds on our finite-horizon value iteration algorithm, and adapts the sparse sampling methods of Kearns, Mansour and Ng (1999) to stochastic games. We conclude by descrbing a counterexample showing that infinite-horizon discounted value iteration, which was shown by shaplely to converge in the zero-sum case (a result we give extend slightly here), does not converge in the general-sum case. To investigate the robustness of the output probabilities of a Bayesian network, a sensitivity analysis can be performed. A one-way sensitivity analysis establishes, for each of the probability parameters of a network, a function expressing a posterior marginal probability of interest in terms of the parameter. Current methods for computing the coefficients in such a function rely on a large number of network evaluations. In this paper, we present a method that requires just a single outward propagation in a junction tree for establishing the coefficients in the functions for all possible parameters; in addition, an inward propagation is required for processing evidence. Conversely, the method requires a single outward propagation for computing the coefficients in the functions expressing all possible posterior marginals in terms of a single parameter. We extend these results to an n-way sensitivity analysis in which sets of parameters are studied. We introduce Game networks (G nets), a novel representation for multi-agent decision problems. Compared to other game-theoretic representations, such as strategic or extensive forms, G nets are more structured and more compact; more fundamentally, G nets constitute a computationally advantageous framework for strategic inference, as both probability and utility independencies are captured in the structure of the network and can be exploited in order to simplify the inference process. An important aspect of multi-agent reasoning is the identification of some or all of the strategic equilibria in a game; we present original convergence methods for strategic equilibrium which can take advantage of strategic separabilities in the G net structure in order to simplify the computations. Specifically, we describe a method which identifies a unique equilibrium as a function of the game payoffs, and one which identifies all equilibria. This paper shows how the Bayesian network paradigm can be used in order to solve combinatorial optimization problems. To do it some methods of structure learning from data and simulation of Bayesian networks are inserted inside Estimation of Distribution Algorithms (EDA). EDA are a new tool for evolutionary computation in which populations of individuals are created by estimation and simulation of the joint probability distribution of the selected individuals. We propose new approaches to EDA for combinatorial optimization based on the theory of probabilistic graphical models. Experimental results are also presented. We apply the principle of maximum entropy to select a unique joint probability distribution from the set of all joint probability distributions specified by a credal network. In detail, we start by showing that the unique joint distribution of a Bayesian tree coincides with the maximum entropy model of its conditional distributions. This result, however, does not hold anymore for general Bayesian networks. We thus present a new kind of maximum entropy models, which are computed sequentially. We then show that for all general Bayesian networks, the sequential maximum entropy model coincides with the unique joint distribution. Moreover, we apply the new principle of sequential maximum entropy to interval Bayesian networks and more generally to credal networks. We especially show that this application is equivalent to a number of small local entropy maximizations. In this paper we present decomposable priors, a family of priors over structure and parameters of tree belief nets for which Bayesian learning with complete observations is tractable, in the sense that the posterior is also decomposable and can be completely determined analytically in polynomial time. This follows from two main results: First, we show that factored distributions over spanning trees in a graph can be integrated in closed form. Second, we examine priors over tree parameters and show that a set of assumptions similar to (Heckerman and al. 1995) constrain the tree parameter priors to be a compactly parameterized product of Dirichlet distributions. Beside allowing for exact Bayesian learning, these results permit us to formulate a new class of tractable latent variable models in which the likelihood of a data point is computed through an ensemble average over tree structures. This paper deals with the representation and solution of asymmetric Bayesian decision problems. We present a formal framework, termed asymmetric influence diagrams, that is based on the influence diagram and allows an efficient representation of asymmetric decision problems. As opposed to existing frameworks, the asymmetric influece diagram primarily encodes asymmetry at the qualitative level and it can therefore be read directly from the model. We give an algorithm for solving asymmetric influence diagrams. The algorithm initially decomposes the asymmetric decision problem into a structure of symmetric subproblems organized as a tree. A solution to the decision problem can then be found by propagating from the leaves toward the root using existing evaluation methods to solve the sub-problems. When using Bayesian networks for modelling the behavior of man-made machinery, it usually happens that a large part of the model is deterministic. For such Bayesian networks deterministic part of the model can be represented as a Boolean function, and a central part of belief updating reduces to the task of calculating the number of satisfying configurations in a Boolean function. In this paper we explore how advances in the calculation of Boolean functions can be adopted for belief updating, in particular within the context of troubleshooting. We present experimental results indicating a substantial speed-up compared to traditional junction tree propagation. Sampling is an important tool for estimating large, complex sums and integrals over high dimensional spaces. For instance, important sampling has been used as an alternative to exact methods for inference in belief networks. Ideally, we want to have a sampling distribution that provides optimal-variance estimators. In this paper, we present methods that improve the sampling distribution by systematically adapting it as we obtain information from the samples. We present a stochastic-gradient-descent method for sequentially updating the sampling distribution based on the direct minization of the variance. We also present other stochastic-gradient-descent methods based on the minimizationof typical notions of distance between the current sampling distribution and approximations of the target, optimal distribution. We finally validate and compare the different methods empirically by applying them to the problem of action evaluation in influence diagrams. Large sparse sets of binary transaction data with millions of records and thousands of attributes occur in various domains: customers purchasing products, users visiting web pages, and documents containing words are just three typical examples. Real-time query selectivity estimation (the problem of estimating the number of rows in the data satisfying a given predicate) is an important practical problem for such databases. We investigate the application of probabilistic models to this problem. In particular, we study a Markov random field (MRF) approach based on frequent sets and maximum entropy, and compare it to the independence model and the Chow-Liu tree model. We find that the MRF model provides substantially more accurate probability estimates than the other methods but is more expensive from a computational and memory viewpoint. To alleviate the computational requirements we show how one can apply bucket elimination and clique tree approaches to take advantage of structure in the models and in the queries. We provide experimental results on two large real-world transaction datasets. We consider the problem belief-state monitoring for the purposes of implementing a policy for a partially-observable Markov decision process (POMDP), specifically how one might approximate the belief state. Other schemes for belief-state approximation (e.g., based on minimixing a measures such as KL-diveregence between the true and estimated state) are not necessarily appropriate for POMDPs. Instead we propose a framework for analyzing value-directed approximation schemes, where approximation quality is determined by the expected error in utility rather than by the error in the belief state itself. We propose heuristic methods for finding good projection schemes for belief state estimation - exhibiting anytime characteristics - given a POMDP value fucntion. We also describe several algorithms for constructing bounds on the error in decision quality (expected utility) associated with acting in accordance with a given belief state approximation. Techniques for plan recognition under uncertainty require a stochastic model of the plan-generation process. We introduce Probabilistic State-Dependent Grammars (PSDGs) to represent an agent's plan-generation process. The PSDG language model extends probabilistic context-free grammars (PCFGs) by allowing production probabilities to depend on an explicit model of the planning agent's internal and external state. Given a PSDG description of the plan-generation process, we can then use inference algorithms that exploit the particular independence properties of the PSDG language to efficiently answer plan-recognition queries. The combination of the PSDG language model and inference algorithms extends the range of plan-recognition domains for which practical probabilistic inference is possible, as illustrated by applications in traffic monitoring and air combat. Dynamic trees are mixtures of tree structured belief networks. They solve some of the problems of fixed tree networks at the cost of making exact inference intractable. For this reason approximate methods such as sampling or mean field approaches have been used. However, mean field approximations assume a factorized distribution over node states. Such a distribution seems unlickely in the posterior, as nodes are highly correlated in the prior. Here a structured variational approach is used, where the posterior distribution over the non-evidential nodes is itself approximated by a dynamic tree. It turns out that this form can be used tractably and efficiently. The result is a set of update rules which can propagate information through the network to obtain both a full variational approximation, and the relevant marginals. The progagtion rules are more efficient than the mean field approach and give noticeable quantitative and qualitative improvement in the inference. The marginals calculated give better approximations to the posterior than loopy propagation on a small toy problem. We present an approach to model-based hierarchical clustering by formulating an objective function based on a Bayesian analysis. This model organizes the data into a cluster hierarchy while specifying a complex feature-set partitioning that is a key component of our model. Features can have either a unique distribution in every cluster or a common distribution over some (or even all) of the clusters. The cluster subsets over which these features have such a common distribution correspond to the nodes (clusters) of the tree representing the hierarchy. We apply this general model to the problem of document clustering for which we use a multinomial likelihood function and Dirichlet priors. Our algorithm consists of a two-stage process wherein we first perform a flat clustering followed by a modified hierarchical agglomerative merging process that includes determining the features that will have common distributions over the merged clusters. The regularization induced by using the marginal likelihood automatically determines the optimal model structure including number of clusters, the depth of the tree and the subset of features to be modeled as having a common distribution at each node. We present experimental results on both synthetic data and a real document collection. Conditional independence and Markov properties are powerful tools allowing expression of multidimensional probability distributions by means of low-dimensional ones. As multidimensional possibilistic models have been studied for several years, the demand for analogous tools in possibility theory seems to be quite natural. This paper is intended to be a promotion of de Cooman's measure-theoretic approcah to possibility theory, as this approach allows us to find analogies to many important results obtained in probabilistic framework. First, we recall semi-graphoid properties of conditional possibilistic independence, parameterized by a continuous t-norm, and find sufficient conditions for a class of Archimedean t-norms to have the graphoid property. Then we introduce Markov properties and factorization of possibility distrubtions (again parameterized by a continuous t-norm) and find the relationships between them. These results are accompanied by a number of conterexamples, which show that the assumptions of specific theorems are substantial. Recently, variational approximations such as the mean field approximation have received much interest. We extend the standard mean field method by using an approximating distribution that factorises into cluster potentials. This includes undirected graphs, directed acyclic graphs and junction trees. We derive generalized mean field equations to optimize the cluster potentials. We show that the method bridges the gap between the standard mean field approximation and the exact junction tree algorithm. In addition, we address the problem of how to choose the graphical structure of the approximating distribution. From the generalised mean field equations we derive rules to simplify the structure of the approximating distribution in advance without affecting the quality of the approximation. We also show how the method fits into some other variational approximations that are currently popular. Algorithms for learning the conditional probabilities of Bayesian networks with hidden variables typically operate within a high-dimensional search space and yield only locally optimal solutions. One way of limiting the search space and avoiding local optima is to impose qualitative constraints that are based on background knowledge concerning the domain. We present a method for integrating formal statements of qualitative constraints into two learning algorithms, APN and EM. In our experiments with synthetic data, this method yielded networks that satisfied the constraints almost perfectly. The accuracy of the learned networks was consistently superior to that of corresponding networks learned without constraints. The exploitation of qualitative constraints therefore appears to be a promising way to increase both the interpretability and the accuracy of learned Bayesian networks with known structure. Elicitation of probabilities is one of the most laborious tasks in building decision-theoretic models, and one that has so far received only moderate attention in decision-theoretic systems. We propose a set of user interface tools for graphical probabilistic models, focusing on two aspects of probability elicitation: (1) navigation through conditional probability tables and (2) interactive graphical assessment of discrete probability distributions. We propose two new graphical views that aid navigation in very large conditional probability tables: the CPTree (Conditional Probability Tree) and the SCPT (shrinkable Conditional Probability Table). Based on what is known about graphical presentation of quantitative data to humans, we offer several useful enhancements to probability wheel and bar graph, including different chart styles and options that can be adapted to user preferences and needs. We present the results of a simple usability study that proves the value of the proposed tools. Computer Poker's unique characteristics present a well-suited challenge for research in artificial intelligence. For that reason, and due to the Poker's market increase in popularity in Portugal since 2008, several members of LIACC have researched in this field. Several works were published as papers and master theses and more recently a member of LIACC engaged on a research in this area as a Ph.D. thesis in order to develop a more extensive and in-depth work. This paper describes the existing research in LIACC about Computer Poker, with special emphasis on the completed master's theses and plans for future work. This paper means to present a summary of the lab's work to the research community in order to encourage the exchange of ideas with other labs / individuals. LIACC hopes this will improve research in this area so as to reach the goal of creating an agent that surpasses the best human players. Diagnosis and prediction in some domains, like medical and industrial diagnosis, require a representation that combines uncertainty management and temporal reasoning. Based on the fact that in many cases there are few state changes in the temporal range of interest, we propose a novel representation called Temporal Nodes Bayesian Networks (TNBN). In a TNBN each node represents an event or state change of a variable, and an arc corresponds to a causal-temporal relationship. The temporal intervals can differ in number and size for each temporal node, so this allows multiple granularity. Our approach is contrasted with a dynamic Bayesian network for a simple medical example. An empirical evaluation is presented for a more complex problem, a subsystem of a fossil power plant, in which this approach is used for fault diagnosis and prediction with good results. Possibilistic logic bases and possibilistic graphs are two different frameworks of interest for representing knowledge. The former stratifies the pieces of knowledge (expressed by logical formulas) according to their level of certainty, while the latter exhibits relationships between variables. The two types of representations are semantically equivalent when they lead to the same possibility distribution (which rank-orders the possible interpretations). A possibility distribution can be decomposed using a chain rule which may be based on two different kinds of conditioning which exist in possibility theory (one based on product in a numerical setting, one based on minimum operation in a qualitative setting). These two types of conditioning induce two kinds of possibilistic graphs. In both cases, a translation of these graphs into possibilistic bases is provided. The converse translation from a possibilistic knowledge base into a min-based graph is also described. In many domains it is desirable to assess the preferences of users in a qualitative rather than quantitative way. Such representations of qualitative preference orderings form an importnat component of automated decision tools. We propose a graphical representation of preferences that reflects conditional dependence and independence of preference statements under a ceteris paribus (all else being equal) interpretation. Such a representation is ofetn compact and arguably natural. We describe several search algorithms for dominance testing based on this representation; these algorithms are quite effective, especially in specific network topologies, such as chain-and tree- structured networks, as well as polytrees. Dynamic Bayesian networks provide a compact and natural representation for complex dynamic systems. However, in many cases, there is no expert available from whom a model can be elicited. Learning provides an alternative approach for constructing models of dynamic systems. In this paper, we address some of the crucial computational aspects of learning the structure of dynamic systems, particularly those where some relevant variables are partially observed or even entirely unknown. Our approach is based on the Structural Expectation Maximization (SEM) algorithm. The main computational cost of the SEM algorithm is the gathering of expected sufficient statistics. We propose a novel approximation scheme that allows these sufficient statistics to be computed efficiently. We also investigate the fundamental problem of discovering the existence of hidden variables without exhaustive and expensive search. Our approach is based on the observation that, in dynamic systems, ignoring a hidden variable typically results in a violation of the Markov property. Thus, our algorithm searches for such violations in the data, and introduces hidden variables to explain them. We provide empirical results showing that the algorithm is able to learn the dynamics of complex systems in a computationally tractable way. In this paper, we empirically evaluate algorithms for learning four types of Bayesian network (BN) classifiers - Naive-Bayes, tree augmented Naive-Bayes, BN augmented Naive-Bayes and general BNs, where the latter two are learned using two variants of a conditional-independence (CI) based BN-learning algorithm. Experimental results show the obtained classifiers, learned using the CI based algorithms, are competitive with (or superior to) the best known classifiers, based on both Bayesian networks and other formalisms; and that the computational time for learning and using these classifiers is relatively small. Moreover, these results also suggest a way to learn yet more effective classifiers; we demonstrate empirically that this new algorithm does work as expected. Collectively, these results argue that BN classifiers deserve more attention in machine learning and data mining communities. We present a hybrid constraint-based/Bayesian algorithm for learning causal networks in the presence of sparse data. The algorithm searches the space of equivalence classes of models (essential graphs) using a heuristic based on conventional constraint-based techniques. Each essential graph is then converted into a directed acyclic graph and scored using a Bayesian scoring metric. Two variants of the algorithm are developed and tested using data from randomly generated networks of sizes from 15 to 45 nodes with data sizes ranging from 250 to 2000 records. Both variations are compared to, and found to consistently outperform two variations of greedy search with restarts. Reinforcement learning systems are often concerned with balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information - the expected improvement in future decision quality arising from the information acquired by exploration. Estimating this quantity requires an assessment of the agent's uncertainty about its current value estimates for states. In this paper we investigate ways of representing and reasoning about this uncertainty in algorithms where the system attempts to learn a model of its environment. We explicitly represent uncertainty about the parameters of the model and build probability distributions over Q-values based on these. These distributions are used to compute a myopic approximation to the value of information for each action and hence to select the action that best balances exploration and exploitation. Hybrid Probabilistic Programs (HPPs) are logic programs that allow the programmer to explicitly encode his knowledge of the dependencies between events being described in the program. In this paper, we classify HPPs into three classes called HPP_1,HPP_2 and HPP_r,r>= 3. For these classes, we provide three types of results for HPPs. First, we develop algorithms to compute the set of all ground consequences of an HPP. Then we provide algorithms and complexity results for the problems of entailment ("Given an HPP P and a query Q as input, is Q a logical consequence of P?") and consistency ("Given an HPP P as input, is P consistent?"). Our results provide a fine characterization of when polynomial algorithms exist for the above problems, and when these problems become intractable. The problem of assessing the value of a candidate is viewed here as a multiple combination problem. On the one hand a candidate can be evaluated according to different criteria, and on the other hand several experts are supposed to assess the value of candidates according to each criterion. Criteria are not equally important, experts are not equally competent or reliable. Moreover levels of satisfaction of criteria, or levels of confidence are only assumed to take their values in qualitative scales which are just linearly ordered. The problem is discussed within two frameworks, the transferable belief model and the qualitative possibility theory. They respectively offer a quantitative and a qualitative setting for handling the problem, providing thus a way to compare the nature of the underlying assumptions. This paper investigates a purely qualitative version of Savage's theory for decision making under uncertainty. Until now, most representation theorems for preference over acts rely on a numerical representation of utility and uncertainty where utility and uncertainty are commensurate. Disrupting the tradition, we relax this assumption and introduce a purely ordinal axiom requiring that the Decision Maker (DM) preference between two acts only depends on the relative position of their consequences for each state. Within this qualitative framework, we determine the only possible form of the decision rule and investigate some instances compatible with the transitivity of the strict preference. Finally we propose a mild relaxation of our ordinality axiom, leaving room for a new family of qualitative decision rules compatible with transitivity. In recent years there has been significant progress in algorithms and methods for inducing Bayesian networks from data. However, in complex data analysis problems, we need to go beyond being satisfied with inducing networks with high scores. We need to provide confidence measures on features of these networks: Is the existence of an edge between two nodes warranted? Is the Markov blanket of a given node robust? Can we say something about the ordering of the variables? We should be able to address these questions, even when the amount of data is not enough to induce a high scoring network. In this paper we propose Efron's Bootstrap as a computationally efficient approach for answering these questions. In addition, we propose to use these confidence measures to induce better structures from the data, and to detect the presence of latent variables. Learning Bayesian networks is often cast as an optimization problem, where the computational task is to find a structure that maximizes a statistically motivated score. By and large, existing learning tools address this optimization problem using standard heuristic search techniques. Since the search space is extremely large, such search procedures can spend most of the time examining candidates that are extremely unreasonable. This problem becomes critical when we deal with data sets that are large either in the number of instances, or the number of attributes. In this paper, we introduce an algorithm that achieves faster learning by restricting the search space. This iterative algorithm restricts the parents of each variable to belong to a small subset of candidates. We then search for a network that satisfies these constraints. The learned network is then used for selecting better candidates for the next iteration. We evaluate this algorithm both on synthetic and real-life data. Our results show that it is significantly faster than alternative search procedures without loss of quality in the learned structures. In this paper, we analyze the relationship between probability and Spohn's theory for representation of uncertain beliefs. Using the intuitive idea that the more probable a proposition is, the more believable it is, we study transformations from probability to Sphonian disbelief and vice-versa. The transformations described in this paper are different from those described in the literature. In particular, the former satisfies the principles of ordinal congruence while the latter does not. Such transformations between probability and Spohn's calculi can contribute to (1) a clarification of the semantics of nonprobabilistic degree of uncertain belief, and (2) to a construction of a decision theory for such calculi. In practice, the transformations will allow a meaningful combination of more than one calculus in different stages of using an expert system such as knowledge acquisition, inference, and interpretation of results. Classical Decision Theory provides a normative framework for representing and reasoning about complex preferences. Straightforward application of this theory to automate decision making is difficult due to high elicitation cost. In response to this problem, researchers have recently developed a number of qualitative, logic-oriented approaches for representing and reasoning about references. While effectively addressing some expressiveness issues, these logics have not proven powerful enough for building practical automated decision making systems. In this paper we present a hybrid approach to preference elicitation and decision making that is grounded in classical multi-attribute utility theory, but can make effective use of the expressive power of qualitative approaches. Specifically, assuming a partially specified multilinear utility function, we show how comparative statements about classes of decision alternatives can be used to further constrain the utility function and thus identify sup-optimal alternatives. This work demonstrates that quantitative and qualitative approaches can be synergistically integrated to provide effective and flexible decision support. Markov decisions processes (MDPs) are becoming increasing popular as models of decision theoretic planning. While traditional dynamic programming methods perform well for problems with small state spaces, structured methods are needed for large problems. We propose and examine a value iteration algorithm for MDPs that uses algebraic decision diagrams(ADDs) to represent value functions and policies. An MDP is represented using Bayesian networks and ADDs and dynamic programming is applied directly to these ADDs. We demonstrate our method on large MDPs (up to 63 million states) and show that significant gains can be had when compared to tree-structured representations (with up to a thirty-fold reduction in the number of nodes required to represent optimal value functions). We introduce utility-directed procedures for mediating the flow of potentially distracting alerts and communications to computer users. We present models and inference procedures that balance the context-sensitive costs of deferring alerts with the cost of interruption. We describe the challenge of reasoning about such costs under uncertainty via an analysis of user activity and the content of notifications. After introducing principles of attention-sensitive alerting, we focus on the problem of guiding alerts about email messages. We dwell on the problem of inferring the expected criticality of email and discuss work on the Priorities system, centering on prioritizing email by criticality and modulating the communication of notifications to users about the presence and nature of incoming email. The paper is a second in a series of two papers evaluating the power of a new scheme that generates search heuristics mechanically. The heuristics are extracted from an approximation scheme called mini-bucket elimination that was recently introduced. The first paper introduced the idea and evaluated it within Branch-and-Bound search. In the current paper the idea is further extended and evaluated within Best-First search. The resulting algorithms are compared on coding and medical diagnosis problems, using varying strength of the mini-bucket heuristics. Our results demonstrate an effective search scheme that permits controlled tradeoff between preprocessing (for heuristic generation) and search. Best-first search is shown to outperform Branch-and-Bound, when supplied with good heuristics, and sufficient memory space. The clique tree algorithm is the standard method for doing inference in Bayesian networks. It works by manipulating clique potentials - distributions over the variables in a clique. While this approach works well for many networks, it is limited by the need to maintain an exact representation of the clique potentials. This paper presents a new unified approach that combines approximate inference and the clique tree algorithm, thereby circumventing this limitation. Many known approximate inference algorithms can be viewed as instances of this approach. The algorithm essentially does clique tree propagation, using approximate inference to estimate the densities in each clique. In many settings, the computation of the approximate clique potential can be done easily using statistical importance sampling. Iterations are used to gradually improve the quality of the estimation. Poker is ideal for testing automated reasoning under uncertainty. It introduces uncertainty both by physical randomization and by incomplete information about opponents hands.Another source OF uncertainty IS the limited information available TO construct psychological models OF opponents, their tendencies TO bluff, play conservatively, reveal weakness, etc. AND the relation BETWEEN their hand strengths AND betting behaviour. ALL OF these uncertainties must be assessed accurately AND combined effectively FOR ANY reasonable LEVEL OF skill IN the game TO be achieved, since good decision making IS highly sensitive TO those tasks.We describe our Bayesian Poker Program(BPP), which uses a Bayesian network TO model the programs poker hand, the opponents hand AND the opponents playing behaviour conditioned upon the hand, and betting curves which govern play given a probability of winning. The history of play with opponents is used to improve BPPs understanding OF their behaviour.We compare BPP experimentally WITH : a simple RULE - based system; a program which depends exclusively ON hand probabilities(i.e., without opponent modeling); AND WITH human players.BPP has shown itself TO be an effective player against ALL these opponents, barring the better humans.We also sketch out SOME likely ways OF improving play. Solving symmetric Bayesian decision problems is a computationally intensive task to perform regardless of the algorithm used. In this paper we propose a method for improving the efficiency of algorithms for solving Bayesian decision problems. The method is based on the principle of lazy evaluation - a principle recently shown to improve the efficiency of inference in Bayesian networks. The basic idea is to maintain decompositions of potentials and to postpone computations for as long as possible. The efficiency improvements obtained with the lazy evaluation based method is emphasized through examples. Finally, the lazy evaluation based method is compared with the hugin and valuation-based systems architectures for solving symmetric Bayesian decision problems. Solving partially observable Markov decision processes (POMDPs) is highly intractable in general, at least in part because the optimal policy may be infinitely large. In this paper, we explore the problem of finding the optimal policy from a restricted set of policies, represented as finite state automata of a given size. This problem is also intractable, but we show that the complexity can be greatly reduced when the POMDP and/or policy are further constrained. We demonstrate good empirical results with a branch-and-bound method for finding globally optimal deterministic policies, and a gradient-ascent method for finding locally optimal stochastic policies. Reactive (memoryless) policies are sufficient in completely observable Markov decision processes (MDPs), but some kind of memory is usually necessary for optimal control of a partially observable MDP. Policies with finite memory can be represented as finite-state automata. In this paper, we extend Baird and Moore's VAPS algorithm to the problem of learning general finite-state automata. Because it performs stochastic gradient descent, this algorithm can be shown to converge to a locally optimal finite-state controller. We provide the details of the algorithm and then consider the question of under what conditions stochastic gradient descent will outperform exact gradient descent. We conclude with empirical results comparing the performance of stochastic and exact gradient descent, and showing the ability of our algorithm to extract the useful information contained in the sequence of past observations to compensate for the lack of observability at each time-step. As observations and student models become complex, educational assessments that exploit advances in technology and cognitive psychology can outstrip familiar testing models and analytic methods. Within the Portal conceptual framework for assessment design, Bayesian inference networks (BINs) record beliefs about students' knowledge and skills, in light of what they say and do. Joining evidence model BIN fragments- which contain observable variables and pointers to student model variables - to the student model allows one to update belief about knowledge and skills as observations arrive. Markov Chain Monte Carlo (MCMC) techniques can estimate the required conditional probabilities from empirical data, supplemented by expert judgment or substantive theory. Details for the special cases of item response theory (IRT) and multivariate latent class modeling are given, with a numerical example of the latter. In this paper we present a new Bayesian network model for classification that combines the naive-Bayes (NB) classifier and the finite-mixture (FM) classifier. The resulting classifier aims at relaxing the strong assumptions on which the two component models are based, in an attempt to improve on their classification performance, both in terms of accuracy and in terms of calibration of the estimated probabilities. The proposed classifier is obtained by superimposing a finite mixture model on the set of feature variables of a naive Bayes model. We present experimental results that compare the predictive performance on real datasets of the new classifier with the predictive performance of the NB classifier and the FM classifier. Recently, researchers have demonstrated that loopy belief propagation - the use of Pearls polytree algorithm IN a Bayesian network WITH loops OF error- correcting codes.The most dramatic instance OF this IS the near Shannon - limit performance OF Turbo Codes codes whose decoding algorithm IS equivalent TO loopy belief propagation IN a chain - structured Bayesian network. IN this paper we ask : IS there something special about the error - correcting code context, OR does loopy propagation WORK AS an approximate inference schemeIN a more general setting? We compare the marginals computed using loopy propagation TO the exact ones IN four Bayesian network architectures, including two real - world networks : ALARM AND QMR.We find that the loopy beliefs often converge AND WHEN they do, they give a good approximation TO the correct marginals.However,ON the QMR network, the loopy beliefs oscillated AND had no obvious relationship TO the correct posteriors. We present SOME initial investigations INTO the cause OF these oscillations, AND show that SOME simple methods OF preventing them lead TO the wrong results. A major problem for the learning of Bayesian networks (BNs) is the exponential number of parameters needed for conditional probability tables. Recent research reduces this complexity by modeling local structure in the probability tables. We examine the use of log-linear local models. While log-linear models in this context are not new (Whittaker, 1990; Buntine, 1991; Neal, 1992; Heckerman and Meek, 1997), for structure learning they are generally subsumed under a naive Bayes model. We describe an alternative interpretation, and use a Minimum Message Length (MML) (Wallace, 1987) metric for structure learning of networks exhibiting causal independence, which we term first-order networks (FONs). We also investigate local model selection on a node-by-node basis. The need to help people choose among large numbers of items and to filter through large amounts of information has led to a flood of research in construction of personal recommendation agents. One of the central issues in constructing such agents is the representation and elicitation of user preferences or interests. This topic has long been studied in Decision Theory, but surprisingly little work in the area of recommender systems has made use of formal decision-theoretic techniques. This paper describes DIVA, a decision-theoretic agent for recommending movies that contains a number of novel features. DIVA represents user preferences using pairwise comparisons among items, rather than numeric ratings. It uses a novel similarity measure based on the concept of the probability of conflict between two orderings of items. The system has a rich representation of preference, distinguishing between a user's general taste in movies and his immediate interests. It takes an incremental approach to preference elicitation in which the user can provide feedback if not satisfied with the recommendation list. We empirically evaluate the performance of the system using the EachMovie collaborative filtering database. Graphical models based on conditional independence support concise encodings of the subjective belief of a single agent. A natural question is whether the consensus belief of a group of agents can be represented with equal parsimony. We prove, under relatively mild assumptions, that even if everyone agrees on a common graph topology, no method of combining beliefs can maintain that structure. Even weaker conditions rule out local aggregation within conditional probability tables. On a more positive note, we show that if probabilities are combined with the logarithmic opinion pool (LogOP), then commonly held Markov independencies are maintained. This suggests a straightforward procedure for constructing a consensus Markov network. We describe an algorithm for computing the LogOP with time complexity comparable to that of exact Bayesian inference. Bayesian Networks (BN) provide robust probabilistic methods of reasoning under uncertainty, but despite their formal grounds are strictly based on the notion of conditional dependence, not much attention has been paid so far to their use in dependability analysis. The aim of this paper is to propose BN as a suitable tool for dependability analysis, by challenging the formalism with basic issues arising in dependability tasks. We will discuss how both modeling and analysis issues can be naturally dealt with by BN. Moreover, we will show how some limitations intrinsic to combinatorial dependability methods such as Fault Trees can be overcome using BN. This will be pursued through the study of a real-world example concerning the reliability analysis of a redundant digital Programmable Logic Controller (PLC) with majority voting 2:3 Inference networks have a variety of important uses and are constructed by persons having quite different standpoints. Discussed in this paper are three different but complementary methods for generating and analyzing probabilistic inference networks. The first method, though over eighty years old, is very useful for knowledge representation in the task of constructing probabilistic arguments. It is also useful as a heuristic device in generating new forms of evidence. The other two methods are formally equivalent ways for combining probabilities in the analysis of inference networks. The use of these three methods is illustrated in an analysis of a mass of evidence in a celebrated American law case. Hidden Markov models (HMMs) and partially observable Markov decision processes (POMDPs) form a useful tool for modeling dynamical systems. They are particularly useful for representing environments such as road networks and office buildings, which are typical for robot navigation and planning. The work presented here is concerned with acquiring such models. We demonstrate how domain-specific information and constraints can be incorporated into the statistical estimation process, greatly improving the learned models in terms of the model quality, the number of iterations required for convergence and robustness to reduction in the amount of available data. We present new initialization heuristics which can be used even when the data suffers from cumulative rotational error, new update rules for the model parameters, as an instance of generalized EM, and a strategy for enforcing complete geometrical consistency in the model. Experimental results demonstrate the effectiveness of our approach for both simulated and real robot data, in traditionally hard-to-learn environments. The deontic logic DUS is a Deontic Update Semantics for prescriptive obligations based on the update semantics of Veltman. In DUS the definition of logical validity of obligations is not based on static truth values but on dynamic action transitions. In this paper prescriptive defeasible obligations are formalized in update semantics and the diagnostic problem of defeasible deontic logic is discussed. Assume a defeasible obligation `normally A ought to be (done)' together withthe fact `A is not (done).' Is this an exception of the normality claim, or is it a violation of the obligation? In this paper we formalize the heuristic principle that it is a violation, unless there is a more specific overriding obligation. The underlying motivation from legal reasoning is that criminals should have as little opportunities as possible to excuse themselves by claiming that their behavior was exceptional rather than criminal. In building Bayesian belief networks, the elicitation of all probabilities required can be a major obstacle. We learned the extent of this often-cited observation in the construction of the probabilistic part of a complex influence diagram in the field of cancer treatment. Based upon our negative experiences with existing methods, we designed a new method for probability elicitation from domain experts. The method combines various ideas, among which are the ideas of transcribing probabilities and of using a scale with both numerical and verbal anchors for marking assessments. In the construction of the probabilistic part of our influence diagram, the method proved to allow for the elicitation of many probabilities in little time. The AGM theory of belief revision has become an important paradigm for investigating rational belief changes. Unfortunately, researchers working in this paradigm have restricted much of their attention to rather simple representations of belief states, namely logically closed sets of propositional sentences. In our opinion, this has resulted in a too abstract categorisation of belief change operations: expansion, revision, or contraction. Occasionally, in the AGM paradigm, also probabilistic belief changes have been considered, and it is widely accepted that the probabilistic version of expansion is conditioning. However, we argue that it may be more correct to view conditioning and expansion as two essentially different kinds of belief change, and that what we call constraining is a better candidate for being considered probabilistic expansion. It is well-known that the notion of (strong) conditional independence (CI) is too restrictive to capture independencies that only hold in certain contexts. This kind of contextual independency, called context-strong independence (CSI), can be used to facilitate the acquisition, representation, and inference of probabilistic knowledge. In this paper, we suggest the use of contextual weak independence (CWI) in Bayesian networks. It should be emphasized that the notion of CWI is a more general form of contextual independence than CSI. Furthermore, if the contextual strong independence holds for all contexts, then the notion of CSI becomes strong CI. On the other hand, if the weak contextual independence holds for all contexts, then the notion of CWI becomes weak independence (WI) nwhich is a more general noncontextual independency than strong CI. More importantly, complete axiomatizations are studied for both the class of WI and the class of CI and WI together. Finally, the interesting property of WI being a necessary and sufficient condition for ensuring consistency in granular probabilistic networks is shown. As Bayesian networks are applied to larger and more complex problem domains, search for flexible modeling and more efficient inference methods is an ongoing effort. Multiply sectioned Bayesian networks (MSBNs) extend the HUGIN inference for Bayesian networks into a coherent framework for flexible modeling and distributed inference.Lazy propagation extends the Shafer-Shenoy and HUGIN inference methods with reduced space complexity. We apply the Shafer-Shenoy and lazy propagation to inference in MSBNs. The combination of the MSBN framework and lazy propagation provides a better framework for modeling and inference in very large domains. It retains the modeling flexibility of MSBNs and reduces the runtime space complexity, allowing exact inference in much larger domains given the same computational resources. Recent interests in dynamic decision modeling have led to the development of several representation and inference methods. These methods however, have limited application under time critical conditions where a trade-off between model quality and computational tractability is essential. This paper presents an approach to time-critical dynamic decision modeling. A knowledge representation and modeling method called the time-critical dynamic influence diagram is proposed. The formalism has two forms. The condensed form is used for modeling and model abstraction, while the deployed form which can be converted from the condensed form is used for inference purposes. The proposed approach has the ability to represent space-temporal abstraction within the model. A knowledge-based meta-reasoning approach is proposed for the purpose of selecting the best abstracted model that provide the optimal trade-off between model quality and model tractability. An outline of the knowledge-based model construction algorithm is also provided. In this paper we propose a logic-based, framework inspired by artificial intelligence, but scaled down for practical database and programming applications. Computation in the framework is viewed as the task of generating a sequence of state transitions, with the purpose of making an agent's goals all true. States are represented by sets of atomic sentences (or facts), representing the values of program variables, tuples in a coordination language, facts in relational databases, or Herbrand models. In the model-theoretic semantics, the entire sequence of states and events are combined into a single model-theoretic structure, by associating timestamps with facts and events. But in the operational semantics, facts are updated destructively, without timestamps. We show that the model generated by destructive updates is identical to the model generated by reasoning with facts containing timestamps. We also extend the model with intentional predicates and composite event predicates defined by logic programs containing conditions in first-order logic, which query the current state. Possibilistic logic is a well-known graded logic of uncertainty suitable to reason under incomplete information and partially inconsistent knowledge, which is built upon classical first order logic. There exists for Possibilistic logic a proof procedure based on a refutation complete resolution-style calculus. Recently, a syntactical extension of first order Possibilistic logic (called PLFC) dealing with fuzzy constants and fuzzily restricted quantifiers has been proposed. Our aim is to present steps towards both the formalization of PLFC itself and an automated deduction system for it by (i) providing a formal semantics; (ii) defining a sound resolution-style calculus by refutation; and (iii) describing a first-order proof procedure for PLFC clauses based on (ii) and on a novel notion of most general substitution of two literals in a resolution step. In contrast to standard Possibilistic logic semantics, truth-evaluation of formulas with fuzzy constants are many-valued instead of boolean, and consequently an extended notion of possibilistic uncertainty is also needed. There exist two general forms of exact algorithms for updating probabilities in Bayesian Networks. The first approach involves using a structure, usually a clique tree, and performing local message based calculation to extract the belief in each variable. The second general class of algorithm involves the use of non-serial dynamic programming techniques to extract the belief in some desired group of variables. In this paper we present a hybrid algorithm based on the latter approach yet possessing the ability to retrieve the belief in all single variables. The technique is advantageous in that it saves a NP-hard computation step over using one algorithm of each type. Furthermore, this technique re-enforces a conjecture of Jensen and Jensen [JJ94] in that it still requires a single NP-hard step to set up the structure on which inference is performed, as we show by confirming Li and D'Ambrosio's [LD94] conjectured NP-hardness of OFP. Recent research in decision theoretic planning has focussed on making the solution of Markov decision processes (MDPs) more feasible. We develop a family of algorithms for structured reachability analysis of MDPs that are suitable when an initial state (or set of states) is known. Using compact, structured representations of MDPs (e.g., Bayesian networks), our methods, which vary in the tradeoff between complexity and accuracy, produce structured descriptions of (estimated) reachable states that can be used to eliminate variables or variable values from the problem description, reducing the size of the MDP and making it easier to solve. One contribution of our work is the extension of ideas from GRAPHPLAN to deal with the distributed nature of action representations typically embodied within Bayes nets and the problem of correlated action effects. We also demonstrate that our algorithm can be made more complete by using k-ary constraints instead of binary constraints. Another contribution is the illustration of how the compact representation of reachability constraints can be exploited by several existing (exact and approximate) abstraction algorithms for MDPs. Information Retrieval (IR) is concerned with the identification of documents in a collection that are relevant to a given information need, usually represented as a query containing terms or keywords, which are supposed to be a good description of what the user is looking for. IR systems may improve their effectiveness (i.e., increasing the number of relevant documents retrieved) by using a process of query expansion, which automatically adds new terms to the original query posed by an user. In this paper we develop a method of query expansion based on Bayesian networks. Using a learning algorithm, we construct a Bayesian network that represents some of the relationships among the terms appearing in a given document collection; this network is then used as a thesaurus (specific for that collection). We also report the results obtained by our method on three standard test collections. It is well known that one can ignore parts of a belief network when computing answers to certain probabilistic queries. It is also well known that the ignorable parts (if any) depend on the specific query of interest and, therefore, may change as the query changes. Algorithms based on jointrees, however, do not seem to take computational advantage of these facts given that they typically construct jointrees for worst-case queries; that is, queries for which every part of the belief network is considered relevant. To address this limitation, we propose in this paper a method for reconfiguring jointrees dynamically as the query changes. The reconfiguration process aims at maintaining a jointree which corresponds to the underlying belief network after it has been pruned given the current query. Our reconfiguration method is marked by three characteristics: (a) it is based on a non-classical definition of jointrees; (b) it is relatively efficient; and (c) it can reuse some of the computations performed before a jointree is reconfigured. We present preliminary experimental results which demonstrate significant savings over using static jointrees when query changes are considerable. In recent years there has been a flurry of works on learning Bayesian networks from data. One of the hard problems in this area is how to effectively learn the structure of a belief network from incomplete data- that is, in the presence of missing values or hidden variables. In a recent paper, I introduced an algorithm called Structural EM that combines the standard Expectation Maximization (EM) algorithm, which optimizes parameters, with structure search for model selection. That algorithm learns networks based on penalized likelihood scores, which include the BIC/MDL score and various approximations to the Bayesian score. In this paper, I extend Structural EM to deal directly with Bayesian model selection. I prove the convergence of the resulting algorithm and show how to apply it for learning a large class of probabilistic models, including Bayesian networks and some variants thereof. This paper (1)shows that the best supported current psychological theory (Cheng, 1997) of how human subjects judge the causal power or influence of variations in presence or absence of one feature on another, given data on their covariation, tacitly uses a Bayes network which is either a noisy or gate (for causes that promote the effect) or a noisy and gate (for causes that inhibit the effect); (2)generalizes Chengs theory to arbitrary acyclic networks of noisy or and noisy and gates; (3)gives various sufficient conditions for the estimation of the parameters in such networks when there are independent, unobserved causes; (4)distinguishes direct causal influence of one feature on another (influence along a path with one edge) from total influence (influence along all paths from one variable to another) and gives sufficient conditions for estimating each when there are unobserved causes of the outcome variable; (5)describes the relation between Cheng models and a simplified version of the Rubin framework for representing causal relations. While decision theory provides an appealing normative framework for representing rich preference structures, eliciting utility or value functions typically incurs a large cost. For many applications involving interactive systems this overhead precludes the use of formal decision-theoretic models of preference. Instead of performing elicitation in a vacuum, it would be useful if we could augment directly elicited preferences with some appropriate default information. In this paper we propose a case-based approach to alleviating the preference elicitation bottleneck. Assuming the existence of a population of users from whom we have elicited complete or incomplete preference structures, we propose eliciting the preferences of a new user interactively and incrementally, using the closest existing preference structures as potential defaults. Since a notion of closeness demands a measure of distance among preference structures, this paper takes the first step of studying various distance measures over fully and partially specified preference structures. We explore the use of Euclidean distance, Spearmans footrule, and define a new measure, the probabilistic distance. We provide computational techniques for all three measures. We investigate the use of temporally abstract actions, or macro-actions, in the solution of Markov decision processes. Unlike current models that combine both primitive actions and macro-actions and leave the state space unchanged, we propose a hierarchical model (using an abstract MDP) that works with macro-actions only, and that significantly reduces the size of the state space. This is achieved by treating macroactions as local policies that act in certain regions of state space, and by restricting states in the abstract MDP to those at the boundaries of regions. The abstract MDP approximates the original and can be solved more efficiently. We discuss several ways in which macro-actions can be generated to ensure good solution quality. Finally, we consider ways in which macro-actions can be reused to solve multiple, related MDPs; and we show that this can justify the computational overhead of macro-action generation. People using consumer software applications typically do not use technical jargon when querying an online database of help topics. Rather, they attempt to communicate their goals with common words and phrases that describe software functionality in terms of structure and objects they understand. We describe a Bayesian approach to modeling the relationship between words in a user's query for assistance and the informational goals of the user. After reviewing the general method, we describe several extensions that center on integrating additional distinctions and structure about language usage and user goals into the Bayesian models. Stochastic search algorithms are among the most sucessful approaches for solving hard combinatorial problems. A large class of stochastic search approaches can be cast into the framework of Las Vegas Algorithms (LVAs). As the run-time behavior of LVAs is characterized by random variables, the detailed knowledge of run-time distributions provides important information for the analysis of these algorithms. In this paper we propose a novel methodology for evaluating the performance of LVAs, based on the identification of empirical run-time distributions. We exemplify our approach by applying it to Stochastic Local Search (SLS) algorithms for the satisfiability problem (SAT) in propositional logic. We point out pitfalls arising from the use of improper empirical methods and discuss the benefits of the proposed methodology for evaluating and comparing LVAs. We present an anytime algorithm which computes policies for decision problems represented as multi-stage influence diagrams. Our algorithm constructs policies incrementally, starting from a policy which makes no use of the available information. The incremental process constructs policies which includes more of the information available to the decision maker at each step. While the process converges to the optimal policy, our approach is designed for situations in which computing the optimal policy is infeasible. We provide examples of the process on several large decision problems, showing that, for these examples, the process constructs valuable (but sub-optimal) policies before the optimal policy would be available by traditional methods. The adaptation to situations of sequential choice under uncertainty of decision criteria which deviate from (subjective) expected utility raises the problem of ensuring the selection of a nondominated strategy. In particular, when following the suggestion of Machina and McClennen of giving up separability (also known as consequentialism), which requires the choice of a substrategy in a subtree to depend only on data relevant to that subtree, one must renounce to the use of dynamic programming, since Bellman's principle is no longer valid. An interpretation of McClennen's resolute choice, based on cooperation between the successive Selves of the decision maker, is proposed. Implementations of resolute choice which prevent Money Pumps negative prices of information or, more generally, choices of dominated strategies, while remaining computationally tractable, are proposed. This paper proposes a method to find the actual state of a complex dynamic system from information coming from the sensors on the system himself, or on its environment. The nominal evolution of the system is a priori known and can be modeled (by an expert, for example), by different methods. In this paper, the Petri nets have been chosen. Contrary to the usual use of the Petri nets, the initial state of the system is unknown. So a degree of belief is bound to each places, or set of places. The theory used to model this uncertainty is the Dempster-Shafer's one which is well adapted to this type of problems. From the given Petri net characterizing the nominal evolution of the dynamic system, and from the observation inputs, the proposed method allows to determine according to the reliability of the model and the inputs, the state of the system at any time. Qualitative probabilistic reasoning in a Bayesian network often reveals tradeoffs: relationships that are ambiguous due to competing qualitative influences. We present two techniques that combine qualitative and numeric probabilistic reasoning to resolve such tradeoffs, inferring the qualitative relationship between nodes in a Bayesian network. The first approach incrementally marginalizes nodes that contribute to the ambiguous qualitative relationships. The second approach evaluates approximate Bayesian networks for bounds of probability distributions, and uses these bounds to determinate qualitative relationships in question. This approach is also incremental in that the algorithm refines the state spaces of random variables for tighter bounds until the qualitative relationships are resolved. Both approaches provide systematic methods for tradeoff resolution at potentially lower computational cost than application of purely numeric methods. We present locally complete inference rules for probabilistic deduction from taxonomic and probabilistic knowledge-bases over conjunctive events. Crucially, in contrast to similar inference rules in the literature, our inference rules are locally complete for conjunctive events and under additional taxonomic knowledge. We discover that our inference rules are extremely complex and that it is at first glance not clear at all where the deduced tightest bounds come from. Moreover, analyzing the global completeness of our inference rules, we find examples of globally very incomplete probabilistic deductions. More generally, we even show that all systems of inference rules for taxonomic and probabilistic knowledge-bases over conjunctive events are globally incomplete. We conclude that probabilistic deduction by the iterative application of inference rules on interval restrictions for conditional probabilities, even though considered very promising in the literature so far, seems very limited in its field of application. The efficiency of algorithms using secondary structures for probabilistic inference in Bayesian networks can be improved by exploiting independence relations induced by evidence and the direction of the links in the original network. In this paper we present an algorithm that on-line exploits independence relations induced by evidence and the direction of the links in the original network to reduce both time and space costs. Instead of multiplying the conditional probability distributions for the various cliques, we determine on-line which potentials to multiply when a message is to be produced. The performance improvement of the algorithm is emphasized through empirical evaluations involving large real world Bayesian networks, and we compare the method with the HUGIN and Shafer-Shenoy inference algorithms. Several authors have explained that the likelihood ratio measures the strength of the evidence represented by observations in statistical problems. This idea works fine when the goal is to evaluate the strength of the available evidence for a simple hypothesis versus another simple hypothesis. However, the applicability of this idea is limited to simple hypotheses because the likelihood function is primarily defined on points (simple hypotheses) of the parameter space. In this paper we define a general weight of evidence that is applicable to both simple and composite hypotheses. It is based on the Dempster-Shafer concept of plausibility and is shown to be a generalization of the likelihood ratio. Functional models are of a fundamental importance for the general weight of evidence proposed in this paper. The relevant concepts and ideas are explained by means of a familiar urn problem and the general analysis of a real-world medical problem is presented. In this paper we address the problem of discretization in the context of learning Bayesian networks (BNs) from data containing both continuous and discrete variables. We describe a new technique for multivariate discretization, whereby each continuous variable is discretized while taking into account its interaction with the other variables. The technique is based on the use of a Bayesian scoring metric that scores the discretization policy for a continuous variable given a BN structure and the observed data. Since the metric is relative to the BN structure currently being evaluated, the discretization of a variable needs to be dynamically adjusted as the BN structure changes. Distributed knowledge based applications in open domain rely on common sense information which is bound to be uncertain and incomplete. To draw the useful conclusions from ambiguous data, one must address uncertainties and conflicts incurred in a holistic view. No integrated frameworks are viable without an in-depth analysis of conflicts incurred by uncertainties. In this paper, we give such an analysis and based on the result, propose an integrated framework. Our framework extends definite argumentation theory to model uncertainty. It supports three views over conflicting and uncertain knowledge. Thus, knowledge engineers can draw different conclusions depending on the application context (i.e. view). We also give an illustrative example on strategical decision support to show the practical usefulness of our framework. The process of diagnosis involves learning about the state of a system from various observations of symptoms or findings about the system. Sophisticated Bayesian (and other) algorithms have been developed to revise and maintain beliefs about the system as observations are made. Nonetheless, diagnostic models have tended to ignore some common sense reasoning exploited by human diagnosticians; In particular, one can learn from which observations have not been made, in the spirit of conversational implicature. There are two concepts that we describe to extract information from the observations not made. First, some symptoms, if present, are more likely to be reported before others. Second, most human diagnosticians and expert systems are economical in their data-gathering, searching first where they are more likely to find symptoms present. Thus, there is a desirable bias toward reporting symptoms that are present. We develop a simple model for these concepts that can significantly improve diagnostic inference. There is evidence that the numbers in probabilistic inference don't really matter. This paper considers the idea that we can make a probabilistic model simpler by making fewer distinctions. Unfortunately, the level of a Bayesian network seems too coarse; it is unlikely that a parent will make little difference for all values of the other parents. In this paper we consider an approximation scheme where distinctions can be ignored in some contexts, but not in other contexts. We elaborate on a notion of a parent context that allows a structured context-specific decomposition of a probability distribution and the associated probabilistic inference scheme called probabilistic partial evaluation (Poole 1997). This paper shows a way to simplify a probabilistic model by ignoring distinctions which have similar probabilities, a method to exploit the simpler model, a bound on the resulting errors, and some preliminary empirical results on simple networks. It was recently shown that the problem of decoding messages transmitted through a noisy channel can be formulated as a belief updating task over a probabilistic network [McEliece]. Moreover, it was observed that iterative application of the (linear time) Pearl's belief propagation algorithm designed for polytrees outperformed state of the art decoding algorithms, even though the corresponding networks may have many cycles. This paper demonstrates empirically that an approximation algorithm approx-mpe for solving the most probable explanation (MPE) problem, developed within the recently proposed mini-bucket elimination framework [Dechter96], outperforms iterative belief propagation on classes of coding networks that have bounded induced width. Our experiments suggest that approximate MPE decoders can be good competitors to the approximate belief updating decoders. This paper describes a decision theoretic formulation of learning the graphical structure of a Bayesian Belief Network from data. This framework subsumes the standard Bayesian approach of choosing the model with the largest posterior probability as the solution of a decision problem with a 0-1 loss function and allows the use of more general loss functions able to trade-off the complexity of the selected model and the error of choosing an oversimplified model. A new class of loss functions, called disintegrable, is introduced, to allow the decision problem to match the decomposability of the graphical model. With this class of loss functions, the optimal solution to the decision problem can be found using an efficient bottom-up search strategy. One of the benefits of belief networks and influence diagrams is that so much knowledge is captured in the graphical structure. In particular, statements of conditional irrelevance (or independence) can be verified in time linear in the size of the graph. To resolve a particular inference query or decision problem, only some of the possible states and probability distributions must be specified, the "requisite information." This paper presents a new, simple, and efficient "Bayes-ball" algorithm which is well-suited to both new students of belief networks and state of the art implementations. The Bayes-ball algorithm determines irrelevant sets and requisite information more efficiently than existing methods, and is linear in the size of the graph for belief networks and influence diagrams. A constant rebalanced portfolio is an asset allocation algorithm which keeps the same distribution of wealth among a set of assets along a period of time. Recently, there has been work on on-line portfolio selection algorithms which are competitive with the best constant rebalanced portfolio determined in hindsight. By their nature, these algorithms employ the assumption that high returns can be achieved using a fixed asset allocation strategy. However, stock markets are far from being stationary and in many cases the wealth achieved by a constant rebalanced portfolio is much smaller than the wealth achieved by an ad-hoc investment strategy that adapts to changes in the market. In this paper we present an efficient Bayesian portfolio selection algorithm that is able to track a changing market. We also describe a simple extension of the algorithm for the case of a general transaction cost, including the transactions cost models recently investigated by Blum and kalai. We provide a simple analysis of the competitiveness of the algorithm and check its performance on real stock data from the New York Stock Exchange accumulated during a 22-year period. AThe paper gives a few arguments in favour of the use of chain graphs for description of probabilistic conditional independence structures. Every Bayesian network model can be equivalently introduced by means of a factorization formula with respect to a chain graph which is Markov equivalent to the Bayesian network. A graphical characterization of such graphs is given. The class of equivalent graphs can be represented by a distinguished graph which is called the largest chain graph. The factorization formula with respect to the largest chain graph is a basis of a proposal of how to represent the corresponding (discrete) probability distribution in a computer (i.e. parametrize it). This way does not depend on the choice of a particular Bayesian network from the class of equivalent networks and seems to be the most efficient way from the point of view of memory demands. A separation criterion for reading independency statements from a chain graph is formulated in a simpler way. It resembles the well-known d-separation criterion for Bayesian networks and can be implemented locally. We describe computationally efficient methods for learning mixtures in which each component is a directed acyclic graphical model (mixtures of DAGs or MDAGs). We argue that simple search-and-score algorithms are infeasible for a variety of problems, and introduce a feasible approach in which parameter and structure search is interleaved and expected data is treated as real data. Our approach can be viewed as a combination of (1) the Cheeseman--Stutz asymptotic approximation for model posterior probability and (2) the Expectation--Maximization algorithm. We evaluate our procedure for selecting among MDAGs on synthetic and real examples. In the real world, insufficient information, limited computation resources, and complex problem structures often force an autonomous agent to make a decision in time less than that required to solve the problem at hand completely. Flexible and approximate computations are two approaches to decision making under limited computation resources. Flexible computation helps an agent to flexibly allocate limited computation resources so that the overall system utility is maximized. Approximate computation enables an agent to find the best satisfactory solution within a deadline. In this paper, we present two state-space reduction methods for flexible and approximate computation: quantitative reduction to deal with inaccurate heuristic information, and structural reduction to handle complex problem structures. These two methods can be applied successively to continuously improve solution quality if more computation is available. Our results show that these reduction methods are effective and efficient, finding better solutions with less computation than some existing well-known methods. The task of expert finding has been getting increasing attention in information retrieval literature. However, the current state-of-the-art is still lacking in principled approaches for combining different sources of evidence in an optimal way. This paper explores the usage of learning to rank methods as a principled approach for combining multiple estimators of expertise, derived from the textual contents, from the graph-structure with the citation patterns for the community of experts, and from profile information about the experts. Experiments made over a dataset of academic publications, for the area of Computer Science, attest for the adequacy of the proposed approaches. In the winter of 1967 Cambridge radio astronomers discovered a new type of radio source of such an artificial seeming nature that for a few weeks some members of the group had to seriously consider whether they had discovered an extraterrestrial intelligence. Although their investigations lead them to a natural explanation (they had discovered pulsars), they had discussed the implications if it was indeed an artificial source: how to verify such a conclusion and how to announce it, and whether such a discovery might be dangerous. In this they presaged many of the components of the SETI Detection Protocols and the proposed Reply Protocols which have been used to guide the responses of groups dealing with the detection of an extraterrestrial intelligence. These Protocols were only established some twenty five years later in the 1990s and 2000s. Using contemporary and near-contemporary documentation and later recollections, this paper discusses in detail what happened that winter. Wide-angle sonar mapping of the environment by mobile robot is nontrivial due to several sources of uncertainty: dropouts due to "specular" reflections, obstacle location uncertainty due to the wide beam, and distance measurement error. Earlier papers address the latter problems, but dropouts remain a problem in many environments. We present an approach that lifts the overoptimistic independence assumption used in earlier work, and use Bayes nets to represent the dependencies between objects of the model. Objects of the model consist of readings, and of regions in which "quasi location invariance" of the (possible) obstacles exists, with respect to the readings. Simulation supports the method's feasibility. The model is readily extensible to allow for prior distributions, as well as other types of sensing operations. We present an algorithm for arc reversal in Bayesian networks with tree-structured conditional probability tables, and consider some of its advantages, especially for the simulation of dynamic probabilistic networks. In particular, the method allows one to produce CPTs for nodes involved in the reversal that exploit regularities in the conditional distributions. We argue that this approach alleviates some of the overhead associated with arc reversal, plays an important role in evidence integration and can be used to restrict sampling of variables in DPNs. We also provide an algorithm that detects the dynamic irrelevance of state variables in forward simulation. This algorithm exploits the structured CPTs in a reversed network to determine, in a time-independent fashion, the conditions under which a variable does or does not need to be sampled. Recently several researchers have investigated techniques for using data to learn Bayesian networks containing compact representations for the conditional probability distributions (CPDs) stored at each node. The majority of this work has concentrated on using decision-tree representations for the CPDs. In addition, researchers typically apply non-Bayesian (or asymptotically Bayesian) scoring functions such as MDL to evaluate the goodness-of-fit of networks to the data. In this paper we investigate a Bayesian approach to learning Bayesian networks that contain the more general decision-graph representations of the CPDs. First, we describe how to evaluate the posterior probability that is, the Bayesian score of such a network, given a database of observed cases. Second, we describe various search spaces that can be used, in conjunction with a scoring function and a search procedure, to identify one or more high-scoring networks. Finally, we present an experimental evaluation of the search spaces, using a greedy algorithm and a Bayesian scoring function. It has been shown that a class of probabilistic domain models cannot be learned correctly by several existing algorithms which employ a single-link look ahead search. When a multi-link look ahead search is used, the computational complexity of the learning algorithm increases. We study how to use parallelism to tackle the increased complexity in learning such models and to speed up learning in large domains. An algorithm is proposed to decompose the learning task for parallel processing. A further task decomposition is used to balance load among processors and to increase the speed-up and efficiency. For learning from very large datasets, we present a regrouping of the available processors such that slow data access through file can be replaced by fast memory access. Our implementation in a parallel computer demonstrates the effectiveness of the algorithm. Robust Bayesian inference is the calculation of posterior probability bounds given perturbations in a probabilistic model. This paper focuses on perturbations that can be expressed locally in Bayesian networks through convex sets of distributions. Two approaches for combination of local models are considered. The first approach takes the largest set of joint distributions that is compatible with the local sets of distributions; we show how to reduce this type of robust inference to a linear programming problem. The second approach takes the convex hull of joint distributions generated from the local sets of distributions; we demonstrate how to apply interior-point optimization methods to generate posterior bounds and how to generate approximations that are guaranteed to converge to correct posterior bounds. We also discuss calculation of bounds for expected utilities and variances, and global perturbation models. We present a method for calculation of myopic value of information in influence diagrams (Howard & Matheson, 1981) based on the strong junction tree framework (Jensen, Jensen & Dittmer, 1994). The difference in instantiation order in the influence diagrams is reflected in the corresponding junction trees by the order in which the chance nodes are marginalized. This order of marginalization can be changed by table expansion and in effect the same junction tree with expanded tables may be used for calculating the expected utility for scenarios with different instantiation order. We also compare our method to the classic method of modeling different instantiation orders in the same influence diagram. This paper investigates the problem of finding a preference relation on a set of acts from the knowledge of an ordering on events (subsets of states of the world) describing the decision-maker (DM)s uncertainty and an ordering of consequences of acts, describing the DMs preferences. However, contrary to classical approaches to decision theory, we try to do it without resorting to any numerical representation of utility nor uncertainty, and without even using any qualitative scale on which both uncertainty and preference could be mapped. It is shown that although many axioms of Savage theory can be preserved and despite the intuitive appeal of the method for constructing a preference over acts, the approach is inconsistent with a probabilistic representation of uncertainty, but leads to the kind of uncertainty theory encountered in non-monotonic reasoning (especially preferential and rational inference), closely related to possibility theory. Moreover the method turns out to be either very little decisive or to lead to very risky decisions, although its basic principles look sound. This paper raises the question of the very possibility of purely symbolic approaches to Savage-like decision-making under uncertainty and obtains preliminary negative results. There is an obvious need for improving the performance and accuracy of a Bayesian network as new data is observed. Because of errors in model construction and changes in the dynamics of the domains, we cannot afford to ignore the information in new data. While sequential update of parameters for a fixed structure can be accomplished using standard techniques, sequential update of network structure is still an open problem. In this paper, we investigate sequential update of Bayesian networks were both parameters and structure are expected to change. We introduce a new approach that allows for the flexible manipulation of the tradeoff between the quality of the learned networks and the amount of information that is maintained about past observations. We formally describe our approach including the necessary modifications to the scoring functions for learning Bayesian networks, evaluate its effectiveness through an empirical study, and extend it to the case of missing data. "Background subtraction" is an old technique for finding moving objects in a video sequence for example, cars driving on a freeway. The idea is that subtracting the current image from a timeaveraged background image will leave only nonstationary objects. It is, however, a crude approximation to the task of classifying each pixel of the current image; it fails with slow-moving objects and does not distinguish shadows from moving objects. The basic idea of this paper is that we can classify each pixel using a model of how that pixel looks when it is part of different classes. We learn a mixture-of-Gaussians classification model for each pixel using an unsupervised technique- an efficient, incremental version of EM. Unlike the standard image-averaging approach, this automatically updates the mixture component for each class according to likelihood of membership; hence slow-moving objects are handled perfectly. Our approach also identifies and eliminates shadows much more effectively than other techniques such as thresholding. Application of this method as part of the Roadwatch traffic surveillance project is expected to result in significant improvements in vehicle identification and tracking. A Bayesian net (BN) is more than a succinct way to encode a probabilistic distribution; it also corresponds to a function used to answer queries. A BN can therefore be evaluated by the accuracy of the answers it returns. Many algorithms for learning BNs, however, attempt to optimize another criterion (usually likelihood, possibly augmented with a regularizing term), which is independent of the distribution of queries that are posed. This paper takes the "performance criteria" seriously, and considers the challenge of computing the BN whose performance - read "accuracy over the distribution of queries" - is optimal. We show that many aspects of this learning task are more difficult than the corresponding subtasks in the standard model. Conditioning is the generally agreed-upon method for updating probability distributions when one learns that an event is certainly true. But it has been argued that we need other rules, in particular the rule of cross-entropy minimization, to handle updates that involve uncertain information. In this paper we re-examine such a case: van Fraassen's Judy Benjamin problem, which in essence asks how one might update given the value of a conditional probability. We argue that -- contrary to the suggestions in the literature -- it is possible to use simple conditionalization in this case, and thereby obtain answers that agree fully with intuition. This contrasts with proposals such as cross-entropy, which are easier to apply but can give unsatisfactory answers. Based on the lessons from this example, we speculate on some general philosophical issues concerning probability update. Decision theory has become widely accepted in the AI community as a useful framework for planning and decision making. Applying the framework typically requires elicitation of some form of probability and utility information. While much work in AI has focused on providing representations and tools for elicitation of probabilities, relatively little work has addressed the elicitation of utility models. This imbalance is not particularly justified considering that probability models are relatively stable across problem instances, while utility models may be different for each instance. Spending large amounts of time on elicitation can be undesirable for interactive systems used in low-stakes decision making and in time-critical decision making. In this paper we investigate the issues of reasoning with incomplete utility models. We identify patterns of problem instances where plans can be proved to be suboptimal if the (unknown) utility function satisfies certain conditions. We present an approach to planning and decision making that performs the utility elicitation incrementally and in a way that is informed by the domain model. We describe work to control graphics rendering under limited computational resources by taking a decision-theoretic perspective on perceptual costs and computational savings of approximations. The work extends earlier work on the control of rendering by introducing methods and models for computing the expected cost associated with degradations of scene components. The expected cost is computed by considering the perceptual cost of degradations and a probability distribution over the attentional focus of viewers. We review the critical literature describing findings on visual search and attention, discuss the implications of the findings, and introduce models of expected perceptual cost. Finally, we discuss policies that harness information about the expected cost of scene components. A pseudo independent (PI) model is a probabilistic domain model (PDM) where proper subsets of a set of collectively dependent variables display marginal independence. PI models cannot be learned correctly by many algorithms that rely on a single link search. Earlier work on learning PI models has suggested a straightforward multi-link search algorithm. However, when a domain contains recursively embedded PI submodels, it may escape the detection of such an algorithm. In this paper, we propose an improved algorithm that ensures the learning of all embedded PI submodels whose sizes are upper bounded by a predetermined parameter. We show that this improved learning capability only increases the complexity slightly beyond that of the previous algorithm. The performance of the new algorithm is demonstrated through experiment. Decomposable models and Bayesian networks can be defined as sequences of oligo-dimensional probability measures connected with operators of composition. The preliminary results suggest that the probabilistic models allowing for effective computational procedures are represented by sequences possessing a special property; we shall call them perfect sequences. The paper lays down the elementary foundation necessary for further study of iterative application of operators of composition. We believe to develop a technique describing several graph models in a unifying way. We are convinced that practically all theoretical results and procedures connected with decomposable models and Bayesian networks can be translated into the terminology introduced in this paper. For example, complexity of computational procedures in these models is closely dependent on possibility to change the ordering of oligo-dimensional measures defining the model. Therefore, in this paper, lot of attention is paid to possibility to change ordering of the operators of composition. The efficiency of inference in both the Hugin and, most notably, the Shafer-Shenoy architectures can be improved by exploiting the independence relations induced by the incoming messages of a clique. That is, the message to be sent from a clique can be computed via a factorization of the clique potential in the form of a junction tree. In this paper we show that by exploiting such nested junction trees in the computation of messages both space and time costs of the conventional propagation methods may be reduced. The paper presents a structured way of exploiting the nested junction trees technique to achieve such reductions. The usefulness of the method is emphasized through a thorough empirical evaluation involving ten large real-world Bayesian networks and the Hugin inference algorithm. We consider probabilistic inference in general hybrid networks, which include continuous and discrete variables in an arbitrary topology. We reexamine the question of variable discretization in a hybrid network aiming at minimizing the information loss induced by the discretization. We show that a nonuniform partition across all variables as opposed to uniform partition of each variable separately reduces the size of the data structures needed to represent a continuous function. We also provide a simple but efficient procedure for nonuniform partition. To represent a nonuniform discretization in the computer memory, we introduce a new data structure, which we call a Binary Split Partition (BSP) tree. We show that BSP trees can be an exponential factor smaller than the data structures in the standard uniform discretization in multiple dimensions and show how the BSP trees can be used in the standard join tree algorithm. We show that the accuracy of the inference process can be significantly improved by adjusting discretization with evidence. We construct an iterative anytime algorithm that gradually improves the quality of the discretization and the accuracy of the answer on a query. We provide empirical evidence that the algorithm converges. In most current applications of belief networks, domain knowledge is represented by a single belief network that applies to all problem instances in the domain. In more complex domains, problem-specific models must be constructed from a knowledge base encoding probabilistic relationships in the domain. Most work in knowledge-based model construction takes the rule as the basic unit of knowledge. We present a knowledge representation framework that permits the knowledge base designer to specify knowledge in larger semantically meaningful units which we call network fragments. Our framework provides for representation of asymmetric independence and canonical intercausal interaction. We discuss the combination of network fragments to form problem-specific models to reason about particular problem instances. The framework is illustrated using examples from the domain of military situation awareness. This paper introduces a computational framework for reasoning in Bayesian belief networks that derives significant advantages from focused inference and relevance reasoning. This framework is based on d -separation and other simple and computationally efficient techniques for pruning irrelevant parts of a network. Our main contribution is a technique that we call relevance-based decomposition. Relevance-based decomposition approaches belief updating in large networks by focusing on their parts and decomposing them into partially overlapping subnetworks. This makes reasoning in some intractable networks possible and, in addition, often results in significant speedup, as the total time taken to update all subnetworks is in practice often considerably less than the time taken to update the network as a whole. We report results of empirical tests that demonstrate practical significance of our approach. A submarine's sonar team is responsible for detecting, localising and classifying targets using information provided by the platform's sensor suite. The information used to make these assessments is typically uncertain and/or incomplete and is likely to require a measure of confidence in its reliability. Moreover, improvements in sensor and communication technology are resulting in increased amounts of on-platform and off-platform information available for evaluation. This proliferation of imprecise information increases the risk of overwhelming the operator. To assist the task of localisation and classification a concept demonstration decision aid (Horizon), based on evidential reasoning, has been developed. Horizon is an information fusion software package for representing and fusing imprecise information about the state of the world, expressed across suitable frames of reference. The Horizon software is currently at prototype stage. By discussing several examples, the theory of generalized functional models is shown to be very natural for modeling some situations of reasoning under uncertainty. A generalized functional model is a pair (f, P) where f is a function describing the interactions between a parameter variable, an observation variable and a random source, and P is a probability distribution for the random source. Unlike traditional functional models, generalized functional models do not require that there is only one value of the parameter variable that is compatible with an observation and a realization of the random source. As a consequence, the results of the analysis of a generalized functional model are not expressed in terms of probability distributions but rather by support and plausibility functions. The analysis of a generalized functional model is very logical and is inspired from ideas already put forward by R.A. Fisher in his theory of fiducial probability. There is a brief description of the probabilistic causal graph model for representing, reasoning with, and learning causal structure using Bayesian networks. It is then argued that this model is closely related to how humans reason with and learn causal structure. It is shown that studies in psychology on discounting (reasoning concerning how the presence of one cause of an effect makes another cause less probable) support the hypothesis that humans reach the same judgments as algorithms for doing inference in Bayesian networks. Next, it is shown how studies by Piaget indicate that humans learn causal structure by observing the same independencies and dependencies as those used by certain algorithms for learning the structure of a Bayesian network. Based on this indication, a subjective definition of causality is forwarded. Finally, methods for further testing the accuracy of these claims are discussed. Bayesian knowledge bases (BKBs) are a generalization of Bayes networks and weighted proof graphs (WAODAGs), that allow cycles in the causal graph. Reasoning in BKBs requires finding the most probable inferences consistent with the evidence. The cost-sharing heuristic for finding least-cost explanations in WAODAGs was presented and shown to be effective by Charniak and Husain. However, the cycles in BKBs would make the definition of cost-sharing cyclic as well, if applied directly to BKBs. By treating the defining equations of cost-sharing as a system of equations, one can properly define an admissible cost-sharing heuristic for BKBs. Empirical evaluation shows that cost-sharing improves performance significantly when applied to BKBs. We introduce a new interpretation of two related notions - conditional utility and utility independence. Unlike the traditional interpretation, the new interpretation renders the notions the direct analogues of their probabilistic counterparts. To capture these notions formally, we appeal to the notion of utility distribution, introduced in previous paper. We show that utility distributions, which have a structure that is identical to that of probability distributions, can be viewed as a special case of an additive multiattribute utility functions, and show how this special case permits us to capture the novel senses of conditional utility and utility independence. Finally, we present the notion of utility networks, which do for utilities what Bayesian networks do for probabilities. Specifically, utility networks exploit the new interpretation of conditional utility and utility independence to compactly represent a utility distribution. Default logic encounters some conceptual difficulties in representing common sense reasoning tasks. We argue that we should not try to formulate modular default rules that are presumed to work in all or most circumstances. We need to take into account the importance of the context which is continuously evolving during the reasoning process. Sequential thresholding is a quantitative counterpart of default logic which makes explicit the role context plays in the construction of a non-monotonic extension. We present a semantic characterization of generic non-monotonic reasoning, as well as the instantiations pertaining to default logic and sequential thresholding. This provides a link between the two mechanisms as well as a way to integrate the two that can be beneficial to both. Recursive graphical models usually underlie the statistical modelling concerning probabilistic expert systems based on Bayesian networks. This paper defines a version of these models, denoted as recursive exponential models, which have evolved by the desire to impose sophisticated domain knowledge onto local fragments of a model. Besides the structural knowledge, as specified by a given model, the statistical modelling may also include expert opinion about the values of parameters in the model. It is shown how to translate imprecise expert knowledge into approximately conjugate prior distributions. Based on possibly incomplete data, the score and the observed information are derived for these models. This accounts for both the traditional score and observed information, derived as derivatives of the log-likelihood, and the posterior score and observed information, derived as derivatives of the log-posterior distribution. Throughout the paper the specialization into recursive graphical models is accounted for by a simple example. The criterion commonly used in directed acyclic graphs (dags) for testing graphical independence is the well-known d-separation criterion. It allows us to build graphical representations of dependency models (usually probabilistic dependency models) in the form of belief networks, which make easy interpretation and management of independence relationships possible, without reference to numerical parameters (conditional probabilities). In this paper, we study the following combinatorial problem: finding the minimum d-separating set for two nodes in a dag. This set would represent the minimum information (in the sense of minimum number of variables) necessary to prevent these two nodes from influencing each other. The solution to this basic problem and some of its extensions can be useful in several ways, as we shall see later. Our solution is based on a two-step process: first, we reduce the original problem to the simpler one of finding a minimum separating set in an undirected graph, and second, we develop an algorithm for solving it. This paper works through the optimization of a real world planning problem, with a combination of a generative planning tool and an influence diagram solver. The problem is taken from an existing application in the domain of oil spill emergency response. The planning agent manages constraints that order sets of feasible equipment employment actions. This is mapped at an intermediate level of abstraction onto an influence diagram. In addition, the planner can apply a surveillance operator that determines observability of the state---the unknown trajectory of the oil. The uncertain world state and the objective function properties are part of the influence diagram structure, but not represented in the planning agent domain. By exploiting this structure under the constraints generated by the planning agent, the influence diagram solution complexity simplifies considerably, and an optimum solution to the employment problem based on the objective function is found. Finding this optimum is equivalent to the simultaneous evaluation of a range of plans. This result is an example of bounded optimality, within the limitations of this hybrid generative planner and influence diagram architecture. We developed the language of Modifiable Temporal Belief Networks (MTBNs) as a structural and temporal extension of Bayesian Belief Networks (BNs) to facilitate normative temporal and causal modeling under uncertainty. In this paper we present definitions of the model, its components, and its fundamental properties. We also discuss how to represent various types of temporal knowledge, with an emphasis on hybrid temporal-explicit time modeling, dynamic structures, avoiding causal temporal inconsistencies, and dealing with models that involve simultaneously actions (decisions) and causal and non-causal associations. We examine the relationships among BNs, Modifiable Belief Networks, and MTBNs with a single temporal granularity, and suggest areas of application suitable to each one of them. Graphical Markov models use graphs, either undirected, directed, or mixed, to represent possible dependences among statistical variables. Applications of undirected graphs (UDGs) include models for spatial dependence and image analysis, while acyclic directed graphs (ADGs), which are especially convenient for statistical analysis, arise in such fields as genetics and psychometrics and as models for expert systems and Bayesian belief networks. Lauritzen, Wermuth and Frydenberg (LWF) introduced a Markov property for chain graphs, which are mixed graphs that can be used to represent simultaneously both causal and associative dependencies and which include both UDGs and ADGs as special cases. In this paper an alternative Markov property (AMP) for chain graphs is introduced, which in some ways is a more direct extension of the ADG Markov property than is the LWF property for chain graph. A nonmonotonic logic of thresholded generalizations is presented. Given propositions A and B from a language L and a positive integer k, the thresholded generalization A=>B{k} means that the conditional probability P(B|A) falls short of one by no more than c*d^k. A two-level probability structure is defined. At the lower level, a model is defined to be a probability function on L. At the upper level, there is a probability distribution over models. A definition is given of what it means for a collection of thresholded generalizations to entail another thresholded generalization. This nonmonotonic entailment relation, called "entailment in probability", has the feature that its conclusions are "probabilistically trustworthy" meaning that, given true premises, it is improbable that an entailed conclusion would be false. A procedure is presented for ascertaining whether any given collection of premises entails any given conclusion. It is shown that entailment in probability is closely related to Goldszmidt and Pearl's System-Z^+, thereby demonstrating that the conclusions of System-Z^+ are probabilistically trustworthy. This paper deals with a scene recognition system in a robotics contex. The general problem is to match images with a priori descriptions. A typical mission would consist in identifying an object in an installation with a vision system situated at the end of a manipulator and with a human operator provided description, formulated in a pseudo-natural language, and possibly redundant. The originality of this work comes from the nature of the description, from the special attention given to the management of imprecision and uncertainty in the interpretation process and from the way to assess the description redundancy so as to reinforce the overall matching likelihood. An algorithm is developed for finding a close to optimal junction tree of a given graph G. The algorithm has a worst case complexity O(c^k n^a) where a and c are constants, n is the number of vertices, and k is the size of the largest clique in a junction tree of G in which this size is minimized. The algorithm guarantees that the logarithm of the size of the state space of the heaviest clique in the junction tree produced is less than a constant factor off the optimal value. When k = O(log n), our algorithm yields a polynomial inference algorithm for Bayesian networks. Bayesian networks provide a language for qualitatively representing the conditional independence properties of a distribution. This allows a natural and compact representation of the distribution, eases knowledge acquisition, and supports effective inference algorithms. It is well-known, however, that there are certain independencies that we cannot capture qualitatively within the Bayesian network structure: independencies that hold only in certain contexts, i.e., given a specific assignment of values to certain variables. In this paper, we propose a formal notion of context-specific independence (CSI), based on regularities in the conditional probability tables (CPTs) at a node. We present a technique, analogous to (and based on) d-separation, for determining when such independence holds in a given network. We then focus on a particular qualitative representation scheme - tree-structured CPTs - for capturing CSI. We suggest ways in which this representation can be used to support effective inference algorithms. In particular, we present a structural decomposition of the resulting network which can improve the performance of clustering algorithms, and an alternative algorithm based on cutset conditioning. We develop and extend existing decision-theoretic methods for troubleshooting a nonfunctioning device. Traditionally, diagnosis with Bayesian networks has focused on belief updating---determining the probabilities of various faults given current observations. In this paper, we extend this paradigm to include taking actions. In particular, we consider three classes of actions: (1) we can make observations regarding the behavior of a device and infer likely faults as in traditional diagnosis, (2) we can repair a component and then observe the behavior of the device to infer likely faults, and (3) we can change the configuration of the device, observe its new behavior, and infer the likelihood of faults. Analysis of latter two classes of troubleshooting actions requires incorporating notions of persistence into the belief-network formalism used for probabilistic inference. The paper presents an efficient method for simulating the tails of a target variable Z=h(X) which depends on a set of basic variables X=(X_1, ..., X_n). To this aim, variables X_i, i=1, ..., n are sequentially simulated in such a manner that Z=h(x_1, ..., x_i-1, X_i, ..., X_n) is guaranteed to be in the tail of Z. When this method is difficult to apply, an alternative method is proposed, which leads to a low rejection proportion of sample values, when compared with the Monte Carlo method. Both methods are shown to be very useful to perform a sensitivity analysis of Bayesian networks, when very large confidence intervals for the marginal/conditional probabilities are required, as in reliability or risk analysis. The methods are shown to behave best when all scores coincide. The required modifications for this to occur are discussed. The methods are illustrated with several examples and one example of application to a real case is used to illustrate the whole process. Decision analysis (DA) and the rich set of tools developed by researchers in decision making under uncertainty show great potential to penetrate the technological content of the products and services delivered by firms in a variety of industries as well as the business processes used to deliver those products and services to market. In this paper I describe work in progress at Sun Microsystems in the application of decision-analytic methods to Operational Decision Making (ODM) in its World-Wide Operations (WWOPS) Business Management Group. Working with membersof product engineering, marketing, and sales, operations planners from WWOPS have begun to use a decision-analytic framework called SCRAM (Supply Communication/Risk Assessment and Management) to structure and solve problems in product planning, tracking, and transition. Concepts such as information value provide a powerful method of managing huge information sets and thereby enable managers to focus attention on factors that matter most for their business. Finally, our process-oriented introduction of decision-analytic methods to Sun managers has led to a focused effort to develop decision support software based on methods from decision making under uncertainty. Approaches to learning Bayesian networks from data typically combine a scoring function with a heuristic search procedure. Given a Bayesian network structure, many of the scoring functions derived in the literature return a score for the entire equivalence class to which the structure belongs. When using such a scoring function, it is appropriate for the heuristic search algorithm to search over equivalence classes of Bayesian networks as opposed to individual structures. We present the general formulation of a search space for which the states of the search correspond to equivalence classes of structures. Using this space, any one of a number of heuristic search algorithms can easily be applied. We compare greedy search performance in the proposed search space to greedy search performance in a search space for which the states correspond to individual Bayesian network structures. We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MDL approximation. We also consider approximations proposed by Draper (1993) and Cheeseman and Stutz (1995). These approximations are as efficient as BIC/MDL, but their accuracy has not been studied in any depth. We compare the accuracy of these approximations under the assumption that the Laplace approximation is the most accurate. In experiments using synthetic data generated from discrete naive-Bayes models having a hidden root node, we find that the CS measure is the most accurate. It is shown that the ability of the interval probability representation to capture epistemological independence is severely limited. Two events are epistemologically independent if knowledge of the first event does not alter belief (i.e., probability bounds) about the second. However, independence in this form can only exist in a 2-monotone probability function in degenerate cases i.e., if the prior bounds are either point probabilities or entirely vacuous. Additional limitations are characterized for other classes of lower probabilities as well. It is argued that these phenomena are simply a matter of interpretation. They appear to be limitations when one interprets probability bounds as a measure of epistemological indeterminacy (i.e., uncertainty arising from a lack of knowledge), but are exactly as one would expect when probability intervals are interpreted as representations of ontological indeterminacy (indeterminacy introduced by structural approximations). The ontological interpretation is introduced and discussed. Quasi-Bayesian theory uses convex sets of probability distributions and expected loss to represent preferences about plans. The theory focuses on decision robustness, i.e., the extent to which plans are affected by deviations in subjective assessments of probability. The present work presents solutions for plan generation when robustness of probability assessments must be included: plans contain information about the robustness of certain actions. The surprising result is that some problems can be solved faster in the Quasi-Bayesian framework than within usual Bayesian theory. We investigate this on the planning to observe problem, i.e., an agent must decide whether to take new observations or not. The fundamental question is: How, and how much, to search for a "best" plan, based on the robustness of probability assessments? Plan generation algorithms are derived in the context of material classification with an acoustic robotic probe. A package that constructs Quasi-Bayesian plans is available through anonymous ftp. This paper provides a formal and practical framework for sound abstraction of probabilistic actions. We start by precisely defining the concept of sound abstraction within the context of finite-horizon planning (where each plan is a finite sequence of actions). Next we show that such abstraction cannot be performed within the traditional probabilistic action representation, which models a world with a single probability distribution over the state space. We then present the constraint mass assignment representation, which models the world with a set of probability distributions and is a generalization of mass assignment representations. Within this framework, we present sound abstraction procedures for three types of action abstraction. We end the paper with discussions and related work on sound and approximate abstraction. We give pointers to papers in which we discuss other sound abstraction-related issues, including applications, estimating loss due to abstraction, and automatically generating abstraction hierarchies. This paper discusses belief revision under uncertain inputs in the framework of possibility theory. Revision can be based on two possible definitions of the conditioning operation, one based on min operator which requires a purely ordinal scale only, and another based on product, for which a richer structure is needed, and which is a particular case of Dempster's rule of conditioning. Besides, revision under uncertain inputs can be understood in two different ways depending on whether the input is viewed, or not, as a constraint to enforce. Moreover, it is shown that M.A. Williams' transmutations, originally defined in the setting of Spohn's functions, can be captured in this framework, as well as Boutilier's natural revision. Many algorithms for processing probabilistic networks are dependent on the topological properties of the problem's structure. Such algorithms (e.g., clustering, conditioning) are effective only if the problem has a sparse graph captured by parameters such as tree width and cycle-cut set size. In this paper we initiate a study to determine the potential of structure-based algorithms in real-life applications. We analyze empirically the structural properties of problems coming from the circuit diagnosis domain. Specifically, we locate those properties that capture the effectiveness of clustering and conditioning as well as of a family of conditioning+clustering algorithms designed to gradually trade space for time. We perform our analysis on 11 benchmark circuits widely used in the testing community. We also report on the effect of ordering heuristics on tree-clustering and show that, on our benchmarks, the well-known max-cardinality ordering is substantially inferior to an ordering called min-degree. In this paper we examine a novel addition to the known methods for learning Bayesian networks from data that improves the quality of the learned networks. Our approach explicitly represents and learns the local structure in the conditional probability tables (CPTs), that quantify these networks. This increases the space of possible models, enabling the representation of CPTs with a variable number of parameters that depends on the learned local structures. The resulting learning procedure is capable of inducing models that better emulate the real complexity of the interactions present in the data. We describe the theoretical foundations and practical aspects of learning local structures, as well as an empirical evaluation of the proposed method. This evaluation indicates that learning curves characterizing the procedure that exploits the local structure converge faster than these of the standard procedure. Our results also show that networks learned with local structure tend to be more complex (in terms of arcs), yet require less parameters. The study of belief change has been an active area in philosophy and AI. In recent years two special cases of belief change, belief revision and belief update, have been studied in detail. Roughly, revision treats a surprising observation as a sign that previous beliefs were wrong, while update treats a surprising observation as an indication that the world has changed. In general, we would expect that an agent making an observation may both want to revise some earlier beliefs and assume that some change has occurred in the world. We define a novel approach to belief change that allows us to do this, by applying ideas from probability theory in a qualitative setting. The key idea is to use a qualitative Markov assumption, which says that state transitions are independent. We show that a recent approach to modeling qualitative uncertainty using plausibility measures allows us to make such a qualitative Markov assumption in a relatively straightforward way, and show how the Markov assumption can be used to provide an attractive belief-change model. We extend the Bayesian Information Criterion (BIC), an asymptotic approximation for the marginal likelihood, to Bayesian networks with hidden variables. This approximation can be used to select models given large samples of data. The standard BIC as well as our extension punishes the complexity of a model according to the dimension of its parameters. We argue that the dimension of a Bayesian network with hidden variables is the rank of the Jacobian matrix of the transformation between the parameters of the network and the parameters of the observable variables. We compute the dimensions of several networks including the naive Bayes model with a hidden root node. Modeling worlds and actions under uncertainty is one of the central problems in the framework of decision-theoretic planning. The representation must be general enough to capture real-world problems but at the same time it must provide a basis upon which theoretical results can be derived. The central notion in the framework we propose here is that of the affine-operator, which serves as a tool for constructing (convex) sets of probability distributions, and which can be considered as a generalization of belief functions and interval mass assignments. Uncertainty in the state of the worlds is modeled with sets of probability distributions, represented by affine-trees while actions are defined as tree-manipulators. A small set of key properties of the affine-operator is presented, forming the basis for most existing operator-based definitions of probabilistic action projection and action abstraction. We derive and prove correct three projection rules, which vividly illustrate the precision-complexity tradeoff in plan projection. Finally, we show how the three types of action abstraction identified by Haddawy and Doan are manifested in the present framework. Recent research has found that diagnostic performance with Bayesian belief networks is often surprisingly insensitive to imprecision in the numerical probabilities. For example, the authors have recently completed an extensive study in which they applied random noise to the numerical probabilities in a set of belief networks for medical diagnosis, subsets of the CPCS network, a subset of the QMR (Quick Medical Reference) focused on liver and bile diseases. The diagnostic performance in terms of the average probabilities assigned to the actual diseases showed small sensitivity even to large amounts of noise. In this paper, we summarize the findings of this study and discuss possible explanations of this low sensitivity. One reason is that the criterion for performance is average probability of the true hypotheses, rather than average error in probability, which is insensitive to symmetric noise distributions. But, we show that even asymmetric, logodds-normal noise has modest effects. A second reason is that the gold-standard posterior probabilities are often near zero or one, and are little disturbed by noise. The validation of data from sensors has become an important issue in the operation and control of modern industrial plants. One approach is to use knowledge based techniques to detect inconsistencies in measured data. This article presents a probabilistic model for the detection of such inconsistencies. Based on probability propagation, this method is able to find the existence of a possible fault among the set of sensors. That is, if an error exists, many sensors present an apparent fault due to the propagation from the sensor(s) with a real fault. So the fault detection mechanism can only tell if a sensor has a potential fault, but it can not tell if the fault is real or apparent. So the central problem is to develop a theory, and then an algorithm, for distinguishing real and apparent faults, given that one or more sensors can fail at the same time. This article then, presents an approach based on two levels: (i) probabilistic reasoning, to detect a potential fault, and (ii) constraint management, to distinguish the real fault from the apparent ones. The proposed approach is exemplified by applying it to a power plant model. Uncertainty may be taken to characterize inferences, their conclusions, their premises or all three. Under some treatments of uncertainty, the inferences itself is never characterized by uncertainty. We explore both the significance of uncertainty in the premises and in the conclusion of an argument that involves uncertainty. We argue that for uncertainty to characterize the conclusion of an inference is natural, but that there is an interplay between uncertainty in the premises and uncertainty in the procedure of argument itself. We show that it is possible in principle to incorporate all uncertainty in the premises, rendering uncertainty arguments deductively valid. But we then argue (1) that this does not reflect human argument, (2) that it is computationally costly, and (3) that the gain in simplicity obtained by allowing uncertainty inference can sometimes outweigh the loss of flexibility it entails. In this paper we propose a framework for combining Disjunctive Logic Programming and Poole's Probabilistic Horn Abduction. We use the concept of hypothesis to specify the probability structure. We consider the case in which probabilistic information is not available. Instead of using probability intervals, we allow for the specification of the probabilities of disjunctions. Because minimal models are used as characteristic models in disjunctive logic programming, we apply the principle of indifference on the set of minimal models to derive default probability values. We define the concepts of explanation and partial explanation of a formula, and use them to determine the default probability distribution(s) induced by a program. An algorithm for calculating the default probability of a goal is presented. A naive (or Idiot) Bayes network is a network with a single hypothesis node and several observations that are conditionally independent given the hypothesis. We recently surveyed a number of members of the UAI community and discovered a general lack of understanding of the implications of the Naive Bayes assumption on the kinds of problems that can be solved by these networks. It has long been recognized [Minsky 61] that if observations are binary, the decision surfaces in these networks are hyperplanes. We extend this result (hyperplane separability) to Naive Bayes networks with m-ary observations. In addition, we illustrate the effect of observation-observation dependencies on decision surfaces. Finally, we discuss the implications of these results on knowledge acquisition and research in learning. Directed acyclic graphs have been used fruitfully to represent causal strucures (Pearl 1988). However, in the social sciences and elsewhere models are often used which correspond both causally and statistically to directed graphs with directed cycles (Spirtes 1995). Pearl (1993) discussed predicting the effects of intervention in models of this kind, so-called linear non-recursive structural equation models. This raises the question of whether it is possible to make inferences about causal structure with cycles, form sample data. In particular do there exist general, informative, feasible and reliable precedures for inferring causal structure from conditional independence relations among variables in a sample generated by an unknown causal structure? In this paper I present a discovery algorithm that is correct in the large sample limit, given commonly (but often implicitly) made plausible assumptions, and which provides information about the existence or non-existence of causal pathways from one variable to another. The algorithm is polynomial on sparse graphs. Although the concept of d-separation was originally defined for directed acyclic graphs (see Pearl 1988), there is a natural extension of he concept to directed cyclic graphs. When exactly the same set of d-separation relations hold in two directed graphs, no matter whether respectively cyclic or acyclic, we say that they are Markov equivalent. In other words, when two directed cyclic graphs are Markov equivalent, the set of distributions that satisfy a natural extension of the Global Directed Markov condition (Lauritzen et al. 1990) is exactly the same for each graph. There is an obvious exponential (in the number of vertices) time algorithm for deciding Markov equivalence of two directed cyclic graphs; simply chech all of the d-separation relations in each graph. In this paper I state a theorem that gives necessary and sufficient conditions for the Markov equivalence of two directed cyclic graphs, where each of the conditions can be checked in polynomial time. Hence, the theorem can be easily adapted into a polynomial time algorithm for deciding the Markov equivalence of two directed cyclic graphs. Although space prohibits inclusion of correctness proofs, they are fully described in Richardson (1994b). Belief updating in Bayes nets, a well known computationally hard problem, has recently been approximated by several deterministic algorithms, and by various randomized approximation algorithms. Deterministic algorithms usually provide probability bounds, but have an exponential runtime. Some randomized schemes have a polynomial runtime, but provide only probability estimates. We present randomized algorithms that enumerate high-probability partial instantiations, resulting in probability bounds. Some of these algorithms are also sampling algorithms. Specifically, we introduce and evaluate a variant of backward sampling, both as a sampling algorithm and as a randomized enumeration algorithm. We also relax the implicit assumption used by both sampling and accumulation algorithms, that query nodes must be instantiated in all the samples. Over the past several years Bayesian networks have been applied to a wide variety of problems. A central problem in applying Bayesian networks is that of finding one or more of the most probable instantiations of a network. In this paper we develop an efficient algorithm that incrementally enumerates the instantiations of a Bayesian network in decreasing order of probability. Such enumeration algorithms are applicable in a variety of applications ranging from medical expert systems to model-based diagnosis. Fundamentally, our algorithm is simply performing a lazy enumeration of the sorted list of all instantiations of the network. This insight leads to a very concise algorithm statement which is both easily understood and implemented. We show that for singly connected networks, our algorithm generates the next instantiation in time polynomial in the size of the network. The algorithm extends to arbitrary Bayesian networks using standard conditioning techniques. We empirically evaluate the enumeration algorithm and demonstrate its practicality. Chain graphs give a natural unifying point of view on Markov and Bayesian networks and enlarge the potential of graphical models for description of conditional independence structures. In the paper a direct graphical separation criterion for chain graphs, called c-separation, which generalizes the d-separation criterion for Bayesian networks is introduced (recalled). It is equivalent to the classic moralization criterion for chain graphs and complete in sense that for every chain graph there exists a probability distribution satisfying exactly conditional independencies derivable from the chain graph by the c-separation criterion. Every class of Markov equivalent chain graphs can be uniquely described by a natural representative, called the largest chain graph. A recovery algorithm, which on basis of the (conditional) dependency model induced by an unknown chain graph finds the corresponding largest chain graph, is presented. When we work with information from multiple sources, the formalism each employs to handle uncertainty may not be uniform. In order to be able to combine these knowledge bases of different formats, we need to first establish a common basis for characterizing and evaluating the different formalisms, and provide a semantics for the combined mechanism. A common framework can provide an infrastructure for building an integrated system, and is essential if we are to understand its behavior. We present a unifying framework based on an ordered partition of possible worlds called partition sequences, which corresponds to our intuitive notion of biasing towards certain possible scenarios when we are uncertain of the actual situation. We show that some of the existing formalisms, namely, default logic, autoepistemic logic, probabilistic conditioning and thresholding (generalized conditioning), and possibility theory can be incorporated into this general framework. Integrating diagnosis and repair is particularly crucial when gaining sufficient information to discriminate between several candidate diagnoses requires carrying out some repair actions. A typical case is supply restoration in a faulty power distribution system. This problem, which is a major concern for electricity distributors, features partial observability, and stochastic repair actions which are more elaborate than simple replacement of components. This paper analyses the difficulties in applying existing work on integrating model-based diagnosis and repair and on planning in partially observable stochastic domains to this real-world problem, and describes the pragmatic approach we have retained so far. We examine a standard factory scheduling problem with stochastic processing and setup times, minimizing the expectation of the weighted number of tardy jobs. Because the costs of operators in the schedule are stochastic and sequence dependent, standard dynamic programming algorithms such as A* may fail to find the optimal schedule. The SDA* (Stochastic Dominance A*) algorithm remedies this difficulty by relaxing the pruning condition. We present an improved state-space search formulation for these problems and discuss the conditions under which stochastic scheduling problems can be solved optimally using SDA*. In empirical testing on randomly generated problems, we found that in 70%, the expected cost of the optimal stochastic solution is lower than that of the solution derived using a deterministic approximation, with comparable search effort. In learning belief networks, the single link lookahead search is widely adopted to reduce the search space. We show that there exists a class of probabilistic domain models which displays a special pattern of dependency. We analyze the behavior of several learning algorithms using different scoring metrics such as the entropy, conditional independence, minimal description length and Bayesian metrics. We demonstrate that single link lookahead search procedures (employed in these algorithms) cannot learn these models correctly. Thus, when the underlying domain model actually belongs to this class, the use of a single link search procedure will result in learning of an incorrect model. This may lead to inference errors when the model is used. Our analysis suggests that if the prior knowledge about a domain does not rule out the possible existence of these models, a multi-link lookahead search or other heuristics should be used for the learning process. Probabilistic independence can dramatically simplify the task of eliciting, representing, and computing with probabilities in large domains. A key technique in achieving these benefits is the idea of graphical modeling. We survey existing notions of independence for utility functions in a multi-attribute space, and suggest that these can be used to achieve similar advantages. Our new results concern conditional additive independence, which we show always has a perfect representation as separation in an undirected graph (a Markov network). Conditional additive independencies entail a particular functional for the utility function that is analogous to a product decomposition of a probability function, and confers analogous benefits. This functional form has been utilized in the Bayesian network and influence diagram literature, but generally without an explanation in terms of independence. The functional form yields a decomposition of the utility function that can greatly speed up expected utility calculations, particularly when the utility graph has a similar topology to the probabilistic network being used. The first contribution of this paper is the presentation of a Pavelka - like formulation of possibilistic logic in which the language is naturally enriched by two connectives which represent negation (eg) and a new type of conjunction (otimes). The space of truth values for this logic is the lattice of possibility functions, that, from an algebraic point of view, forms a quantal. A second contribution comes from the understanding of the new conjunction as the combination of tokens of information coming from different sources, which makes our language "dynamic". A Gentzen calculus is presented, which is proved sound and complete with respect to the given semantics. The problem of truth functionality is discussed in this context. We describe an application of belief networks to the diagnosis of bottlenecks in computer systems. The technique relies on a high-level functional model of the interaction between application workloads, the Windows NT operating system, and system hardware. Given a workload description, the model predicts the values of observable system counters available from the Windows NT performance monitoring tool. Uncertainty in workloads, predictions, and counter values are characterized with Gaussian distributions. During diagnostic inference, we use observed performance monitor values to find the most probable assignment to the workload parameters. In this paper we provide some background on automated bottleneck detection, describe the structure of the system model, and discuss empirical procedures for model calibration and verification. Part of the calibration process includes generating a dataset to estimate a multivariate Gaussian error model. Initial results in diagnosing bottlenecks are presented. We can perform inference in Bayesian belief networks by enumerating instantiations with high probability thus approximating the marginals. In this paper, we present a method for determining the fraction of instantiations that has to be considered such that the absolute error in the marginals does not exceed a predefined value. The method is based on extreme value theory. Essentially, the proposed method uses the reversed generalized Pareto distribution to model probabilities of instantiations below a given threshold. Based on this distribution, an estimate of the maximal absolute error if instantiations with probability smaller than u are disregarded can be made. An approach to fault isolation that exploits vastly incomplete models is presented. It relies on separate descriptions of each component behavior, together with the links between them, which enables focusing of the reasoning to the relevant part of the system. As normal observations do not need explanation, the behavior of the components is limited to anomaly propagation. Diagnostic solutions are disorders (fault modes or abnormal signatures) that are consistent with the observations, as well as abductive explanations. An ordinal representation of uncertainty based on possibility theory provides a simple exception-tolerant description of the component behaviors. We can for instance distinguish between effects that are more or less certainly present (or absent) and effects that are more or less certainly present (or absent) when a given anomaly is present. A realistic example illustrates the benefits of this approach. In this paper we study different concepts of independence for convex sets of probabilities. There will be two basic ideas for independence. The first is irrelevance. Two variables are independent when a change on the knowledge about one variable does not affect the other. The second one is factorization. Two variables are independent when the joint convex set of probabilities can be decomposed on the product of marginal convex sets. In the case of the Theory of Probability, these two starting points give rise to the same definition. In the case of convex sets of probabilities, the resulting concepts will be strongly related, but they will not be equivalent. As application of the concept of independence, we shall consider the problem of building a global convex set from marginal convex sets of probabilities. The undirected technique for evaluating belief networks [Jensen, et.al., 1990, Lauritzen and Spiegelhalter, 1988] requires clustering the nodes in the network into a junction tree. In the traditional view, the junction tree is constructed from the cliques of the moralized and triangulated belief network: triangulation is taken to be the primitive concept, the goal towards which any clustering algorithm (e.g. node elimination) is directed. In this paper, we present an alternative conception of clustering, in which clusters and the junction tree property play the role of primitives: given a graph (not a tree) of clusters which obey (a modified version of) the junction tree property, we transform this graph until we have obtained a tree. There are several advantages to this approach: it is much clearer and easier to understand, which is important for humans who are constructing belief networks; it admits a wider range of heuristics which may enable more efficient or superior clustering algorithms; and it serves as the natural basis for an incremental clustering scheme, which we describe. Although the usefulness of belief networks for reasoning under uncertainty is widely accepted, obtaining numerical probabilities that they require is still perceived a major obstacle. Often not enough statistical data is available to allow for reliable probability estimation. Available information may not be directly amenable for encoding in the network. Finally, domain experts may be reluctant to provide numerical probabilities. In this paper, we propose a method for elicitation of probabilities from a domain expert that is non-invasive and accommodates whatever probabilistic information the expert is willing to state. We express all available information, whether qualitative or quantitative in nature, in a canonical form consisting of (in) equalities expressing constraints on the hyperspace of possible joint probability distributions. We then use this canonical form to derive second-order probability distributions over the desired probabilities. Accepting a proposition means that our confidence in this proposition is strictly greater than the confidence in its negation. This paper investigates the subclass of uncertainty measures, expressing confidence, that capture the idea of acceptance, what we call acceptance functions. Due to the monotonicity property of confidence measures, the acceptance of a proposition entails the acceptance of any of its logical consequences. In agreement with the idea that a belief set (in the sense of Gardenfors) must be closed under logical consequence, it is also required that the separate acceptance o two propositions entail the acceptance of their conjunction. Necessity (and possibility) measures agree with this view of acceptance while probability and belief functions generally do not. General properties of acceptance functions are estabilished. The motivation behind this work is the investigation of a setting for belief revision more general than the one proposed by Alchourron, Gardenfors and Makinson, in connection with the notion of conditioning. The fraud/uncollectible debt problem in the telecommunications industry presents two technical challenges: the detection and the treatment of the account given the detection. In this paper, we focus on the first problem of detection using Bayesian network models, and we briefly discuss the application of a normative expert system for the treatment at the end. We apply Bayesian network models to the problem of fraud/uncollectible debt detection for telecommunication services. In addition to being quite successful at predicting rare event outcomes, it is able to handle a mixture of categorical and continuous data. We present a performance comparison using linear and non-linear discriminant analysis, classification and regression trees, and Bayesian network models The Constraint Satisfaction Problem (CSP) framework offers a simple and sound basis for representing and solving simple decision problems, without uncertainty. This paper is devoted to an extension of the CSP framework enabling us to deal with some decisions problems under uncertainty. This extension relies on a differentiation between the agent-controllable decision variables and the uncontrollable parameters whose values depend on the occurrence of uncertain events. The uncertainty on the values of the parameters is assumed to be given under the form of a probability distribution. Two algorithms are given, for computing respectively decisions solving the problem with a maximal probability, and conditional decisions mapping the largest possible amount of possible cases to actual decisions. We examine a new approach to modeling uncertainty based on plausibility measures, where a plausibility measure just associates with an event its plausibility, an element is some partially ordered set. This approach is easily seen to generalize other approaches to modeling uncertainty, such as probability measures, belief functions, and possibility measures. The lack of structure in a plausibility measure makes it easy for us to add structure on an "as needed" basis, letting us examine what is required to ensure that a plausibility measure has certain properties of interest. This gives us insight into the essential features of the properties in question, while allowing us to prove general results that apply to many approaches to reasoning about uncertainty. Plausibility measures have already proved useful in analyzing default reasoning. In this paper, we examine their "algebraic properties," analogues to the use of + and * in probability theory. An understanding of such properties will be essential if plausibility measures are to be used in practice as a representation tool. We present an algorithm, called Predict, for updating beliefs in causal networks quantified with order-of-magnitude probabilities. The algorithm takes advantage of both the structure and the quantification of the network and presents a polynomial asymptotic complexity. Predict exhibits a conservative behavior in that it is always sound but not always complete. We provide sufficient conditions for completeness and present algorithms for testing these conditions and for computing a complete set of plausible values. We propose Predict as an efficient method to estimate probabilistic values and illustrate its use in conjunction with two known algorithms for probabilistic inference. Finally, we describe an application of Predict to plan evaluation, present experimental results, and discuss issues regarding its use with conditional logics of belief, and in the characterization of irrelevance. We present a precise definition of cause and effect in terms of a fundamental notion called unresponsiveness. Our definition is based on Savage's (1954) formulation of decision theory and departs from the traditional view of causation in that our causal assertions are made relative to a set of decisions. An important consequence of this departure is that we can reason about cause locally, not requiring a causal explanation for every dependency. Such local reasoning can be beneficial because it may not be necessary to determine whether a particular dependency is causal to make a decision. Also in this paper, we examine the graphical encoding of causal relationships. We show that influence diagrams in canonical form are an accurate and efficient representation of causal relationships. In addition, we establish a correspondence between canonical form and Pearl's causal theory. We describe methods for managing the complexity of information displayed to people responsible for making high-stakes, time-critical decisions. The techniques provide tools for real-time control of the configuration and quantity of information displayed to a user, and a methodology for designing flexible human-computer interfaces for monitoring applications. After defining a prototypical set of display decision problems, we introduce the expected value of revealed information (EVRI) and the related measure of expected value of displayed information (EVDI). We describe how these measures can be used to enhance computer displays used for monitoring complex systems. We motivate the presentation by discussing our efforts to employ decision-theoretic control of displays for a time-critical monitoring application at the NASA Mission Control Center in Houston. In this paper we extend the influence diagram (ID) representation for decisions under uncertainty. In the standard ID, arrows into a decision node are only informational; they do not represent constraints on what the decision maker can do. We can represent such constraints only indirectly, using arrows to the children of the decision and sometimes adding more variables to the influence diagram, thus making the ID more complicated. Users of influence diagrams often want to represent constraints by arrows into decision nodes. We represent constraints on decisions by allowing relevance arrows into decision nodes. We call the resulting representation information/relevance influence diagrams (IRIDs). Information/relevance influence diagrams allow for direct representation and specification of constrained decisions. We use a combination of stochastic dynamic programming and Gibbs sampling to solve IRIDs. This method is especially useful when exact methods for solving IDs fail. An important issue in the use of expert systems is the so-called brittleness problem. Expert systems model only a limited part of the world. While the explicit management of uncertainty in expert systems itigates the brittleness problem, it is still possible for a system to be used, unwittingly, in ways that the system is not prepared to address. Such a situation may be detected by the method of straw models, first presented by Jensen et al. [1990] and later generalized and justified by Laskey [1991]. We describe an algorithm, which we have implemented, that takes as input an annotated diagnostic Bayesian network (the base model) and constructs, without assistance, a bipartite network to be used as a straw model. We show that in some cases this straw model is better that the independent straw model of Jensen et al., the only other straw model for which a construction algorithm has been designed and implemented. We show an alternative way of representing a Bayesian belief network by sensitivities and probability distributions. This representation is equivalent to the traditional representation by conditional probabilities, but makes dependencies between nodes apparent and intuitively easy to understand. We also propose a QR matrix representation for the sensitivities and/or conditional probabilities which is more efficient, in both memory requirements and computational speed, than the traditional representation for computer-based implementations of probabilistic inference. We use sensitivities to show that for a certain class of binary networks, the computation time for approximate probabilistic inference with any positive upper bound on the error of the result is independent of the size of the network. Finally, as an alternative to traditional algorithms that use conditional probabilities, we describe an exact algorithm for probabilistic inference that uses the QR-representation for sensitivities and updates probability distributions of nodes in a network according to messages from the neighbors. This paper introduces the independent choice logic, and in particular the "single agent with nature" instance of the independent choice logic, namely ICLdt. This is a logical framework for decision making uncertainty that extends both logic programming and stochastic models such as influence diagrams. This paper shows how the representation of a decision problem within the independent choice logic can be exploited to cut down the combinatorics of dynamic programming. One of the main problems with influence diagram evaluation techniques is the need to optimise a decision for all values of the 'parents' of a decision variable. In this paper we show how the rule based nature of the ICLdt can be exploited so that we only make distinctions in the values of the information available for a decision that will make a difference to utility. Bayesian belief networks are bing increasingly used as a knowledge representation for diagnostic reasoning. One simple method for conducting diagnostic reasoning is to represent system faults and observations only. In this paper, we investigate how having intermediate nodes-nodes other than fault and observation nodes affects the diagnostic performance of a Bayesian belief network. We conducted a series of experiments on a set of real belief networks for medical diagnosis in liver and bile disease. We compared the effects on diagnostic performance of a two-level network consisting just of disease and finding nodes with that of a network which models intermediate pathophysiological disease states as well. We provide some theoretical evidence for differences observed between the abstracted two-level network and the full network. Typical approaches to plan recognition start from a representation of an agent's possible plans, and reason evidentially from observations of the agent's actions to assess the plausibility of the various candidates. A more expansive view of the task (consistent with some prior work) accounts for the context in which the plan was generated, the mental state and planning process of the agent, and consequences of the agent's actions in the world. We present a general Bayesian framework encompassing this view, and focus on how context can be exploited in plan recognition. We demonstrate the approach on a problem in traffic monitoring, where the objective is to induce the plan of the driver from observation of vehicle movements. Starting from a model of how the driver generates plans, we show how the highway context can appropriately influence the recognizer's interpretation of observed driver behavior. The main goal of this paper is to describe a new pruning method for solving decision trees and game trees. The pruning method for decision trees suggests a slight variant of decision trees that we call scenario trees. In scenario trees, we do not need a conditional probability for each edge emanating from a chance node. Instead, we require a joint probability for each path from the root node to a leaf node. We compare the pruning method to the traditional rollback method for decision trees and game trees. For problems that require Bayesian revision of probabilities, a scenario tree representation with the pruning method is more efficient than a decision tree representation with the rollback method. For game trees, the pruning method is more efficient than the rollback method. The use of directed acyclic graphs (DAGs) to represent conditional independence relations among random variables has proved fruitful in a variety of ways. Recursive structural equation models are one kind of DAG model. However, non-recursive structural equation models of the kinds used to model economic processes are naturally represented by directed cyclic graphs with independent errors, a characterization of conditional independence errors, a characterization of conditional independence constraints is obtained, and it is shown that the result generalizes in a natural way to systems in which the error variables or noises are statistically dependent. For non-linear systems with independent errors a sufficient condition for conditional independence of variables in associated distributions is obtained. Probabilistic model-based diagnosis computes the posterior probabilities of failure of components from the prior probabilities of component failure and observations of system behavior. One problem with this method is that such priors are almost never directly available. One of the reasons is that the prior probability estimates include an implicit notion of a time interval over which they are specified -- for example, if the probability of failure of a component is 0.05, is this over the period of a day or is this over a week? A second problem facing probabilistic model-based diagnosis is the modeling of persistence. Say we have an observation about a system at time t_1 and then another observation at a later time t_2. To compute posterior probabilities that take into account both the observations, we need some model of how the state of the system changes from time t_1 to t_2. In this paper, we address these problems using techniques from Reliability theory. We show how to compute the failure prior of a component from an empirical measure of its reliability -- the Mean Time Between Failure (MTBF). We also develop a scheme to model persistence when handling multiple time tagged observations. The goal of diagnosis is to compute good repair strategies in response to anomalous system behavior. In a decision theoretic framework, a good repair strategy has low expected cost. In a general formulation of the problem, the computation of the optimal (lowest expected cost) repair strategy for a system with multiple faults is intractable. In this paper, we consider an interesting and natural restriction on the behavior of the system being diagnosed: (a) the system exhibits faulty behavior if and only if one or more components is malfunctioning. (b) The failures of the system components are independent. Given this restriction on system behavior, we develop a polynomial time algorithm for computing the optimal repair strategy. We then go on to introduce a system hierarchy and the notion of inspecting (testing) components before repair. We develop a linear time algorithm for computing an optimal repair strategy for the hierarchical system which includes both repair and inspection. The goal of model-based diagnosis is to isolate causes of anomalous system behavior and recommend inexpensive repair actions in response. In general, precomputing optimal repair policies is intractable. To date, investigators addressing this problem have explored approximations that either impose restrictions on the system model (such as a single fault assumption) or compute an immediate best action with limited lookahead. In this paper, we develop a formulation of repair in model-based diagnosis and a repair algorithm that computes optimal sequences of actions. This optimal approach is costly but can be applied to precompute an optimal repair strategy for compact systems. We show how we can exploit a hierarchical system specification to make this approach tractable for large systems. When introducing hierarchy, we also consider the tradeoff between simply replacing a component and decomposing it to repair its subcomponents. The hierarchical repair algorithm is suitable for off-line precomputation of an optimal repair strategy. A modification of the algorithm takes advantage of an iterative deepening scheme to trade off inference time and the quality of the computed strategy. Standard algorithms for finding the shortest path in a graph require that the cost of a path be additive in edge costs, and typically assume that costs are deterministic. We consider the problem of uncertain edge costs, with potential probabilistic dependencies among the costs. Although these dependencies violate the standard dynamic-programming decomposition, we identify a weaker stochastic consistency condition that justifies a generalized dynamic-programming approach based on stochastic dominance. We present a revised path-planning algorithm and prove that it produces optimal paths under time-dependent uncertain costs. We test the algorithm by applying it to a model of stochastic bus networks, and present empirical performance results comparing it to some alternatives. Finally, we consider extensions of these concepts to a more general class of problems of heuristic search under uncertainty. Developing Question Answering systems has been one of the important research issues because it requires insights from a variety of disciplines,including,Artificial Intelligence,Information Retrieval, Information Extraction,Natural Language Processing, and Psychology.In this paper we realize a formal model for a lightweight semantic based open domain yes/no Arabic question answering system based on paragraph retrieval with variable length. We propose a constrained semantic representation. Using an explicit unification framework based on semantic similarities and query expansion synonyms and antonyms.This frequently improves the precision of the system. Employing the passage retrieval system achieves a better precision by retrieving more paragraphs that contain relevant answers to the question; It significantly reduces the amount of text to be processed by the system. Bayesian learning of belief networks (BLN) is a method for automatically constructing belief networks (BNs) from data using search and Bayesian scoring techniques. K2 is a particular instantiation of the method that implements a greedy search strategy. To evaluate the accuracy of K2, we randomly generated a number of BNs and for each of those we simulated data sets. K2 was then used to induce the generating BNs from the simulated data. We examine the performance of the program, and the factors that influence it. We also present a simple BN model, developed from our results, which predicts the accuracy of K2, when given various characteristics of the data set. We have previously reported a Bayesian algorithm for determining the coordinates of points in three-dimensional space from uncertain constraints. This method is useful in the determination of biological molecular structure. It is limited, however, by the requirement that the uncertainty in the constraints be normally distributed. In this paper, we present an extension of the original algorithm that allows constraint uncertainty to be represented as a mixture of Gaussians, and thereby allows arbitrary constraint distributions. We illustrate the performance of this algorithm on a problem drawn from the domain of molecular structure determination, in which a multicomponent constraint representation produces a much more accurate solution than the old single component mechanism. The new mechanism uses mixture distributions to decompose the problem into a set of independent problems with unimodal constraint uncertainty. The results of the unimodal subproblems are periodically recombined using Bayes' law, to avoid combinatorial explosion. The new algorithm is particularly suited for parallel implementation. In previous work [BGHK92, BGHK93], we have studied the random-worlds approach -- a particular (and quite powerful) method for generating degrees of belief (i.e., subjective probabilities) from a knowledge base consisting of objective (first-order, statistical, and default) information. But allowing a knowledge base to contain only objective information is sometimes limiting. We occasionally wish to include information about degrees of belief in the knowledge base as well, because there are contexts in which old beliefs represent important information that should influence new beliefs. In this paper, we describe three quite general techniques for extending a method that generates degrees of belief from objective information to one that can make use of degrees of belief as well. All of our techniques are bloused on well-known approaches, such as cross-entropy. We discuss general connections between the techniques and in particular show that, although conceptually and technically quite different, all of the techniques give the same answer when applied to the random-worlds method. Evaluation of counterfactual queries (e.g., "If A were true, would C have been true?") is important to fault diagnosis, planning, and determination of liability. In this paper we present methods for computing the probabilities of such queries using the formulation proposed in [Balke and Pearl, 1994], where the antecedent of the query is interpreted as an external action that forces the proposition A to be true. When a prior probability is available on the causal mechanisms governing the domain, counterfactual probabilities can be evaluated precisely. However, when causal knowledge is specified as conditional probabilities on the observables, only bounds can computed. This paper develops techniques for evaluating these bounds, and demonstrates their use in two applications: (1) the determination of treatment efficacy from studies in which subjects may choose their own treatment, and (2) the determination of liability in product-safety litigation. I describe a planning methodology for domains with uncertainty in the form of external events that are not completely predictable. The events are represented by enabling conditions and probabilities of occurrence. The planner is goal-directed and backward chaining, but the subgoals are suggested by analyzing the probability of success of the partial plan rather than being simply the open conditions of the operators in the plan. The partial plan is represented as a Bayesian belief net to compute its probability of success. Since calculating the probability of success of a plan can be very expensive I introduce two other techniques for computing it, one that uses Monte Carlo simulation to estimate it and one based on a Markov chain representation that uses knowledge about the dependencies between the predicates describing the domain. Bayesian belief network learning algorithms have three basic components: a measure of a network structure and a database, a search heuristic that chooses network structures to be considered, and a method of estimating the probability tables from the database. This paper contributes to all these three topics. The behavior of the Bayesian measure of Cooper and Herskovits and a minimum description length (MDL) measure are compared with respect to their properties for both limiting size and finite size databases. It is shown that the MDL measure has more desirable properties than the Bayesian measure when a distribution is to be learned. It is shown that selecting belief networks with certain minimallity properties is NP-hard. This result justifies the use of search heuristics instead of exact algorithms for choosing network structures to be considered. In some cases, a collection of belief networks can be represented by a single belief network which leads to a new kind of probability table estimation called smoothing. We argue that smoothing can be efficiently implemented by incorporating it in the search heuristic. Experimental results suggest that for learning probabilities of belief networks smoothing is helpful. The expected value of information (EVI) is the most powerful measure of sensitivity to uncertainty in a decision model: it measures the potential of information to improve the decision, and hence measures the expected value of outcome. Standard methods for computing EVI use discrete variables and are computationally intractable for models that contain more than a few variables. Monte Carlo simulation provides the basis for more tractable evaluation of large predictive models with continuous and discrete variables, but so far computation of EVI in a Monte Carlo setting also has appeared impractical. We introduce an approximate approach based on pre-posterior analysis for estimating EVI in Monte Carlo models. Our method uses a linear approximation to the value function and multiple linear regression to estimate the linear model from the samples. The approach is efficient and practical for extremely large models. It allows easy estimation of EVI for perfect or partial information on individual variables or on combinations of variables. We illustrate its implementation within Demos (a decision modeling system), and its application to a large model for crisis transportation planning. This work proposes action networks as a semantically well-founded framework for reasoning about actions and change under uncertainty. Action networks add two primitives to probabilistic causal networks: controllable variables and persistent variables. Controllable variables allow the representation of actions as directly setting the value of specific events in the domain, subject to preconditions. Persistent variables provide a canonical model of persistence according to which both the state of a variable and the causal mechanism dictating its value persist over time unless intervened upon by an action (or its consequences). Action networks also allow different methods for quantifying the uncertainty in causal relationships, which go beyond traditional probabilistic quantification. This paper describes both recent results and work in progress. We study the connection between kappa calculus and probabilistic reasoning in diagnosis applications. Specifically, we abstract a probabilistic belief network for diagnosing faults into a kappa network and compare the ordering of faults computed using both methods. We show that, at least for the example examined, the ordering of faults coincide as long as all the causal relations in the original probabilistic network are taken into account. We also provide a formal analysis of some network structures where the two methods will differ. Both kappa rankings and infinitesimal probabilities have been used extensively to study default reasoning and belief revision. But little has been done on utilizing their connection as outlined above. This is partly because the relation between kappa and probability calculi assumes that probabilities are arbitrarily close to one (or zero). The experiments in this paper investigate this relation when this assumption is not satisfied. The reported results have important implications on the use of kappa rankings to enhance the knowledge engineering of uncertainty models. When agents devise plans for execution in the real world, they face two important forms of uncertainty: they can never have complete knowledge about the state of the world, and they do not have complete control, as the effects of their actions are uncertain. While most classical planning methods avoid explicit uncertainty reasoning, we believe that uncertainty should be explicitly represented and reasoned about. We develop a probabilistic representation for states and actions, based on belief networks. We define conditional belief nets (CBNs) to capture the probabilistic dependency of the effects of an action upon the state of the world. We also use a CBN to represent the intrinsic relationships among entities in the environment, which persist from state to state. We present a simple projection algorithm to construct the belief network of the state succeeding an action, using the environment CBN model to infer indirect effects. We discuss how the qualitative aspects of belief networks and CBNs make them appropriate for the various stages of the problem solving process, from model construction to the design of planning algorithms. Most algorithms for propagating evidence through belief networks have been exact and exhaustive: they produce an exact (point-valued) marginal probability for every node in the network. Often, however, an application will not need information about every n ode in the network nor will it need exact probabilities. We present the localized partial evaluation (LPE) propagation algorithm, which computes interval bounds on the marginal probability of a specified query node by examining a subset of the nodes in the entire network. Conceptually, LPE ignores parts of the network that are "too far away" from the queried node to have much impact on its value. LPE has the "anytime" property of being able to produce better solutions (tighter intervals) given more time to consider more of the network. AI planning algorithms have addressed the problem of generating sequences of operators that achieve some input goal, usually assuming that the planning agent has perfect control over and information about the world. Relaxing these assumptions requires an extension to the action representation that allows reasoning both about the changes an action makes and the information it provides. This paper presents an action representation that extends the deterministic STRIPS model, allowing actions to have both causal and informational effects, both of which can be context dependent and noisy. We also demonstrate how a standard least-commitment planning algorithm can be extended to include informational actions and contingent execution. An ordinal view of independence is studied in the framework of possibility theory. We investigate three possible definitions of dependence, of increasing strength. One of them is the counterpart to the multiplication law in probability theory, and the two others are based on the notion of conditional possibility. These two have enough expressive power to support the whole possibility theory, and a complete axiomatization is provided for the strongest one. Moreover we show that weak independence is well-suited to the problems of belief change and plausible reasoning, especially to address the problem of blocking of property inheritance in exception-tolerant taxonomic reasoning. In this paper, we introduce evidence propagation operations on influence diagrams and a concept of value of evidence, which measures the value of experimentation. Evidence propagation operations are critical for the computation of the value of evidence, general update and inference operations in normative expert systems which are based on the influence diagram (generalized Bayesian network) paradigm. The value of evidence allows us to compute directly an outcome sensitivity, a value of perfect information and a value of control which are used in decision analysis (the science of decision making under uncertainty). More specifically, the outcome sensitivity is the maximum difference among the values of evidence, the value of perfect information is the expected value of the values of evidence, and the value of control is the optimal value of the values of evidence. We also discuss an implementation and a relative computational efficiency issues related to the value of evidence and the value of perfect information. We describe algorithms for learning Bayesian networks from a combination of user knowledge and statistical data. The algorithms have two components: a scoring metric and a search procedure. The scoring metric takes a network structure, statistical data, and a user's prior knowledge, and returns a score proportional to the posterior probability of the network structure given the data. The search procedure generates networks for evaluation by the scoring metric. Previous work has concentrated on metrics for domains containing only discrete variables, under the assumption that data represents a multinomial sample. In this paper, we extend this work, developing scoring metrics for domains containing all continuous variables or a mixture of discrete and continuous variables, under the assumption that continuous data is sampled from a multivariate normal distribution. Our work extends traditional statistical approaches for identifying vanishing regression coefficients in that we identify two important assumptions, called event equivalence and parameter modularity, that when combined allow the construction of prior distributions for multivariate normal parameters from a single prior Bayesian network specified by a user. Testing the validity of probabilistic models containing unmeasured (hidden) variables is shown to be a hard task. We show that the task of testing whether models are structurally incompatible with the data at hand, requires an exponential number of independence evaluations, each of the form: "X is conditionally independent of Y, given Z." In contrast, a linear number of such evaluations is required to test a standard Bayesian network (one per vertex). On the positive side, we show that if a network with hidden variables G has a tree skeleton, checking whether G represents a given probability model P requires the polynomial number of such independence evaluations. Moreover, we provide an algorithm that efficiently constructs a tree-structured Bayesian network (with hidden variables) that represents P if such a network exists, and further recognizes when such a network does not exist. We introduce an approach to high-level conditional planning we call epsilon-safe planning. This probabilistic approach commits us to planning to meet some specified goal with a probability of success of at least 1-epsilon for some user-supplied epsilon. We describe several algorithms for epsilon-safe planning based on conditional planners. The two conditional planners we discuss are Peot and Smith's nonlinear conditional planner, CNLP, and our own linear conditional planner, PLINTH. We present a straightforward extension to conditional planners for which computing the necessary probabilities is simple, employing a commonly-made but perhaps overly-strong independence assumption. We also discuss a second approach to epsilon-safe planning which relaxes this independence assumption, involving the incremental construction of a probability dependence model in conjunction with the construction of the plan graph. We present a method for dynamically generating Bayesian networks from knowledge bases consisting of first-order probability logic sentences. We present a subset of probability logic sufficient for representing the class of Bayesian networks with discrete-valued nodes. We impose constraints on the form of the sentences that guarantee that the knowledge base contains all the probabilistic information necessary to generate a network. We define the concept of d-separation for knowledge bases and prove that a knowledge base with independence conditions defined by d-separation is a complete specification of a probability distribution. We present a network generation algorithm that, given an inference problem in the form of a query Q and a set of evidence E, generates a network to compute P(Q|E). We prove the algorithm to be correct. Heckerman (1993) defined causal independence in terms of a set of temporal conditional independence statements. These statements formalized certain types of causal interaction where (1) the effect is independent of the order that causes are introduced and (2) the impact of a single cause on the effect does not depend on what other causes have previously been applied. In this paper, we introduce an equivalent a temporal characterization of causal independence based on a functional representation of the relationship between causes and the effect. In this representation, the interaction between causes and effect can be written as a nested decomposition of functions. Causal independence can be exploited by representing this decomposition in the belief network, resulting in representations that are more efficient for inference than general causal models. We present empirical results showing the benefits of a causal-independence representation for belief-network inference. On the one hand, classical terminological knowledge representation excludes the possibility of handling uncertain concept descriptions involving, e.g., "usually true" concept properties, generalized quantifiers, or exceptions. On the other hand, purely numerical approaches for handling uncertainty in general are unable to consider terminological knowledge. This paper presents the language ACP which is a probabilistic extension of terminological logics and aims at closing the gap between the two areas of research. We present the formal semantics underlying the language ALUP and introduce the probabilistic formalism that is based on classes of probabilities and is realized by means of probabilistic constraints. Besides inferring implicitly existent probabilistic relationships, the constraints guarantee terminological and probabilistic consistency. Altogether, the new language ALUP applies to domains where both term descriptions and uncertainty have to be handled. Qualitative and infinitesimal probability schemes are consistent with the axioms of probability theory, but avoid the need for precise numerical probabilities. Using qualitative probabilities could substantially reduce the effort for knowledge engineering and improve the robustness of results. We examine experimentally how well infinitesimal probabilities (the kappa-calculus of Goldszmidt and Pearl) perform a diagnostic task - troubleshooting a car that will not start by comparison with a conventional numerical belief network. We found the infinitesimal scheme to be as good as the numerical scheme in identifying the true fault. The performance of the infinitesimal scheme worsens significantly for prior fault probabilities greater than 0.03. These results suggest that infinitesimal probability methods may be of substantial practical value for machine diagnosis with small prior fault probabilities. Possibilistic logic, an extension of first-order logic, deals with uncertainty that can be estimated in terms of possibility and necessity measures. Syntactically, this means that a first-order formula is equipped with a possibility degree or a necessity degree that expresses to what extent the formula is possibly or necessarily true. Possibilistic resolution yields a calculus for possibilistic logic which respects the semantics developed for possibilistic logic. A drawback, which possibilistic resolution inherits from classical resolution, is that it may not terminate if applied to formulas belonging to decidable fragments of first-order logic. Therefore we propose an alternative proof method for possibilistic logic. The main feature of this method is that it completely abstracts from a concrete calculus but uses as basic operation a test for classical entailment. We then instantiate possibilistic logic with a terminological logic, which is a decidable subclass o f first-order logic but nevertheless much more expressive than propositional logic. This yields an extension of terminological logics towards the representation of uncertain knowledge which is satisfactory from a semantic as well as algorithmic point of view. We give an axiomatization of confidence transfer - a known conditioning scheme - from the perspective of expectation-based inference in the sense of Gardenfors and Makinson. Then, we use the notion of belief independence to "filter out" different proposal s of possibilistic conditioning rules, all are variations of confidence transfer. Among the three rules that we consider, only Dempster's rule of conditioning passes the test of supporting the notion of belief independence. With the use of this conditioning rule, we then show that we can use local computation for computing desired conditional marginal possibilities of the joint possibility satisfying the given constraints. It turns out that our local computation scheme is already proposed by Shenoy. However, our intuitions are completely different from that of Shenoy. While Shenoy just defines a local computation scheme that fits his framework of valuation-based systems, we derive that local computation scheme from II(,8) = tI(,8 I a) * II(a) and appropriate independence assumptions, just like how the Bayesians derive their local computation scheme. To coordinate with other agents in its environment, an agent needs models of what the other agents are trying to do. When communication is impossible or expensive, this information must be acquired indirectly via plan recognition. Typical approaches to plan recognition start with a specification of the possible plans the other agents may be following, and develop special techniques for discriminating among the possibilities. Perhaps more desirable would be a uniform procedure for mapping plans to general structures supporting inference based on uncertain and incomplete observations. In this paper, we describe a set of methods for converting plans represented in a flexible procedural language to observation models represented as probabilistic belief networks. The paper presents a method for reducing the computational complexity of Bayesian networks through identification and removal of weak dependencies (removal of links from the (moralized) independence graph). The removal of a small number of links may reduce the computational complexity dramatically, since several fill-ins and moral links may be rendered superfluous by the removal. The method is described in terms of impact on the independence graph, the junction tree, and the potential functions associated with these. An empirical evaluation of the method using large real-world networks demonstrates the applicability of the method. Further, the method, which has been implemented in Hugin, complements the approximation method suggested by Jensen & Andersen (1990). We explore the issue of refining an existent Bayesian network structure using new data which might mention only a subset of the variables. Most previous works have only considered the refinement of the network's conditional probability parameters, and have not addressed the issue of refining the network's structure. We develop a new approach for refining the network's structure. Our approach is based on the Minimal Description Length (MDL) principle, and it employs an adapted version of a Bayesian network learning algorithm developed in our previous work. One of the adaptations required is to modify the previous algorithm to account for the structure of the existent network. The learning algorithm generates a partial network structure which can then be used to improve the existent network. We also present experimental evidence demonstrating the effectiveness of our approach. We view the syntax-based approaches to default reasoning as a model-based diagnosis problem, where each source giving a piece of information is considered as a component. It is formalized in the ATMS framework (each source corresponds to an assumption). We assume then that all sources are independent and "fail" with a very small probability. This leads to a probability assignment on the set of candidates, or equivalently on the set of consistent environments. This probability assignment induces a Dempster-Shafer belief function which measures the probability that a proposition can be deduced from the evidence. This belief function can be used in several different ways to define a non-monotonic consequence relation. We study and compare these consequence relations. The -case of prioritized knowledge bases is briefly considered. A model to represent spatial information is presented in this paper. It is based on fuzzy constraints represented as fuzzy geometric relations that can be hierarchically structured. The concept of spatial template is introduced to capture the idea of interrelated objects in two-dimensional space. The representation model is used to specify imprecise or vague information consisting in relative locations and orientations of template objects. It is shown in this paper how a template represented by this model can be matched against a crisp situation to recognize a particular instance of this template. Furthermore, the proximity measure (fuzzy measure) between the instance and the template is worked out - this measure can be interpreted as a degree of similarity. In this context, template recognition can be viewed as a case of fuzzy pattern recognition. The results of this work have been implemented and applied to a complex military problem from which this work originated. This paper describes the best first search strategy used by U-Plan (Mansell 1993a), a planning system that constructs quantitatively ranked plans given an incomplete description of an uncertain environment. U-Plan uses uncertain and incomplete evidence de scribing the environment, characterizes it using a Dempster-Shafer interval, and generates a set of possible world states. Plan construction takes place in an abstraction hierarchy where strategic decisions are made before tactical decisions. Search through this abstraction hierarchy is guided by a quantitative measure (expected fulfillment) based on decision theory. The search strategy is best first with the provision to update expected fulfillment and review previous decisions in the light of planning developments. U-Plan generates multiple plans for multiple possible worlds, and attempts to use existing plans for new world situations. A super-plan is then constructed, based on merging the set of plans and appropriately timed knowledge acquisition operators, which are used to decide between plan alternatives during plan execution. In this paper we describe a framework for model-based diagnosis of dynamic systems, which extends previous work in this field by using and expressing temporal uncertainty in the form of qualitative interval relations a la Allen. Based on a logical framework extended by qualitative and quantitative temporal constraints we show how to describe behavioral models (both consistency- and abductive-based), discuss how to use abstract observations and show how abstract temporal diagnoses are computed. This yields an expressive framework, which allows the representation of complex temporal behavior allowing us to represent temporal uncertainty. Due to its abstraction capabilities computation is made independent of the number of observations and time points in a temporal setting. An example of hepatitis diagnosis is used throughout the paper. Certain classes of problems, including perceptual data understanding, robotics, discovery, and learning, can be represented as incremental, dynamically constructed belief networks. These automatically constructed networks can be dynamically extended and modified as evidence of new individuals becomes available. The main result of this paper is the incremental extension of the singly connected polytree network in such a way that the network retains its singly connected polytree structure after the changes. The algorithm is deterministic and is guaranteed to have a complexity of single node addition that is at most of order proportional to the number of nodes (or size) of the network. Additional speed-up can be achieved by maintaining the path information. Despite its incremental and dynamic nature, the algorithm can also be used for probabilistic inference in belief networks in a fashion similar to other exact inference algorithms. We present a symbolic machinery that admits both probabilistic and causal information about a given domain and produces probabilistic statements about the effect of actions and the impact of observations. The calculus admits two types of conditioning operators: ordinary Bayes conditioning, P(y|X = x), which represents the observation X = x, and causal conditioning, P(y|do(X = x)), read the probability of Y = y conditioned on holding X constant (at x) by deliberate action. Given a mixture of such observational and causal sentences, together with the topology of the causal graph, the calculus derives new conditional probabilities of both types, thus enabling one to quantify the effects of actions (and policies) from partially specified knowledge bases, such as Bayesian networks in which some conditional probabilities may not be available. This paper describes a novel approach to planning which takes advantage of decision theory to greatly improve robustness in an uncertain environment. We present an algorithm which computes conditional plans of maximum expected utility. This algorithm relies on a representation of the search space as an AND/OR tree and employs a depth-limit to control computation costs. A numeric robustness factor, which parameterizes the utility function, allows the user to modulate the degree of risk-aversion employed by the planner. Via a look-ahead search, the planning algorithm seeks to find an optimal plan using expected utility as its optimization criterion. We present experimental results obtained by applying our algorithm to a non-deterministic extension of the blocks world domain. Our results demonstrate that the robustness factor governs the degree of risk embodied in the conditional plans computed by our algorithm. We present several techniques for knowledge engineering of large belief networks (BNs) based on the our experiences with a network derived from a large medical knowledge base. The noisyMAX, a generalization of the noisy-OR gate, is used to model causal in dependence in a BN with multi-valued variables. We describe the use of leak probabilities to enforce the closed-world assumption in our model. We present Netview, a visualization tool based on causal independence and the use of leak probabilities. The Netview software allows knowledge engineers to dynamically view sub-networks for knowledge engineering, and it provides version control for editing a BN. Netview generates sub-networks in which leak probabilities are dynamically updated to reflect the missing portions of the network. Bayesian Belief Networks (BBNs) are a powerful formalism for reasoning under uncertainty but bear some severe limitations: they require a large amount of information before any reasoning process can start, they have limited contradiction handling capabilities, and their ability to provide explanations for their conclusion is still controversial. There exists a class of reasoning systems, called Truth Maintenance Systems (TMSs), which are able to deal with partially specified knowledge, to provide well-founded explanation for their conclusions, and to detect and handle contradictions. TMSs incorporating measure of uncertainty are called Belief Maintenance Systems (BMSs). This paper describes how a BMS based on probabilistic logic can be applied to BBNs, thus introducing a new class of BBNs, called Ignorant Belief Networks, able to incrementally deal with partially specified conditional dependencies, to provide explanations, and to detect and handle contradictions. In this paper we propose a new approach to probabilistic inference on belief networks, global conditioning, which is a simple generalization of Pearl's (1986b) method of loopcutset conditioning. We show that global conditioning, as well as loop-cutset conditioning, can be thought of as a special case of the method of Lauritzen and Spiegelhalter (1988) as refined by Jensen et al (199Oa; 1990b). Nonetheless, this approach provides new opportunities for parallel processing and, in the case of sequential processing, a tradeoff of time for memory. We also show how a hybrid method (Suermondt and others 1990) combining loop-cutset conditioning with Jensen's method can be viewed within our framework. By exploring the relationships between these methods, we develop a unifying framework in which the advantages of each approach can be combined successfully. Over time, there have hen refinements in the way that probability distributions are used for representing beliefs. Models which rely on single probability distributions depict a complete ordering among the propositions of interest, yet human beliefs are sometimes not completely ordered. Non-singleton sets of probability distributions can represent partially ordered beliefs. Convex sets are particularly convenient and expressive, but it is known that there are reasonable patterns of belief whose faithful representation require less restrictive sets. The present paper shows that prior ignorance about three or more exclusive alternatives and the emergence of partially ordered beliefs when evidence is obtained defy representation by any single set of distributions, but yield to a representation baud on several uts. The partial order is shown to be a partial qualitative probability which shares some intuitively appealing attributes with probability distributions. Probability measures by themselves, are known to be inappropriate for modeling the dynamics of plain belief and their excessively strong measurability constraints make them unsuitable for some representational tasks, e.g. in the context of firstorder knowledge. In this paper, we are therefore going to look for possible alternatives and extensions. We begin by delimiting the general area of interest, proposing a minimal list of assumptions to be satisfied by any reasonable quasi-probabilistic valuation concept. Within this framework, we investigate two particularly interesting kinds of quasi-measures which are not or much less affected by the traditional problems. * Ranking measures, which generalize Spohn-type and possibility measures. * Cumulative measures, which combine the probabilistic and the ranking philosophy, allowing thereby a fine-grained account of static and dynamic belief. We have developed a general Bayesian algorithm for determining the coordinates of points in a three-dimensional space. The algorithm takes as input a set of probabilistic constraints on the coordinates of the points, and an a priori distribution for each point location. The output is a maximum-likelihood estimate of the location of each point. We use the extended, iterated Kalman filter, and add a search heuristic for optimizing its solution under nonlinear conditions. This heuristic is based on the same principle as the simulated annealing heuristic for other optimization problems. Our method uses any probabilistic constraints that can be expressed as a function of the point coordinates (for example, distance, angles, dihedral angles, and planarity). It assumes that all constraints have Gaussian noise. In this paper, we describe the algorithm and show its performance on a set of synthetic data to illustrate its convergence properties, and its applicability to domains such ng molecular structure determination. This paper addresses the tradeoffs which need to be considered in reasoning using probabilistic network representations, such as Influence Diagrams (IDs). In particular, we examine the tradeoffs entailed in using Temporal Influence Diagrams (TIDs) which adequately capture the temporal evolution of a dynamic system without prohibitive data and computational requirements. Three approaches for TID construction which make different tradeoffs are examined: (1) tailoring the network at each time interval to the data available (rather then just copying the original Bayes Network for all time intervals); (2) modeling the evolution of a parsimonious subset of variables (rather than all variables); and (3) model selection approaches, which seek to minimize some measure of the predictive accuracy of the model without introducing too many parameters, which might cause "overfitting" of the model. Methods of evaluating the accuracy/efficiency of the tradeoffs are proposed. Influence diagrams are ideal knowledge representations for Bayesian statistical models. However, these diagrams are difficult for end users to interpret and to manipulate. We present a user-based architecture that enables end users to create and to manipulate the knowledge representation. We use the problem of physicians' interpretation of two-arm parallel randomized clinical trials (TAPRCT) to illustrate the architecture and its use. There are three primary data structures. Elements of statistical models are encoded as subgraphs of a restricted class of influence diagram. The interpretations of those elements are mapped into users' language in a domain-specific, user-based semantic interface, called a patient-flow diagram, in the TAPRCT problem. Pennitted transformations of the statistical model that maintain the semantic relationships of the model are encoded in a metadata-state diagram, called the cohort-state diagram, in the TAPRCT problem. The algorithm that runs the system uses modular actions called construction steps. This framework has been implemented in a system called THOMAS, that allows physicians to interpret the data reported from a TAPRCT. In this paper we address the uncertainty issues involved in the low-level vision task of image segmentation. Researchers in computer vision have worked extensively on this problem, in which the goal is to partition (or segment) an image into regions that are homogeneous or uniform in some sense. This segmentation is often utilized by some higher level process, such as an object recognition system. We show that by considering uncertainty in a Bayesian formalism, we can use statistical image models to build an approximate representation of a probability distribution over a space of alternative segmentations. We give detailed descriptions of the various levels of uncertainty associated with this problem, discuss the interaction of prior and posterior distributions, and provide the operations for constructing this representation. Dynamic network models (DNMs) are belief networks for temporal reasoning. The DNM methodology combines techniques from time series analysis and probabilistic reasoning to provide (1) a knowledge representation that integrates noncontemporaneous and contemporaneous dependencies and (2) methods for iteratively refining these dependencies in response to the effects of exogenous influences. We use belief-network inference algorithms to perform forecasting, control, and discrete event simulation on DNMs. The belief network formulation allows us to move beyond the traditional assumptions of linearity in the relationships among time-dependent variables and of normality in their probability distributions. We demonstrate the DNM methodology on an important forecasting problem in medicine. We conclude with a discussion of how the methodology addresses several limitations found in traditional time series analyses. We compare the diagnostic accuracy of three diagnostic inference models: the simple Bayes model, the multimembership Bayes model, which is isomorphic to the parallel combination function in the certainty-factor model, and a model that incorporates the noisy OR-gate interaction. The comparison is done on 20 clinicopathological conference (CPC) cases from the American Journal of Medicine-challenging cases describing actual patients often with multiple disorders. We find that the distributions produced by the noisy OR model agree most closely with the gold-standard diagnoses, although substantial differences exist between the distributions and the diagnoses. In addition, we find that the multimembership Bayes model tends to significantly overestimate the posterior probabilities of diseases, whereas the simple Bayes model tends to significantly underestimate the posterior probabilities. Our results suggest that additional work to refine the noisy OR model for internal medicine will be worthwhile. The inherent intractability of probabilistic inference has hindered the application of belief networks to large domains. Noisy OR-gates [30] and probabilistic similarity networks [18, 17] escape the complexity of inference by restricting model expressiveness. Recent work in the application of belief-network models to time-series analysis and forecasting [9, 10] has given rise to the additive belief network model (ABNM). We (1) discuss the nature and implications of the approximations made by an additive decomposition of a belief network, (2) show greater efficiency in the induction of additive models when available data are scarce, (3) generalize probabilistic inference algorithms to exploit the additive decomposition of ABNMs, (4) show greater efficiency of inference, and (5) compare results on inference with a simple additive belief network. From an inconsistent database non-trivial arguments may be constructed both for a proposition, and for the contrary of that proposition. Therefore, inconsistency in a logical database causes uncertainty about which conclusions to accept. This kind of uncertainty is called logical uncertainty. We define a concept of "acceptability", which induces a means for differentiating arguments. The more acceptable an argument, the more confident we are in it. A specific interest is to use the acceptability classes to assign linguistic qualifiers to propositions, such that the qualifier assigned to a propositions reflects its logical uncertainty. A more general interest is to understand how classes of acceptability can be defined for arguments constructed from an inconsistent database, and how this notion of acceptability can be devised to reflect different criteria. Whilst concentrating on the aspects of assigning linguistic qualifiers to propositions, we also indicate the more general significance of the notion of acceptability. We take a utility-based approach to categorization. We construct generalizations about events and actions by considering losses associated with failing to distinguish among detailed distinctions in a decision model. The utility-based methods transform detailed states of the world into more abstract categories comprised of disjunctions of the states. We show how we can cluster distinctions into groups of distinctions at progressively higher levels of abstraction, and describe rules for decision making with the abstractions. The techniques introduce a utility-based perspective on the nature of concepts, and provide a means of simplifying decision models used in automated reasoning systems. We demonstrate the techniques by describing the capabilities and output of TUBA, a program for utility-based abstraction. One topic that is likely to attract an increasing amount of attention within the Knowledge-base systems research community is the coordination of information provided by multiple experts. We envision a situation in which several experts independently encode information as belief networks. A potential user must then coordinate the conclusions and recommendations of these networks to derive some sort of consensus. One approach to such a consensus is the fusion of the contributed networks into a single, consensus model prior to the consideration of any case-specific data (specific observations, test results). This approach requires two types of combination procedures, one for probabilities, and one for graphs. Since the combination of probabilities is relatively well understood, the key barriers to this approach lie in the realm of graph theory. This paper provides formal definitions of some of the operations necessary to effect the necessary graphical combinations, and provides complexity analyses of these procedures. The paper's key result is that most of these operations are NP-hard, and its primary message is that the derivation of ?good? consensus networks must be done heuristically. This paper identifies and solves a new optimization problem: Given a belief network (BN) and a target ordering on its variables, how can we efficiently derive its minimal I-map whose arcs are consistent with the target ordering? We present three solutions to this problem, all of which lead to directed acyclic graphs based on the original BN's recursive basis relative to the specified ordering (such a DAG is sometimes termed the boundary DAG drawn from the given BN relative to the said ordering [5]). Along the way, we also uncover an important general principal about arc reversals: when reordering a BN according to some target ordering, (while attempting to minimize the number of arcs generated), the sequence of arc reversals should follow the topological ordering induced by the original belief network's arcs to as great an extent as possible. These results promise to have a significant impact on the derivation of consensus models, as well as on other algorithms that require the reconfiguration and/or combination of BN's. Problems of probabilistic inference and decision making under uncertainty commonly involve continuous random variables. Often these are discretized to a few points, to simplify assessments and computations. An alternative approximation is to fit analytically tractable continuous probability distributions. This approach has potential simplicity and accuracy advantages, especially if variables can be transformed first. This paper shows how a minimum relative entropy criterion can drive both transformation and fitting, illustrating with a power and logarithm family of transformations and mixtures of Gaussian (normal) distributions, which allow use of efficient influence diagram methods. The fitting procedure in this case is the well-known EM algorithm. The selection of the number of components in a fitted mixture distribution is automated with an objective that trades off accuracy and computational cost. Relevance-based explanation is a scheme in which partial assignments to Bayesian belief network variables are explanations (abductive conclusions). We allow variables to remain unassigned in explanations as long as they are irrelevant to the explanation, where irrelevance is defined in terms of statistical independence. When multiple-valued variables exist in the system, especially when subsets of values correspond to natural types of events, the over specification problem, alleviated by independence-based explanation, resurfaces. As a solution to that, as well as for addressing the question of explanation specificity, it is desirable to collapse such a subset of values into a single value on the fly. The equivalent method, which is adopted here, is to generalize the notion of assignments to allow disjunctive assignments. We proceed to define generalized independence based explanations as maximum posterior probability independence based generalized assignments (GIB-MAPs). GIB assignments are shown to have certain properties that ease the design of algorithms for computing GIB-MAPs. One such algorithm is discussed here, as well as suggestions for how other algorithms may be adapted to compute GIB-MAPs. GIB-MAP explanations still suffer from instability, a problem which may be addressed using ?approximate? conditional independence as a condition for irrelevance. We present a mechanism for constructing graphical models, specifically Bayesian networks, from a knowledge base of general probabilistic information. The unique feature of our approach is that it uses a powerful first-order probabilistic logic for expressing the general knowledge base. This logic allows for the representation of a wide range of logical and probabilistic information. The model construction procedure we propose uses notions from direct inference to identify pieces of local statistical information from the knowledge base that are most appropriate to the particular event we want to reason about. These pieces are composed to generate a joint probability distribution specified as a Bayesian network. Although there are fundamental difficulties in dealing with fully general knowledge, our procedure is practical for quite rich knowledge bases and it supports the construction of a far wider range of networks than allowed for by current template technology. PAGODA (Probabilistic Autonomous Goal-Directed Agent) is a model for autonomous learning in probabilistic domains [desJardins, 1992] that incorporates innovative techniques for using the agent's existing knowledge to guide and constrain the learning process and for representing, reasoning with, and learning probabilistic knowledge. This paper describes the probabilistic representation and inference mechanism used in PAGODA. PAGODA forms theories about the effects of its actions and the world state on the environment over time. These theories are represented as conditional probability distributions. A restriction is imposed on the structure of the theories that allows the inference mechanism to find a unique predicted distribution for any action and world state description. These restricted theories are called uniquely predictive theories. The inference mechanism, Probability Combination using Independence (PCI), uses minimal independence assumptions to combine the probabilities in a theory to make probabilistic predictions. In previous work we developed a method of learning Bayesian Network models from raw data. This method relies on the well known minimal description length (MDL) principle. The MDL principle is particularly well suited to this task as it allows us to tradeoff, in a principled way, the accuracy of the learned network against its practical usefulness. In this paper we present some new results that have arisen from our work. In particular, we present a new local way of computing the description length. This allows us to make significant improvements in our search algorithm. In addition, we modify our algorithm so that it can take into account partial domain information that might be provided by a domain expert. The local computation of description length also opens the door for local refinement of an existent network. The feasibility of our approach is demonstrated by experiments involving networks of a practical size. As belief networks are used to model increasingly complex situations, the need to automatically construct them from large databases will become paramount. This paper concentrates on solving a part of the belief network induction problem: that of learning the quantitative structure (the conditional probabilities), given the qualitative structure. In particular, a theory is presented that shows how to propagate inference distributions in a belief network, with the only assumption being that the given qualitative structure is correct. Most inference algorithms must make at least this assumption. The theory is based on four network transformations that are sufficient for any inference in a belief network. Furthermore, the claim is made that contrary to popular belief, error will not necessarily grow as the inference chain grows. Instead, for QBN belief nets induced from large enough samples, the error is more likely to decrease as the size of the inference chain increases. Numerous methods for probabilistic reasoning in large, complex belief or decision networks are currently being developed. There has been little research on automating the dynamic, incremental construction of decision models. A uniform value-driven method of decision model construction is proposed for the hierarchical complete diagnosis. Hierarchical complete diagnostic reasoning is formulated as a stochastic process and modeled using influence diagrams. Given observations, this method creates decision models in order to obtain the best actions sequentially for locating and repairing a fault at minimum cost. This method construct decision models incrementally, interleaving probe actions with model construction and evaluation. The method treats meta-level and baselevel tasks uniformly. That is, the method takes a decision-theoretic look at the control of search in causal pathways and structural hierarchies. In recent years the belief network has been used increasingly to model systems in Al that must perform uncertain inference. The development of efficient algorithms for probabilistic inference in belief networks has been a focus of much research in AI. Efficient algorithms for certain classes of belief networks have been developed, but the problem of reporting the uncertainty in inferred probabilities has received little attention. A system should not only be capable of reporting the values of inferred probabilities and/or the favorable choices of a decision; it should report the range of possible error in the inferred probabilities and/or choices. Two methods have been developed and implemented for determining the variance in inferred probabilities in belief networks. These methods, the Approximate Propagation Method and the Monte Carlo Integration Method are discussed and compared in this paper. We describe a method for time-critical decision making involving sequential tasks and stochastic processes. The method employs several iterative refinement routines for solving different aspects of the decision making problem. This paper concentrates on the meta-level control problem of deliberation scheduling, allocating computational resources to these routines. We provide different models corresponding to optimization problems that capture the different circumstances and computational strategies for decision making under time constraints. We consider precursor models in which all decision making is performed prior to execution and recurrent models in which decision making is performed in parallel with execution, accounting for the states observed during execution and anticipating future states. We describe algorithms for precursor and recurrent models and provide the results of our empirical investigations to date. Given a belief network with evidence, the task of finding the I most probable explanations (MPE) in the belief network is that of identifying and ordering the I most probable instantiations of the non-evidence nodes of the belief network. Although many approaches have been proposed for solving this problem, most work only for restricted topologies (i.e., singly connected belief networks). In this paper, we will present a new approach for finding I MPEs in an arbitrary belief network. First, we will present an algorithm for finding the MPE in a belief network. Then, we will present a linear time algorithm for finding the next MPE after finding the first MPE. And finally, we will discuss the problem of finding the MPE for a subset of variables of a belief network, and show that the problem can be efficiently solved by this approach. This paper describes ongoing research into planning in an uncertain environment. In particular, it introduces U-Plan, a planning system that constructs quantitatively ranked plans given an incomplete description of the state of the world. U-Plan uses a DempsterShafer interval to characterise uncertain and incomplete information about the state of the world. The planner takes as input what is known about the world, and constructs a number of possible initial states with representations at different abstraction levels. A plan is constructed for the initial state with the greatest support, and this plan is tested to see if it will work for other possible initial states. All, part, or none of the existing plans may be used in the generation of the plans for the remaining possible worlds. Planning takes place in an abstraction hierarchy where strategic decisions are made before tactical decisions. A super-plan is then constructed, based on merging the set of plans and the appropriately timed acquisition of essential knowledge, which is used to decide between plan alternatives. U-Plan usually produces a super-plan in less time than a classical planner would take to produce a set of plans, one for each possible world. This paper discusses how conflicts (as used by the consistency-based diagnosis community) can be adapted to be used in a search-based algorithm for computing prior and posterior probabilities in discrete Bayesian Networks. This is an "anytime" algorithm, that at any stage can estimate the probabilities and give an error bound. Whereas the most popular Bayesian net algorithms exploit the structure of the network for efficiency, we exploit probability distributions for efficiency; this algorithm is most suited to the case with extreme probabilities. This paper presents a solution to the inefficiencies found in naive algorithms, and shows how the tools of the consistency-based diagnosis community (namely conflicts) can be used effectively to improve the efficiency. Empirical results with networks having tens of thousands of nodes are presented. Bayesian belief networks can be used to represent and to reason about complex systems with uncertain, incomplete and conflicting information. Belief networks are graphs encoding and quantifying probabilistic dependence and conditional independence among variables. One type of reasoning of interest in diagnosis is called abductive inference (determination of the global most probable system description given the values of any partial subset of variables). In some cases, abductive inference can be performed with exact algorithms using distributed network computations but it is an NP-hard problem and complexity increases drastically with the presence of undirected cycles, number of discrete states per variable, and number of variables in the network. This paper describes an approximate method based on genetic algorithms to perform abductive inference in large, multiply connected networks for which complexity is a concern when using most exact methods and for which systematic search methods are not feasible. The theoretical adequacy of the method is discussed and preliminary experimental results are presented. This paper presents and discusses several methods for reasoning from inconsistent knowledge bases. A so-called argumentative-consequence relation taking into account the existence of consistent arguments in favor of a conclusion and the absence of consistent arguments in favor of its contrary, is particularly investigated. Flat knowledge bases, i.e. without any priority between their elements, as well as prioritized ones where some elements are considered as more strongly entrenched than others are studied under different consequence relations. Lastly a paraconsistent-like treatment of prioritized knowledge bases is proposed, where both the level of entrenchment and the level of paraconsistency attached to a formula are propagated. The priority levels are handled in the framework of possibility theory. Argumentation is the process of constructing arguments about propositions, and the assignment of statements of confidence to those propositions based on the nature and relative strength of their supporting arguments. The process is modelled as a labelled deductive system, in which propositions are doubly labelled with the grounds on which they are based and a representation of the confidence attached to the argument. Argument construction is captured by a generalized argument consequence relation based on the ^,--fragment of minimal logic. Arguments can be aggregated by a variety of numeric and symbolic flattening functions. This approach appears to shed light on the common logical structure of a variety of quantitative, qualitative and defeasible uncertainty calculi. We present a semantics for adding uncertainty to conditional logics for default reasoning and belief revision. We are able to treat conditional sentences as statements of conditional probability, and express rules for revision such as "If A were believed, then B would be believed to degree p." This method of revision extends conditionalization by allowing meaningful revision by sentences whose probability is zero. This is achieved through the use of counterfactual probabilities. Thus, our system accounts for the best properties of qualitative methods of update (in particular, the AGM theory of revision) and probabilistic methods. We also show how our system can be viewed as a unification of probability theory and possibility theory, highlighting their orthogonality and providing a means for expressing the probability of a possibility. We also demonstrate the connection to Lewis's method of imaging. A key issue in the handling of temporal data is the treatment of persistence; in most approaches it consists in inferring defeasible confusions by extrapolating from the actual knowledge of the history of the world; we propose here a gradual modelling of persistence, following the idea that persistence is decreasing (the further we are from the last time point where a fluent is known to be true, the less certainly true the fluent is); it is based on possibility theory, which has strong relations with other well-known ordering-based approaches to nonmonotonic reasoning. We compare our approach with Dean and Kanazawa's probabilistic projection. We give a formal modelling of the decreasing persistence problem. Lastly, we show how to infer nonmonotonic conclusions using the principle of decreasing persistence. Security flaws in software applications today has been attributed mostly to design flaws. With limited budget and time to release software into the market, many developers often consider security as an afterthought. Previous research shows that integrating security into software applications at a later stage of software development lifecycle (SDLC) has been found to be more costly than when it is integrated during the early stages. To assist in the integration of security early in the SDLC stages, a new approach for assessing security during the design phase by neural network is investigated in this paper. Our findings show that by training a back propagation neural network to identify attack patterns, possible attacks can be identified from design scenarios presented to it. The result of performance of the neural network is presented in this paper. Influence diagram is a graphical representation of belief networks with uncertainty. This article studies the structural properties of a probabilistic model in an influence diagram. In particular, structural controllability theorems and structural observability theorems are developed and algorithms are formulated. Controllability and observability are fundamental concepts in dynamic systems (Luenberger 1979). Controllability corresponds to the ability to control a system while observability analyzes the inferability of its variables. Both properties can be determined by the ranks of the system matrices. Structural controllability and observability, on the other hand, analyze the property of a system with its structure only, without the specific knowledge of the values of its elements (tin 1974, Shields and Pearson 1976). The structural analysis explores the connection between the structure of a model and the functional dependence among its elements. It is useful in comprehending problem and formulating solution by challenging the underlying intuitions and detecting inconsistency in a model. This type of qualitative reasoning can sometimes provide insight even when there is insufficient numerical information in a model. We describe how we selectively reformulate portions of a belief network that pose difficulties for solution with a stochastic-simulation algorithm. With employ the selective conditioning approach to target specific nodes in a belief network for decomposition, based on the contribution the nodes make to the tractability of stochastic simulation. We review previous work on BNRAS algorithms- randomized approximation algorithms for probabilistic inference. We show how selective conditioning can be employed to reformulate a single BNRAS problem into multiple tractable BNRAS simulation problems. We discuss how we can use another simulation algorithm-logic sampling-to solve a component of the inference problem that provides a means for knitting the solutions of individual subproblems into a final result. Finally, we analyze tradeoffs among the computational subtasks associated with the selective conditioning approach to reformulation. This paper investigates the possibility of performing automated reasoning in probabilistic logic when probabilities are expressed by means of linguistic quantifiers. Each linguistic term is expressed as a prescribed interval of proportions. Then instead of propagating numbers, qualitative terms are propagated in accordance with the numerical interpretation of these terms. The quantified syllogism, modelling the chaining of probabilistic rules, is studied in this context. It is shown that a qualitative counterpart of this syllogism makes sense, and is relatively independent of the threshold defining the linguistically meaningful intervals, provided that these threshold values remain in accordance with the intuition. The inference power is less than that of a full-fledged probabilistic con-quaint propagation device but better corresponds to what could be thought of as commonsense probabilistic reasoning. Data fusion allows the elaboration and the evaluation of a situation synthesized from low level informations provided by different kinds of sensors. The fusion of the collected data will result in fewer and higher level informations more easily assessed by a human operator and that will assist him effectively in his decision process. In this paper we present the suitability and the advantages of using a Possibilistic Assumption based Truth Maintenance System (n-ATMS) in a data fusion military application. We first describe the problem, the needed knowledge representation formalisms and problem solving paradigms. Then we remind the reader of the basic concepts of ATMSs, Possibilistic Logic and 11-ATMSs. Finally we detail the solution to the given data fusion problem and conclude with the results and comparison with a non-possibilistic solution. To date, most probabilistic reasoning systems have relied on a fixed belief network constructed at design time. The network is used by an application program as a representation of (in)dependencies in the domain. Probabilistic inference algorithms operate over the network to answer queries. Recognizing the inflexibility of fixed models has led researchers to develop automated network construction procedures that use an expressive knowledge base to generate a network that can answer a query. Although more flexible than fixed model approaches, these construction procedures separate construction and evaluation into distinct phases. In this paper we develop an approach to combining incremental construction and evaluation of a partial probability model. The combined method holds promise for improved methods for control of model construction based on a trade-off between fidelity of results and cost of construction. We recently described a formalism for reasoning with if-then rules that re expressed with different levels of firmness [18]. The formalism interprets these rules as extreme conditional probability statements, specifying orders of magnitude of disbelief, which impose constraints over possible rankings of worlds. It was shown that, once we compute a priority function Z+ on the rules, the degree to which a given query is confirmed or denied can be computed in O(log n`) propositional satisfiability tests, where n is the number of rules in the knowledge base. In this paper, we show that computing Z+ requires O(n2 X log n) satisfiability tests, not an exponential number as was conjectured in [18], which reduces to polynomial complexity in the case of Horn expressions. We also show how reasoning with imprecise observations can be incorporated in our formalism and how the popular notions of belief revision and epistemic entrenchment are embodied naturally and tractably. A number of writers(Joseph Halpern and Fahiem Bacchus among them) have offered semantics for formal languages in which inferences concerning probabilities can be made. Our concern is different. This paper provides a formalization of nonmonotonic inferences in which the conclusion is supported only to a certain degree. Such inferences are clearly 'invalid' since they must allow the falsity of a conclusion even when the premises are true. Nevertheless, such inferences can be characterized both syntactically and semantically. The 'premises' of probabilistic arguments are sets of statements (as in a database or knowledge base), the conclusions categorical statements in the language. We provide standards for both this form of inference, for which high probability is required, and for an inference in which the conclusion is qualified by an intermediate interval of support. The Dempster-Shafer theory of evidence has been used intensively to deal with uncertainty in knowledge-based systems. However the representation of uncertain relationships between evidence and hypothesis groups (heuristic knowledge) is still a major research problem. This paper presents an approach to representing such heuristic knowledge by evidential mappings which are defined on the basis of mass functions. The relationships between evidential mappings and multi valued mappings, as well as between evidential mappings and Bayesian multi- valued causal link models in Bayesian theory are discussed. Following this the detailed procedures for constructing evidential mappings for any set of heuristic rules are introduced. Several situations of belief propagation are discussed. Bayes nets are relatively recent innovations. As a result, most of their theoretical development has focused on the simplest class of single-author models. The introduction of more sophisticated multiple-author settings raises a variety of interesting questions. One such question involves the nature of compromise and consensus. Posterior compromises let each model process all data to arrive at an independent response, and then split the difference. Prior compromises, on the other hand, force compromise to be reached on all points before data is observed. This paper introduces prior compromises in a Bayes net setting. It outlines the problem and develops an efficient algorithm for fusing two directed acyclic graphs into a single, consensus structure, which may then be used as the basis of a prior compromise. In Moral, Campos (1991) and Cano, Moral, Verdegay-Lopez (1991) a new method of conditioning convex sets of probabilities has been proposed. The result of it is a convex set of non-necessarily normalized probability distributions. The normalizing factor of each probability distribution is interpreted as the possibility assigned to it by the conditioning information. From this, it is deduced that the natural value for the conditional probability of an event is a possibility distribution. The aim of this paper is to study methods of transforming this possibility distribution into a probability (or uncertainty) interval. These methods will be based on the use of Sugeno and Choquet integrals. Their behaviour will be compared in basis to some selected examples. The trajectory of a robot is monitored in a restricted dynamic environment using light beam sensor data. We have a Dynamic Belief Network (DBN), based on a discrete model of the domain, which provides discrete monitoring analogous to conventional quantitative filter techniques. Sensor observations are added to the basic DBN in the form of specific evidence. However, sensor data is often partially or totally incorrect. We show how the basic DBN, which infers only an impossible combination of evidence, may be modified to handle specific types of incorrect data which may occur in the domain. We then present an extension to the DBN, the addition of an invalidating node, which models the status of the sensor as working or defective. This node provides a qualitative explanation of inconsistent data: it is caused by a defective sensor. The connection of successive instances of the invalidating node models the status of a sensor over time, allowing the DBN to handle both persistent and intermittent faults. This paper describes some results of research on associate systems: knowledge-based systems that flexibly and adaptively support their human users in carrying out complex, time-dependent problem-solving tasks under uncertainty. Based on principles derived from decision theory and decision analysis, a problem-solving approach is presented which can overcome many of the limitations of traditional expert-systems. This approach implements an explicit model of the human user's problem-solving capabilities as an integral element in the overall problem solving architecture. This integrated model, represented as an influence diagram, is the basis for achieving adaptive task sharing behavior between the associate system and the human user. This associate system model has been applied toward ongoing research on a Mars Rover Manager's Associate (MRMA). MRMA's role would be to manage a small fleet of robotic rovers on the Martian surface. The paper describes results for a specific scenario where MRMA examines the benefits and costs of consulting human experts on Earth to assist a Mars rover with a complex resource management decision. Although the notion of diagnostic problem has been extensively investigated in the context of static systems, in most practical applications the behavior of the modeled system is significantly variable during time. The goal of the paper is to propose a novel approach to the modeling of uncertainty about temporal evolutions of time-varying systems and a characterization of model-based temporal diagnosis. Since in most real world cases knowledge about the temporal evolution of the system to be diagnosed is uncertain, we consider the case when probabilistic temporal knowledge is available for each component of the system and we choose to model it by means of Markov chains. In fact, we aim at exploiting the statistical assumptions underlying reliability theory in the context of the diagnosis of timevarying systems. We finally show how to exploit Markov chain theory in order to discard, in the diagnostic process, very unlikely diagnoses. Many AI synthesis problems such as planning or scheduling may be modelized as constraint satisfaction problems (CSP). A CSP is typically defined as the problem of finding any consistent labeling for a fixed set of variables satisfying all given constraints between these variables. However, for many real tasks such as job-shop scheduling, time-table scheduling, design?, all these constraints have not the same significance and have not to be necessarily satisfied. A first distinction can be made between hard constraints, which every solution should satisfy and soft constraints, whose satisfaction has not to be certain. In this paper, we formalize the notion of possibilistic constraint satisfaction problems that allows the modeling of uncertainly satisfied constraints. We use a possibility distribution over labelings to represent respective possibilities of each labeling. Necessity-valued constraints allow a simple expression of the respective certainty degrees of each constraint. The main advantage of our approach is its integration in the CSP technical framework. Most classical techniques, such as Backtracking (BT), arcconsistency enforcing (AC) or Forward Checking have been extended to handle possibilistics CSP and are effectively implemented. The utility of our approach is demonstrated on a simple design problem. The general use of subjective probabilities to model belief has been justified using many axiomatic schemes. For example, ?consistent betting behavior' arguments are well-known. To those not already convinced of the unique fitness and generality of probability models, such justifications are often unconvincing. The present paper explores another rationale for probability models. ?Qualitative probability,' which is known to provide stringent constraints on belief representation schemes, is derived from five simple assumptions about relationships among beliefs. While counterparts of familiar rationality concepts such as transitivity, dominance, and consistency are used, the betting context is avoided. The gap between qualitative probability and probability proper can be bridged by any of several additional assumptions. The discussion here relies on results common in the recent AI literature, introducing a sixth simple assumption. The narrative emphasizes models based on unique complete orderings, but the rationale extends easily to motivate set-valued representations of partial orderings as well. In a previous paper [Pearl and Verma, 1991] we presented an algorithm for extracting causal influences from independence information, where a causal influence was defined as the existence of a directed arc in all minimal causal models consistent with the data. In this paper we address the question of deciding whether there exists a causal model that explains ALL the observed dependencies and independencies. Formally, given a list M of conditional independence statements, it is required to decide whether there exists a directed acyclic graph (dag) D that is perfectly consistent with M, namely, every statement in M, and no other, is reflected via dseparation in D. We present and analyze an effective algorithm that tests for the existence of such a day, and produces one, if it exists. Current Bayesian net representations do not consider structure in the domain and include all variables in a homogeneous network. At any time, a human reasoner in a large domain may direct his attention to only one of a number of natural subdomains, i.e., there is ?localization' of queries and evidence. In such a case, propagating evidence through a homogeneous network is inefficient since the entire network has to be updated each time. This paper presents multiply sectioned Bayesian networks that enable a (localization preserving) representation of natural subdomains by separate Bayesian subnets. The subnets are transformed into a set of permanent junction trees such that evidential reasoning takes place at only one of them at a time. Probabilities obtained are identical to those that would be obtained from the homogeneous network. We discuss attention shift to a different junction tree and propagation of previously acquired evidence. Although the overall system can be large, computational requirements are governed by the size of only one junction tree. Valuation-based system (VBS) provides a general framework for representing knowledge and drawing inferences under uncertainty. Recent studies have shown that the semantics of VBS can represent and solve Bayesian decision problems (Shenoy, 1991a). The purpose of this paper is to propose a decision calculus for Dempster-Shafer (D-S) theory in the framework of VBS. The proposed calculus uses a weighting factor whose role is similar to the probabilistic interpretation of an assumption that disambiguates decision problems represented with belief functions (Strat 1990). It will be shown that with the presented calculus, if the decision problems are represented in the valuation network properly, we can solve the problems by using fusion algorithm (Shenoy 1991a). It will also be shown the presented decision calculus can be reduced to the calculus for Bayesian probability theory when probabilities, instead of belief functions, are given. This paper presents a new approach for computing posterior probabilities in Bayesian nets, which sidesteps the triangulation problem. The current state of art is the clique tree propagation approach. When the underlying graph of a Bayesian net is triangulated, this approach arranges its cliques into a tree and computes posterior probabilities by appropriately passing around messages in that tree. The computation in each clique is simply direct marginalization. When the underlying graph is not triangulated, one has to first triangulated it by adding edges. Referred to as the triangulation problem, the problem of finding an optimal or even a ?good? triangulation proves to be difficult. In this paper, we propose to first decompose a Bayesian net into smaller components by making use of Tarjan's algorithm for decomposing an undirected graph at all its minimal complete separators. Then, the components are arranged into a tree and posterior probabilities are computed by appropriately passing around messages in that tree. The computation in each component is carried out by repeating the whole procedure from the beginning. Thus the triangulation problem is sidestepped. Belief networks are a new, potentially important, class of knowledge-based models. ARCO1, currently under development at the Atlantic Richfield Company (ARCO) and the University of Southern California (USC), is the most advanced reported implementation of these models in a financial forecasting setting. ARCO1's underlying belief network models the variables believed to have an impact on the crude oil market. A pictorial market model-developed on a MAC II- facilitates consensus among the members of the forecasting team. The system forecasts crude oil prices via Monte Carlo analyses of the network. Several different models of the oil market have been developed; the system's ability to be updated quickly highlights its flexibility. The way experts manage uncertainty usually changes depending on the task they are performing. This fact has lead us to consider the problem of communicating modules (task implementations) in a large and structured knowledge based system when modules have different uncertainty calculi. In this paper, the analysis of the communication problem is made assuming that (i) each uncertainty calculus is an inference mechanism defining an entailment relation, and therefore the communication is considered to be inference-preserving, and (ii) we restrict ourselves to the case which the different uncertainty calculi are given by a class of truth functional Multiple-valued Logics. An approach to reasoning with default rules where the proportion of exceptions, or more generally the probability of encountering an exception, can be at least roughly assessed is presented. It is based on local uncertainty propagation rules which provide the best bracketing of a conditional probability of interest from the knowledge of the bracketing of some other conditional probabilities. A procedure that uses two such propagation rules repeatedly is proposed in order to estimate any simple conditional probability of interest from the available knowledge. The iterative procedure, that does not require independence assumptions, looks promising with respect to the linear programming method. Improved bounds for conditional probabilities are given when independence assumptions hold. This paper presents a plausible reasoning system to illustrate some broad issues in knowledge representation: dualities between different reasoning forms, the difficulty of unifying complementary reasoning styles, and the approximate nature of plausible reasoning. These issues have a common underlying theme: there should be an underlying belief calculus of which the many different reasoning forms are special cases, sometimes approximate. The system presented allows reasoning about defaults, likelihood, necessity and possibility in a manner similar to the earlier work of Adams. The system is based on the belief calculus of subjective Bayesian probability which itself is based on a few simple assumptions about how belief should be manipulated. Approximations, semantics, consistency and consequence results are presented for the system. While this puts these often discussed plausible reasoning forms on a probabilistic footing, useful application to practical problems remains an issue. Theory refinement is the task of updating a domain theory in the light of new cases, to be done automatically or with some expert assistance. The problem of theory refinement under uncertainty is reviewed here in the context of Bayesian statistics, a theory of belief revision. The problem is reduced to an incremental learning task as follows: the learning system is initially primed with a partial theory supplied by a domain expert, and thereafter maintains its own internal representation of alternative theories which is able to be interrogated by the domain expert and able to be incrementally refined from data. Algorithms for refinement of Bayesian networks are presented to illustrate what is meant by "partial theory", "alternative theory representation", etc. The algorithms are an incremental variant of batch learning algorithms from the literature so can work well in batch and incremental mode. Research on Symbolic Probabilistic Inference (SPI) [2, 3] has provided an algorithm for resolving general queries in Bayesian networks. SPI applies the concept of dependency directed backward search to probabilistic inference, and is incremental with respect to both queries and observations. Unlike traditional Bayesian network inferencing algorithms, SPI algorithm is goal directed, performing only those calculations that are required to respond to queries. Research to date on SPI applies to Bayesian networks with discrete-valued variables and does not address variables with continuous values. In this papers, we extend the SPI algorithm to handle Bayesian networks made up of continuous variables where the relationships between the variables are restricted to be ?linear gaussian?. We call this variation of the SPI algorithm, SPI Continuous (SPIC). SPIC modifies the three basic SPI operations: multiplication, summation, and substitution. However, SPIC retains the framework of the SPI algorithm, namely building the search tree and recursive query mechanism and therefore retains the goal-directed and incrementality features of SPI. In this paper, we consider one aspect of the problem of applying decision theory to the design of agents that learn how to make decisions under uncertainty. This aspect concerns how an agent can estimate probabilities for the possible states of the world, given that it only makes limited observations before committing to a decision. We show that the naive application of statistical tools can be improved upon if the agent can determine which of his observations are truly relevant to the estimation problem at hand. We give a framework in which such determinations can be made, and define an estimation procedure to use them. Our framework also suggests several extensions, which show how additional knowledge can be used to improve tile estimation procedure still further. Value-of-information analyses provide a straightforward means for selecting the best next observation to make, and for determining whether it is better to gather additional information or to act immediately. Determining the next best test to perform, given a state of uncertainty about the world, requires a consideration of the value of making all possible sequences of observations. In practice, decision analysts and expert-system designers have avoided the intractability of exact computation of the value of information by relying on a myopic approximation. Myopic analyses are based on the assumption that only one additional test will be performed, even when there is an opportunity to make a large number of observations. We present a nonmyopic approximation for value of information that bypasses the traditional myopic analyses by exploiting the statistical properties of large samples. Since exact probabilistic inference is intractable in general for large multiply connected belief nets, approximate methods are required. A promising approach is to use heuristic search among hypotheses (instantiations of the network) to find the most probable ones, as in the TopN algorithm. Search is based on the relative probabilities of hypotheses which are efficient to compute. Given upper and lower bounds on the relative probability of partial hypotheses, it is possible to obtain bounds on the absolute probabilities of hypotheses. Best-first search aimed at reducing the maximum error progressively narrows the bounds as more hypotheses are examined. Here, qualitative probabilistic analysis is employed to obtain bounds on the relative probability of partial hypotheses for the BN20 class of networks networks and a generalization replacing the noisy OR assumption by negative synergy. The approach is illustrated by application to a very large belief network, QMR-BN, which is a reformulation of the Internist-1 system for diagnosis in internal medicine. We motivate and describe a theory of belief in this paper. This theory is developed with the following view of human belief in mind. Consider the belief that an event E will occur (or has occurred or is occurring). An agent either entertains this belief or does not entertain this belief (i.e., there is no "grade" in entertaining the belief). If the agent chooses to exercise "the will to believe" and entertain this belief, he/she/it is entitled to a degree of confidence c (1 > c > 0) in doing so. Adopting this view of human belief, we conjecture that whenever an agent entertains the belief that E will occur with c degree of confidence, the agent will be surprised (to the extent c) upon realizing that E did not occur. The categorial approach to evidential reasoning can be seen as a combination of the probability kinematics approach of Richard Jeffrey (1965) and the maximum (cross-) entropy inference approach of E. T. Jaynes (1957). As a consequence of that viewpoint, it is well known that category theory provides natural definitions for logical connectives. In particular, disjunction and conjunction are modelled by general categorial constructions known as products and coproducts. In this paper, I focus mainly on Dempster-Shafer theory of belief functions for which I introduce a category I call Dempster?s category. I prove the existence of and give explicit formulas for conjunction and disjunction in the subcategory of separable belief functions. In Dempster?s category, the new defined conjunction can be seen as the most cautious conjunction of beliefs, and thus no assumption about distinctness (of the sources) of beliefs is needed as opposed to Dempster?s rule of combination, which calls for distinctness (of the sources) of beliefs. A semantics is given to possibilistic logic, a logic that handles weighted classical logic formulae, and where weights are interpreted as lower bounds on degrees of certainty or possibility, in the sense of Zadeh's possibility theory. The proposed semantics is based on fuzzy sets of interpretations. It is tolerant to partial inconsistency. Satisfiability is extended from interpretations to fuzzy sets of interpretations, each fuzzy set representing a possibility distribution describing what is known about the state of the world. A possibilistic knowledge base is then viewed as a set of possibility distributions that satisfy it. The refutation method of automated deduction in possibilistic logic, based on previously introduced generalized resolution principle is proved to be sound and complete with respect to the proposed semantics, including the case of partial inconsistency. When a planner must decide whether it has enough evidence to make a decision based on probability, it faces the sample size problem. Current planners using probabilities need not deal with this problem because they do not generate their probabilities from observations. This paper presents an event based language in which the planner's probabilities are calculated from the binomial random variable generated by the observed ratio of one type of event to another. Such probabilities are subject to error, so the planner must introspect about their validity. Inferences about the probability of these events can be made using statistics. Inferences about the validity of the approximations can be made using interval estimation. Interval estimation allows the planner to avoid making choices that are only weakly supported by the planner's evidence. We present a general architecture for the monitoring and diagnosis of large scale sensor-based systems with real time diagnostic constraints. This architecture is multileveled, combining a single monitoring level based on statistical methods with two model based diagnostic levels. At each level, sources of uncertainty are identified, and integrated methodologies for uncertainty management are developed. The general architecture was applied to the monitoring and diagnosis of a specific nuclear physics detector at Lawrence Berkeley National Laboratory that contained approximately 5000 components and produced over 500 channels of output data. The general architecture is scalable, and work is ongoing to apply it to detector systems one and two orders of magnitude more complex. A new probabilistic network construction system, DYNASTY, is proposed for diagnostic reasoning given variables whose probabilities change over time. Diagnostic reasoning is formulated as a sequential stochastic process, and is modeled using influence diagrams. Given a set O of observations, DYNASTY creates an influence diagram in order to devise the best action given O. Sensitivity analyses are conducted to determine if the best network has been created, given the uncertainty in network parameters and topology. DYNASTY uses an equivalence class approach to provide decision thresholds for the sensitivity analysis. This equivalence-class approach to diagnostic reasoning differentiates diagnoses only if the required actions are different. A set of network-topology updating algorithms are proposed for dynamically updating the network when necessary. For high level path planning, environments are usually modeled as distance graphs, and path planning problems are reduced to computing the shortest path in distance graphs. One major drawback of this modeling is the inability to model uncertainties, which are often encountered in practice. In this paper, a new tool, called U-yraph, is proposed for environment modeling. A U-graph is an extension of distance graphs with the ability to handle a kind of uncertainty. By modeling an uncertain environment as a U-graph, and a navigation problem as a Markovian decision process, we can precisely define a new optimality criterion for navigation plans, and more importantly, we can come up with a general algorithm for computing optimal plans for navigation tasks. Deliberation plays an important role in the design of rational agents embedded in the real-world. In particular, deliberation leads to the formation of intentions, i.e., plans of action that the agent is committed to achieving. In this paper, we present a branching time possible-worlds model for representing and reasoning about, beliefs, goals, intentions, time, actions, probabilities, and payoffs. We compare this possible-worlds approach with the more traditional decision tree representation and provide a transformation from decision trees to possible worlds. Finally, we illustrate how an agent can perform deliberation using a decision-tree representation and then use a possible-worlds model to form and reason about his intentions. This paper introduces conceptual relations that synthesize utilitarian and logical concepts, extending the logics of preference of Rescher. We define first, in the context of a possible worlds model, constraint-dependent measures that quantify the relative quality of alternative solutions of reasoning problems or the relative desirability of various policies in control, decision, and planning problems. We show that these measures may be interpreted as truth values in a multi valued logic and propose mechanisms for the representation of complex constraints as combinations of simpler restrictions. These extended logical operations permit also the combination and aggregation of goal-specific quality measures into global measures of utility. We identify also relations that represent differential preferences between alternative solutions and relate them to the previously defined desirability measures. Extending conventional modal logic formulations, we introduce structures for the representation of ignorance about the utility of alternative solutions. Finally, we examine relations between these concepts and similarity based semantic models of fuzzy logic. In this article we present two ways of structuring bodies of evidence, which allow us to reduce the complexity of the operations usually performed in the framework of evidence theory. The first structure just partitions the focal elements in a body of evidence by their cardinality. With this structure we are able to reduce the complexity on the calculation of the belief functions Bel, Pl, and Q. The other structure proposed here, the Hierarchical Trees, permits us to reduce the complexity of the calculation of Bel, Pl, and Q, as well as of the Dempster's rule of combination in relation to the brute-force algorithm. Both these structures do not require the generation of all the subsets of the reference domain. In mechanical design, there is often unavoidable uncertainty in estimates of design performance. Evaluation of design alternatives requires consideration of the impact of this uncertainty. Expert heuristics embody assumptions regarding the designer's attitude towards risk and uncertainty that might be reasonable in most cases but inaccurate in others. We present a technique to allow designers to incorporate their own unique attitude towards uncertainty as opposed to those assumed by the domain expert's rules. The general approach is to eliminate aspects of heuristic rules which directly or indirectly include assumptions regarding the user's attitude towards risk, and replace them with explicit, user-specified probabilistic multi attribute utility and probability distribution functions. We illustrate the method in a system for material selection for automobile bumpers. The compatibility of quantitative and qualitative representations of beliefs was studied extensively in probability theory. It is only recently that this important topic is considered in the context of belief functions. In this paper, the compatibility of various quantitative belief measures and qualitative belief structures is investigated. Four classes of belief measures considered are: the probability function, the monotonic belief function, Shafer's belief function, and Smets' generalized belief function. The analysis of their individual compatibility with different belief structures not only provides a sound b