    • 2014
      • Neural Turing Machines
      • Published: ArXiv 2014
      • Neural Turing Machines (NTMs) couple differentiable external memory resources to neural network controllers. Unlike classical computers, they can be optimized by stochastic gradient descent to infer algorithms from data.
    • 2016
      • Mastering the game of Go with Deep Neural Networks & Tree Search
      • Authors: D Silver A Huang C J Maddison A Guez L Sifre G van den Driessche J Schrittwieser I Antonoglou V Panneershelvam M Lanctot S Dieleman D Grewe J Nham N Kalchbrenner I Sutskever T Lillicrap M Leach K Kavukcuoglu T Graepel D Hassabis
      • Published: Nature 2016
      • A new approach to computer Go that combines Monte-Carlo tree search with deep neural networks that have been trained by supervised learning, from human expert games, and by reinforcement learning, from games of self-play. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go.
    • 2016
      • Learning to Communicate with Deep Multi-Agent Reinforcement Learning
      • Authors: J Foerster Y M Assael N de Freitas S Whiteson
      • Published: NIPS 2016
      • We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility. In these environments, agents must learn communication protocols in order to share information that is needed to solve the tasks. By embracing deep neural networks, we are able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability. We propose two approaches for learning in these domains: Reinforced Inter-Agent Learning (RIAL) and Differentiable Inter-Agent Learning (DIAL). The former uses deep Q-learning, while the latter exploits the fact that, during learning, agents can backpropagate error derivatives through (noisy) communication channels. Hence, this approach uses centralised learning but decentralised execution. Our experiments introduce new environments for studying the learning of communication protocols and present a set of engineering innovations that are essential for success in these domains.
    • 2016
      • Sequential Neural Models with Stochastic Layers
      • Authors: M Fraccaro S K Sønderby Ulrich Paquet Ole Winther
      • Published: NIPS 2016
      • How can we efficiently propagate uncertainty in a latent state representation with recurrent neural networks? This paper introduces stochastic recurrent neural networks which glue a deterministic recurrent neural network and a state space model together to form a stochastic and sequential neural generative model. The clear separation of deterministic and stochastic layers allows a structured variational inference network to track the factorization of the model’s posterior distribution. By retaining both the nonlinear recursive structure of a recurrent neural network and averaging over the uncertainty in a latent path, like a state space model, we improve the state of the art results on the Blizzard and TIMIT speech modeling data sets by a large margin, while achieving comparable performances to competing methods on polyphonic music modeling.
    • 2016
      • Conditional Image Generation with PixelCNN Decoders
      • Authors: A van den Oord N Kalchbrenner O Vinyals L Espeholt A Graves K Kavukcuoglu
      • Published: NIPS 2016
      • This work explores conditional image generation with a new image density model based on the PixelCNN architecture. The model can be conditioned on any vector, including descriptive labels or tags, or latent embeddings created by other networks. When conditioned on class labels from the ImageNet database, the model is able to generate diverse, realistic scenes representing distinct animals, objects, landscapes and structures. When conditioned on an embedding produced by a convolutional network given a single image of an unseen face, it generates a variety of new portraits of the same person with different facial expressions, poses and lighting conditions. We also show that conditional PixelCNN can serve as a powerful decoder in an image autoencoder. Additionally, the gated convolutional layers in the proposed model improve the log-likelihood of PixelCNN to match the state-of-the-art performance of PixelRNN on ImageNet, with greatly reduced computational cost.
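      • As a hedged illustration of the mechanism underlying these models, the sketch below implements the masked convolution that enforces the autoregressive pixel ordering; the layer sizes are placeholders, and the paper's gating, conditioning, and per-channel ordering are omitted.

```python
# A minimal masked convolution in the spirit of PixelCNN.
# Mask type 'A' hides the centre pixel (first layer); 'B' allows it (later layers).
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        assert mask_type in ("A", "B")
        mask = torch.ones_like(self.weight)                      # (out, in, kH, kW)
        _, _, kH, kW = self.weight.shape
        mask[:, :, kH // 2, kW // 2 + (mask_type == "B"):] = 0   # centre row: hide "future" columns
        mask[:, :, kH // 2 + 1:, :] = 0                          # hide all rows below the centre
        self.register_buffer("mask", mask)

    def forward(self, x):
        self.weight.data *= self.mask                            # zero out masked weights
        return super().forward(x)

# Usage: mask 'A' for the first layer, mask 'B' for subsequent layers.
layer = MaskedConv2d("A", in_channels=1, out_channels=16, kernel_size=7, padding=3)
out = layer(torch.randn(1, 1, 28, 28))                           # -> (1, 16, 28, 28)
```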
    • 2016
      • Strategic Attentive Writer for Learning Macro-Actions
      • Authors: A Vezhnevets V Mnih S Osindero A Graves O Vinyals J Agapiou K Kavukcuoglu
      • Published: NIPS 2016
      • We present a novel deep recurrent neural network architecture that learns to build implicit plans in an end-to-end manner by purely interacting with an environment in a reinforcement learning setting. The network builds an internal plan, which is continuously updated upon observation of the next input from the environment. It can also partition this internal representation into contiguous subsequences by learning for how long the plan can be committed to – i.e. followed without re-planning. Combining these properties, the proposed model, dubbed STRategic Attentive Writer (STRAW) can learn high-level, temporally abstracted macro-actions of varying lengths that are solely learnt from data without any prior information. These macro-actions enable both structured exploration and economic computation. We experimentally demonstrate that STRAW delivers strong improvements on several ATARI games by employing temporally extended planning strategies (e.g. Ms. Pacman and Frostbite). It is at the same time a general algorithm that can be applied on any sequence data. To that end, we also show that when trained on a text prediction task, STRAW naturally predicts frequent n-grams (instead of macro-actions), demonstrating the generality of the approach.
    • 2016
      • Learning to Learn by Gradient Descent by Gradient Descent
      • Authors: M Andrychowicz M Denil S Gomez M W Hoffman D Pfau T Schaul N de Freitas
      • Published: NIPS 2016
      • The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this paper we show how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. Our learned algorithms, implemented by LSTMs, outperform generic, hand-designed competitors on the tasks for which they are trained, and also generalize well to new tasks with similar structure. We demonstrate this on a number of tasks, including simple convex problems, training neural networks, and styling images with neural art.
    • 2016
      • Matching Networks for One Shot Learning
      • Authors: O Vinyals C Blundell T Lillicrap K Kavukcuoglu D Wierstra
      • Published: NIPS 2016
      • Learning from a few examples remains a key challenge in machine learning. Despite recent advances in important domains such as vision and language, the standard supervised deep learning paradigm does not offer a satisfactory solution for learning new concepts rapidly from little data. In this work, we employ ideas from metric learning based on deep neural features and from recent advances that augment neural networks with external memories. Our framework learns a network that maps a small labelled support set and an unlabelled example to its label, obviating the need for fine-tuning to adapt to new class types. We then define one-shot learning problems on vision (using Omniglot, ImageNet) and language tasks. Our algorithm improves one-shot accuracy on ImageNet from 87.6% to 93.2% and from 88.0% to 93.8% on Omniglot compared to competing approaches. We also demonstrate the usefulness of the same model on language modeling by introducing a one-shot task on the Penn Treebank.
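      • The core attention kernel is easy to state in code. Below is a minimal, assumption-laden sketch: cosine-similarity attention over a support set of precomputed embeddings, without the deep embedding functions or full context embeddings the paper uses; all data here is made up.

```python
# Classify a query as a similarity-weighted sum of support-set labels.
import numpy as np

def matching_net_predict(support_x, support_y, query, n_classes):
    # Cosine similarity between the query and each support embedding.
    sims = support_x @ query / (
        np.linalg.norm(support_x, axis=1) * np.linalg.norm(query) + 1e-8)
    attn = np.exp(sims) / np.exp(sims).sum()        # softmax attention
    onehot = np.eye(n_classes)[support_y]           # (k, n_classes)
    return attn @ onehot                            # class probabilities

support_x = np.random.randn(5, 64)                  # 5 labelled embeddings
support_y = np.array([0, 1, 2, 3, 4])               # one example per class
query = np.random.randn(64)
print(matching_net_predict(support_x, support_y, query, n_classes=5))
```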
    • 2016
      • Memory-Efficient Backpropagation through Time
      • Authors: A Gruslys R Munos I Danihelka M Lanctot A Graves
      • Published: NIPS 2016
      • We propose a novel approach to reduce memory consumption of the backpropagation through time (BPTT) algorithm when training recurrent neural networks (RNNs). Our approach uses dynamic programming to balance a trade-off between caching of intermediate results and recomputation. The algorithm is capable of tightly fitting within almost any user-set memory budget while finding an optimal execution policy minimizing the computational cost. Computational devices have limited memory capacity and maximizing computational performance given a fixed memory budget is a practical use-case. We provide asymptotic computational upper bounds for various regimes. The algorithm is particularly effective for long sequences. For sequences of length 1000, our algorithm saves 95% of memory usage while using only one third more time per iteration than the standard BPTT.
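      • A minimal sketch of the cache-versus-recompute idea, using fixed-interval checkpointing (the simple special case of the policy the paper optimizes with dynamic programming); the segment length and model sizes are arbitrary choices.

```python
# Store hidden states only at segment boundaries; recompute inner
# activations when gradients flow back through each segment.
import torch
from torch.utils.checkpoint import checkpoint

cell = torch.nn.RNNCell(input_size=32, hidden_size=128)

def run_segment(h, *xs):
    for x in xs:
        h = cell(x, h)
    return h

def checkpointed_bptt(xs, h0, seg_len=32):
    h = h0
    for start in range(0, len(xs), seg_len):
        seg = xs[start:start + seg_len]
        # Only the boundary state `h` is kept in memory for this segment.
        h = checkpoint(run_segment, h, *seg, use_reentrant=False)
    return h

xs = [torch.randn(8, 32) for _ in range(1024)]
h0 = torch.zeros(8, 128, requires_grad=True)
loss = checkpointed_bptt(xs, h0).pow(2).mean()
loss.backward()                                     # inner states recomputed here
```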
    • 2016
      • Safe and Efficient Off-Policy Reinforcement Learning
      • Authors: R Munos T Stepleton A Harutyunyan M G Bellemare
      • Published: NIPS 2016
      • In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning. Expressing these in a common form, we derive a novel algorithm, Retrace(λ), with three desired properties: (1) low variance; (2) safety, as it safely uses samples collected from any behaviour policy, whatever its degree of "off-policyness"; and (3) efficiency, as it makes the best use of samples collected from near on-policy behaviour policies. We analyse the contractive nature of the related operator under both off-policy policy evaluation and control settings and derive online sample-based algorithms. To our knowledge, this is the first return-based off-policy control algorithm converging a.s. to Q∗ without the GLIE assumption (Greedy in the Limit with Infinite Exploration). As a corollary, we prove the convergence of Watkins' Q(λ), which was still an open problem. We illustrate the benefits of Retrace(λ) on a standard suite of Atari 2600 games.
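      • A NumPy sketch of the Retrace(λ) target computation follows; the array names and shapes are our own conventions, not the paper's notation.

```python
# The truncated importance weight c_s = λ·min(1, π(a_s|x_s)/μ(a_s|x_s))
# is what makes the correction safe for arbitrarily off-policy data.
import numpy as np

def retrace_targets(q_sa, v_next, rewards, pi_a, mu_a, gamma=0.99, lam=1.0):
    """q_sa[t]   = Q(x_t, a_t); v_next[t] = E_π Q(x_{t+1}, ·), 0 if terminal;
       pi_a/mu_a = target/behaviour probabilities of the actions taken."""
    T = len(rewards)
    c = lam * np.minimum(1.0, pi_a / mu_a)          # truncated IS weights
    delta = rewards + gamma * v_next - q_sa         # TD errors under π
    targets = np.empty(T)
    acc = 0.0
    for t in reversed(range(T)):                    # Δ_t = δ_t + γ c_{t+1} Δ_{t+1}
        acc = delta[t] + (gamma * c[t + 1] * acc if t + 1 < T else 0.0)
        targets[t] = q_sa[t] + acc
    return targets

T = 5
targets = retrace_targets(q_sa=np.zeros(T), v_next=np.zeros(T),
                          rewards=np.ones(T), pi_a=np.full(T, 0.9),
                          mu_a=np.full(T, 0.5))
```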
    • 2016
      • Unifying Count-Based Exploration and Intrinsic Motivation
      • Authors: M G Bellemare S Srinivasan G Ostrovski T Schaul D Saxton R Munos
      • Published: NIPS 2016
      • We consider an agent's uncertainty about its environment and the problem of generalizing this uncertainty across observations. Specifically, we focus on the problem of exploration in non-tabular reinforcement learning. Drawing inspiration from the intrinsic motivation literature, we use sequential density models to measure uncertainty, and propose a novel algorithm for deriving a pseudo-count from an arbitrary sequential density model. This technique enables us to generalize count-based exploration algorithms to the non-tabular case. We apply our ideas to Atari 2600 games, providing sensible pseudo-counts from raw pixels. We transform these pseudo-counts into intrinsic rewards and obtain significantly improved exploration in a number of hard games, including the infamously difficult Montezuma's Revenge.
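      • The pseudo-count itself is a two-line formula. Below is a hedged sketch with the β/sqrt(N̂ + 0.01) bonus form from the paper; β and the toy count-based density model are illustrative choices.

```python
# rho  = model probability of observation x before updating on x
# rho2 = "recoding" probability of x immediately after the update
import math

def pseudo_count(rho, rho2):
    return rho * (1.0 - rho2) / (rho2 - rho)        # requires rho2 > rho

def exploration_bonus(rho, rho2, beta=0.05):
    n_hat = max(pseudo_count(rho, rho2), 0.0)
    return beta / math.sqrt(n_hat + 0.01)

# Toy check with a Laplace-smoothed count model over 10 symbols:
counts, total = [1] * 10, 10
x = 3
rho = counts[x] / total
counts[x] += 1; total += 1
rho2 = counts[x] / total
print(pseudo_count(rho, rho2))                      # exactly 1.0 here
```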
    • 2016
      • Towards Conceptual Compression
      • Authors: K Gregor F Besse D J Rezende I Danihelka D Wierstra
      • Published: NIPS 2016
      • We introduce a simple recurrent variational auto-encoder architecture that significantly improves image modeling. The system represents the state-of-the-art in latent variable models for both the ImageNet and Omniglot datasets. We show that it naturally separates global conceptual information from lower level details, thus addressing one of the fundamentally desired properties of unsupervised learning. Furthermore, the possibility of restricting ourselves to storing only global information about an image allows us to achieve high quality 'conceptual compression'.
    • 2016
      • Deep Exploration via Bootstrapped DQN
      • Authors: I Osband C Blundell A Pritzel B Van Roy
      • Published: NIPS 2016
      • Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and statistically efficient manner through use of randomized value functions. Unlike dithering strategies such as epsilon-greedy exploration, bootstrapped DQN carries out temporally-extended (or deep) exploration; this can lead to exponentially faster learning. We demonstrate these benefits in complex stochastic MDPs and in the large-scale Arcade Learning Environment. Bootstrapped DQN substantially improves learning times and performance across most Atari games.
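      • A skeleton of the architecture, under our own size choices: K value heads on a shared torso, one head sampled per episode to drive behaviour, and a bootstrap mask stored per transition. The training loop and target networks are omitted.

```python
import torch
import torch.nn as nn

class BootstrappedQNet(nn.Module):
    def __init__(self, obs_dim, n_actions, n_heads=10):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.heads = nn.ModuleList(
            nn.Linear(256, n_actions) for _ in range(n_heads))

    def forward(self, obs, head):                   # Q-values of one head
        return self.heads[head](self.torso(obs))

net = BootstrappedQNet(obs_dim=4, n_actions=2)
head = torch.randint(10, ()).item()                 # resample once per episode
action = net(torch.randn(1, 4), head).argmax(dim=1)
# When storing a transition, also store a bootstrap mask over heads:
mask = (torch.rand(10) < 0.5)                       # heads that train on it
```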
    • 2016
      • Learning values across many orders of magnitude
      • Authors: H van Hasselt A Guez M Hessel D Silver V Mnih
      • Published: NIPS 2016
      • Most learning algorithms are not invariant to the scale of the function that is being approximated. We propose to adaptively normalize the targets used in learning. This is useful in value-based reinforcement learning, where the magnitude of appropriate value approximations can change over time when we update the policy of behavior. Our main motivation is prior work on learning to play Atari games, where the rewards were all clipped to a predetermined range. This clipping facilitates learning across many different games with a single learning algorithm, but a clipped reward function can result in qualitatively different behavior. Using the adaptive normalization we can remove this domain-specific heuristic without diminishing overall performance.
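      • The output-preserving rescaling at the heart of the method fits in a few lines. The sketch below (our variable names, scalar targets) rewrites the last layer so predictions are unchanged when the normalization statistics move.

```python
import numpy as np

class NormalizedHead:
    def __init__(self, n_in):
        self.W = np.random.randn(n_in) * 0.01
        self.b = 0.0
        self.mu, self.sigma = 0.0, 1.0

    def unnormalized(self, h):                      # prediction in target units
        return self.sigma * (h @ self.W + self.b) + self.mu

    def update_stats(self, mu_new, sigma_new):
        # Preserve outputs: σ(Wh+b)+μ must equal σ'(W'h+b')+μ' for all h.
        self.W = self.W * self.sigma / sigma_new
        self.b = (self.sigma * self.b + self.mu - mu_new) / sigma_new
        self.mu, self.sigma = mu_new, sigma_new

head = NormalizedHead(8)
h = np.random.randn(8)
before = head.unnormalized(h)
head.update_stats(mu_new=100.0, sigma_new=50.0)     # targets grew in magnitude
assert np.isclose(before, head.unnormalized(h))     # output unchanged
```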
    • 2016
      • Decoupled Neural Interfaces using Synthetic Gradients
      • Authors: M Jaderberg W M Czarnecki S Osindero O Vinyals A Graves K Kavukcuoglu
      • Published: ArXiv 2016
      • Training directed neural networks typically requires forward-propagating data through a computation graph, followed by backpropagating the error signal, to produce weight updates. All layers, or more generally, modules, of the network are therefore locked, in the sense that they must wait for the remainder of the network to execute forwards and propagate error backwards before they can be updated. In this work we break this constraint by decoupling modules by introducing a model of the future computation of the network graph. These models predict what the modelled subgraph will produce using only local information. In particular we focus on modelling error gradients: by using the modelled synthetic gradient in place of true backpropagated error gradients we decouple subgraphs, and can update them independently and asynchronously i.e. we realise decoupled neural interfaces. We show results for feed-forward models, where every layer is trained asynchronously, recurrent neural networks (RNNs) where predicting one’s future gradient extends the time over which the RNN can effectively model, and also a hierarchical RNN system with ticking at different timescales. Finally, we demonstrate that in addition to predicting gradients, the same framework can be used to predict inputs, resulting in models which are decoupled in both the forward and backwards pass – amounting to independent networks which co-learn such that they can be composed into a single functioning corporation.
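      • A synchronous, single-step sketch of the idea, with made-up layer sizes: update the first module from a predicted gradient, then regress the predictor onto the true gradient once the downstream pass has produced it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

layer1 = nn.Linear(32, 64)
layer2 = nn.Linear(64, 10)
sg = nn.Linear(64, 64)                              # synthetic-gradient module
opt1, opt2, opt_sg = (torch.optim.SGD(m.parameters(), lr=0.1)
                      for m in (layer1, layer2, sg))

x, y = torch.randn(16, 32), torch.randint(10, (16,))

h = torch.relu(layer1(x))
g_hat = sg(h.detach())                              # predicted dL/dh
opt1.zero_grad(); h.backward(g_hat.detach()); opt1.step()   # update without waiting

h2 = h.detach().requires_grad_(True)                # downstream continues
loss = F.cross_entropy(layer2(h2), y)
opt2.zero_grad(); loss.backward(); opt2.step()

true_grad = h2.grad.detach()                        # now train the predictor
sg_loss = F.mse_loss(g_hat, true_grad)
opt_sg.zero_grad(); sg_loss.backward(); opt_sg.step()
```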
    • 2016
      • Latent Predictor Networks for Code Generation
      • Authors: W Ling E Grefenstette K M Hermann T Kočiský A Senior F Wang P Blunsom
      • Published: ACL 2016
      • Many language generation tasks require the production of text conditioned on both structured and unstructured inputs. We present a novel neural network architecture which generates an output sequence conditioned on an arbitrary number of input functions. Crucially, our approach allows both the choice of conditioning context and the granularity of generation, for example characters or tokens, to be marginalised, thus permitting scalable and effective training. Using this framework, we address the problem of generating programming code from a mixed natural language and structured specification. We create two new data sets for this paradigm derived from the collectible trading card games Magic the Gathering and Hearthstone. On these, and a third preexisting corpus, we demonstrate that marginalising multiple predictors allows our model to outperform strong benchmarks.
    • 2016
      • Progressive Neural Networks
      • Authors: A Rusu R Pascanu N Rabinowitz G Desjardins H Soyer J Kirkpatrick K Kavukcuoglu R Hadsell
      • Published: ArXiv 2016
      • Learning to solve complex sequences of tasks—while both leveraging transfer and avoiding catastrophic forgetting—remains a key obstacle to achieving human-level intelligence. The progressive networks approach represents a step forward in this direction: they are immune to forgetting and can leverage prior knowledge via lateral connections to previously learned features. We evaluate this architecture extensively on a wide variety of reinforcement learning tasks (Atari and 3D maze games), and show that it outperforms common baselines based on pretraining and finetuning. Using a novel sensitivity measure, we demonstrate that transfer occurs at both low-level sensory and high-level control layers of the learned policy.
    • 2016
      • Successor Features for Transfer in Reinforcement Learning
      • Authors: A Barreto R Munos T Schaul D Silver
      • Published: ArXiv 2016
      • Transfer in reinforcement learning refers to the notion that generalization should occur not only within a task but also across tasks. Our focus is on transfer where the reward functions vary across tasks while the environment’s dynamics remain the same. The method we propose rests on two key ideas: "successor features", a value function representation that decouples the dynamics of the environment from the rewards, and "generalized policy improvement", a generalization of dynamic programming’s policy improvement step that considers a set of policies rather than a single one. Put together, the two ideas lead to an approach that integrates seamlessly within the reinforcement learning framework and allows transfer to take place between tasks without any restriction. The proposed method also provides performance guarantees for the transferred policy even before any learning has taken place. We derive two theorems that set our approach in firm theoretical ground and present experiments that show that it successfully promotes transfer in practice.
    • 2016
      • Early Visual Concept Learning with Unsupervised Deep Learning
      • Authors: I Higgins L Matthey X Glorot A Pal B Uria C Blundell S Mohamed A Lerchner
      • Published: ArXiv 2016
      • Automated discovery of early visual concepts from raw image data is a major open challenge in AI research. Addressing this problem, we propose an unsupervised approach for learning disentangled representations of the underlying factors of variation. We draw inspiration from neuroscience, and show how this can be achieved in an unsupervised generative model by applying the same learning pressures as have been suggested to act in the ventral visual stream in the brain. By enforcing redundancy reduction, encouraging statistical independence, and exposure to data with transform continuities analogous to those to which human infants are exposed, we obtain a variational autoencoder (VAE) framework capable of learning disentangled factors. Our approach makes few assumptions and works well across a wide variety of datasets. Furthermore, our solution has useful emergent properties, such as zero-shot inference and an intuitive understanding of "objectness".
    • 2016
      • Model-Free Episodic Control
      • Authors: C Blundell B Uria A Pritzel Y Li A Ruderman J Z Leibo J Rae D Wierstra D Hassabis
      • Published: ArXiv 2016
      • State of the art deep reinforcement learning algorithms take many millions of interactions to attain human-level performance. Humans, on the other hand, can very quickly exploit highly rewarding nuances of an environment upon first discovery. In the brain, such rapid learning is thought to depend on the hippocampus and its capacity for episodic memory. Here we investigate whether a simple model of hippocampal episodic control can learn to solve difficult sequential decision-making tasks. We demonstrate that it not only attains a highly rewarding strategy significantly faster than state-of-the-art deep reinforcement learning algorithms, but also achieves a higher overall reward on some of the more challenging domains.
    • 2016
      • Towards an integration of Deep Learning and Neuroscience
      • Authors: A H Marblestone G Wayne K P Kording
      • Published: bioRxiv 2016
      • Neuroscience has focused on the detailed implementation of computation, studying neural codes, dynamics and circuits. In machine learning, however, artificial neural networks tend to eschew precisely designed codes, dynamics or circuits in favor of brute force optimization of a cost function, often using simple and relatively uniform initial architectures. Two recent developments have emerged within machine learning that create an opportunity to connect these seemingly divergent perspectives. First, structured architectures are used, including dedicated systems for attention, recursion and various forms of short- and long-term memory storage. Second, cost functions and training procedures have become more complex and are varied across layers and over time. Here we think about the brain in terms of these ideas. We hypothesize that (1) the brain optimizes cost functions, (2) these cost functions are diverse and differ across brain locations and over development, and (3) optimization operates within a pre-structured architecture matched to the computational problems posed by behavior. Such a heterogeneously optimized system, enabled by a series of interacting cost functions, serves to make learning data-efficient and precisely targeted to the needs of the organism. We suggest directions by which neuroscience could seek to refine and test these hypotheses.
    • 2016
      • Safely Interruptible Agents
      • Authors: L Orseau S Armstrong
      • Published: UAI 2016
      • Reinforcement learning agents interacting with a complex environment like the real world are unlikely to behave optimally all the time. If such an agent is operating in real-time under human supervision, now and then it may be necessary for a human operator to press the big red button to prevent the agent from continuing a harmful sequence of actions—harmful either for the agent or for the environment—and lead the agent into a safer situation. However, if the learning agent expects to receive rewards from this sequence, it may learn in the long run to avoid such interruptions, for example by disabling the red button— which is an undesirable outcome. This paper explores a way to make sure a learning agent will not learn to prevent (or seek!) being interrupted by the environment or a human operator. We provide a formal definition of safe interruptibility and exploit the off-policy learning property to prove that either some agents are already safely interruptible, like Q-learning, or can easily be made so, like Sarsa. We show that even ideal, uncomputable reinforcement learning agents for (deterministic) general computable environments can be made safely interruptible.
    • 2016
      • Thompson Sampling is Asymptotically Optimal in General Environments
      • Authors: J Leike T Lattimore L Orseau M Hutter
      • Published: UAI 2016
      • We discuss a variant of Thompson sampling for nonparametric reinforcement learning in countable classes of general stochastic environments. These environments can be non-Markov, non-ergodic, and partially observable. We show that Thompson sampling learns the environment class in the sense that (1) asymptotically its value converges to the optimal value in mean and (2) given a recoverability assumption regret is sublinear.
    • 2016
      • Convolution by Evolution: Differentiable Pattern Producing Networks
      • Authors: C Fernando D Banarse M Reynolds F Besse D Pfau M Jaderberg M Lanctot D Wierstra
      • Published: GECCO 2016
      • In this work we introduce a differentiable version of the Compositional Pattern Producing Network, called the DPPN. Unlike a standard CPPN, the topology of a DPPN is evolved but the weights are learned. A Lamarckian algorithm, that combines evolution and learning, produces DPPNs to reconstruct an image. Our main result is that DPPNs can be evolved/trained to compress the weights of a denoising autoencoder from 157684 to roughly 200 parameters, while achieving a reconstruction accuracy comparable to a fully connected network with more than two orders of magnitude more parameters. The regularization ability of the DPPN allows it to rediscover (approximate) convolutional network architectures embedded within a fully connected architecture. Such convolutional architectures are the current state of the art for many computer vision applications, so it is satisfying that DPPNs are capable of discovering this structure rather than having to build it in by design. DPPNs exhibit better generalization when tested on the Omniglot dataset after being trained on MNIST, than directly encoded fully connected autoencoders. DPPNs are therefore a new framework for integrating learning and evolution.
    • 2016
      • Adaptive Computation Time for Recurrent Neural Networks
      • Authors: A Graves
      • Published: ArXiv 2016
      • This paper introduces Adaptive Computation Time (ACT), an algorithm that allows recurrent neural networks to learn how many computational steps to take between receiving an input and emitting an output. ACT requires minimal changes to the network architecture, is deterministic and differentiable, and does not add any noise to the parameter gradients. Experimental results are provided for four synthetic problems: determining the parity of binary vectors, applying binary logic operations, adding integers, and sorting real numbers. Overall, performance is dramatically improved by the use of ACT, which successfully adapts the number of computational steps to the requirements of the problem. We also present character-level language modelling results on the Hutter prize Wikipedia dataset. In this case ACT does not yield large gains in performance; however it does provide intriguing insight into the structure of the data, with more computation allocated to harder-to-predict transitions, such as spaces between words and ends of sentences. This suggests that ACT or other adaptive computation methods could provide a generic method for inferring segment boundaries in sequence data.
    • 2016
      • Neural Mechanisms of Hierarchical Planning in a Virtual Subway Network
      • Authors: J Balaguer H Spiers D Hassabis C Summerfield
      • Published: Neuron 2016
      • Planning allows actions to be structured in pursuit of a future goal. However, in natural environments, planning over multiple possible future states incurs prohibitive computational costs. To represent plans efficiently, states can be clustered hierarchically into “contexts”. For example, representing a journey through a subway network as a succession of individual states (stations) is more costly than encoding a sequence of contexts (lines) and context switches (line changes). Here, using functional brain imaging, we asked humans to perform a planning task in a virtual subway network. Behavioral analyses revealed that humans executed a hierarchically organized plan. Brain activity in the dorsomedial prefrontal cortex and premotor cortex scaled with the cost of hierarchical plan representation and unique neural signals in these regions signaled contexts and context switches. These results suggest that humans represent hierarchical plans using a network of caudal prefrontal structures.
    • 2016
      • Pixel Recurrent Neural Networks
      • Authors: A van den Oord N Kalchbrenner K Kavukcuoglu
      • Published: ICML 2016
      • Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections in deep recurrent networks. We achieve log-likelihood scores on natural images that are considerably better than the previous state of the art. Our main results also provide benchmarks on the diverse ImageNet dataset. Samples generated from the model appear crisp, varied and globally coherent.
    • 2016
      • One-shot Learning with Memory-Augmented Neural Networks
      • Authors: A Santoro S Bartunov M Botvinick D Wierstra T Lillicrap
      • Published: ICML 2016
      • Despite recent breakthroughs in the applications of deep neural networks, one setting that presents a persistent challenge is that of "one-shot learning." Traditional gradient-based networks require a lot of data to learn, often through extensive iterative training. When new data is encountered, the models must inefficiently relearn their parameters to adequately incorporate the new information without catastrophic interference. Architectures with augmented memory capacities, such as Neural Turing Machines (NTMs), offer the ability to quickly encode and retrieve new information, and hence can potentially obviate the downsides of conventional models. Here, we demonstrate the ability of a memory-augmented neural network to rapidly assimilate new data, and leverage this data to make accurate predictions after only a few samples. We also introduce a new method for accessing an external memory that focuses on memory content, unlike previous methods that additionally use memory location-based focusing mechanisms.
    • 2016
      • Exploiting Cyclic Symmetry in Convolutional Neural Networks
      • Authors: S Dieleman J De Fauw K Kavukcuoglu
      • Published: ICML 2016
      • Many classes of images exhibit rotational symmetry. Convolutional neural networks are sometimes trained using data augmentation to exploit this, but they are still required to learn the rotation equivariance properties from the data. Encoding these properties into the network architecture, as we are already used to doing for translation equivariance by using convolutional layers, could result in a more efficient use of the parameter budget by relieving the model from learning them. We introduce four operations which can be inserted into neural network models as layers, and which can be combined to make these models partially equivariant to rotations. They also enable parameter sharing across different orientations. We evaluate the effect of these architectural modifications on three datasets which exhibit rotational symmetry and demonstrate improved performance with smaller models.
    • 2016
      • Asynchronous Methods for Deep Reinforcement Learning
      • Authors: V Mnih A Puigdomènech Badia M Mirza A Graves T Lillicrap T Harley D Silver K Kavukcuoglu
      • Published: ICML 2016
      • We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training allowing all four methods to successfully train neural network controllers. The best performing method, an asynchronous variant of actor-critic, surpasses the current state-of-the-art on the Atari domain while training for half the time on a single multi-core CPU instead of a GPU. Furthermore, we show that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task involving finding rewards in random 3D mazes using a visual input.
    • 2016
      • Variational inference for Monte Carlo objectives
      • Authors: A Mnih D J Rezende
      • Published: ICML 2016
      • Recent progress in deep latent variable models has largely been driven by the development of flexible and scalable variational inference methods. Variational training of this type involves maximizing a lower bound on the log-likelihood, using samples from the variational posterior to compute the required gradients. Recently, Burda et al. (2015) have derived a tighter lower bound using a multi-sample importance sampling estimate of the likelihood and showed that optimizing it yields models that use more of their capacity and achieve higher likelihoods. This development showed the importance of such multi-sample objectives and explained the success of several related approaches. We extend the multi-sample approach to discrete latent variables and analyze the difficulty encountered when estimating the gradients involved. We then develop the first unbiased gradient estimator designed for importance-sampled objectives and evaluate it at training generative and structured output prediction models. The resulting estimator, which is based on low-variance per-sample learning signals, is both simpler and more effective than the NVIL estimator proposed for the single-sample variational objective, and is competitive with the currently used biased estimators.
    • 2016
      • Dueling Network Architectures for Deep Reinforcement Learning
      • Authors: Z Wang T Schaul M Hessel H van Hasselt M Lanctot N de Freitas
      • Published: ICML 2016
      • In recent years there have been many successes of using deep representations in reinforcement learning. Still, many of these applications use conventional architectures, such as convolutional networks, LSTMs, or auto-encoders. In this paper, we present a new neural network architecture for model-free reinforcement learning inspired by advantage learning. Our dueling architecture represents two separate estimators: one for the state value function and one for the state-dependent action advantage function. The main benefit of this factoring is to generalize learning across actions without imposing any change to the underlying reinforcement learning algorithm. Our results show that this architecture leads to better policy evaluation in the presence of many similar-valued actions. Moreover, the dueling architecture enables our RL agent to outperform the state-of-the-art Double DQN method of van Hasselt et al. (2015) in 46 out of 57 Atari games.
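      • The dueling head is a one-line aggregation; the sketch below uses the mean-subtracted advantage form from the paper, with placeholder sizes.

```python
# Q(s,a) = V(s) + A(s,a) - mean_a A(s,a); subtracting the mean advantage
# is the identifiability trick that makes the decomposition well-posed.
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    def __init__(self, n_in, n_actions):
        super().__init__()
        self.value = nn.Linear(n_in, 1)
        self.advantage = nn.Linear(n_in, n_actions)

    def forward(self, h):
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=1, keepdim=True)

q = DuelingHead(64, 6)(torch.randn(2, 64))          # -> (2, 6) Q-values
```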
    • 2016
      • One-Shot Generalization in Deep Generative Models
      • Authors: D J Rezende S Mohamed I Danihelka K Gregor D Wierstra
      • Published: ICML 2016
      • Humans have an impressive ability to reason about new concepts and experiences from just a single example. In particular, humans have an ability for one-shot generalization: an ability to encounter a new concept, understand its structure, and then be able to generate compelling alternative variations of the concept. We develop machine learning systems with this important capacity by developing new deep generative models, models that combine the representational power of deep learning with the inferential power of Bayesian reasoning. We develop a class of sequential generative models that are built on the principles of feedback and attention. These two characteristics lead to generative models that are among the state-of-the-art in density estimation and image generation. We demonstrate the one-shot generalization ability of our models using three tasks: unconditional sampling, generating new exemplars of a given concept, and generating new exemplars of a family of concepts. In all cases our models are able to generate compelling and diverse samples---having seen new examples just once---providing an important class of general-purpose models for one-shot machine learning.
    • 2016
      • Associative Long Short-Term Memory
      • Authors: I Danihelka G Wayne B Uria N Kalchbrenner A Graves
      • Published: ICML 2016
      • We investigate a new method to augment recurrent neural networks with extra memory without increasing the number of network parameters. The system has an associative memory based on complex-valued vectors and is closely related to Holographic Reduced Representations and Long Short-Term Memory networks. Holographic Reduced Representations have limited capacity: as they store more information, each retrieval becomes noisier due to interference. Our system in contrast creates redundant copies of stored information, which enables retrieval with reduced noise. Experiments demonstrate faster learning on multiple memorization tasks.
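      • A hedged sketch of the underlying associative mechanism: complex key-value binding in the style of Holographic Reduced Representations, where unit-modulus keys store superposed items and conjugate keys retrieve them with interference noise. The dimension and the single-copy setup are our simplifications; the paper's redundant copies average this noise away.

```python
import numpy as np

d, rng = 256, np.random.default_rng(0)

def unit_key():                                     # unit-modulus complex key
    return np.exp(1j * rng.uniform(0, 2 * np.pi, d))

keys = [unit_key() for _ in range(3)]
values = [rng.standard_normal(d) for _ in range(3)]

trace = sum(k * v for k, v in zip(keys, values))    # superposed storage

retrieved = (np.conj(keys[0]) * trace).real         # unbind item 0
print(np.corrcoef(retrieved, values[0])[0, 1])      # clearly positive, < 1 from interference
```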
    • 2016
      • Continuous Deep Q-Learning with Model-based Acceleration
      • Authors: S Gu T P Lillicrap I Sutskever S Levine
      • Published: ICML 2016
      • Model-free reinforcement learning has been successfully applied to a range of challenging problems, and has recently been extended to handle large neural network policies and value functions. However, the sample complexity of model-free algorithms, particularly when using high-dimensional function approximators, tends to limit their applicability to physical systems. In this paper, we explore algorithms and representations to reduce the sample complexity of deep reinforcement learning for continuous control tasks. We propose two complementary techniques for improving the efficiency of such algorithms. First, we derive a continuous variant of the Q-learning algorithm, which we call normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods. The NAF representation allows us to apply Q-learning with experience replay to continuous tasks, and substantially improves performance on a set of simulated robotic control tasks. To further improve the efficiency of our approach, we explore the use of learned models for accelerating model-free reinforcement learning. We show that iteratively refitted local linear models are especially effective for this, and demonstrate substantially faster learning on domains where such models are applicable.
    • 2016
      • Grid Long Short-Term Memory
      • Authors: N Kalchbrenner I Danihelka A Graves
      • Published: ICLR 2016
      • This paper introduces Grid Long Short-Term Memory, a network of LSTM cells arranged in a multidimensional grid that can be applied to vectors, sequences or higher dimensional data such as images. The network differs from existing deep LSTM architectures in that the cells are connected between network layers as well as along the spatiotemporal dimensions of the data. The network provides a unified way of using LSTM for both deep and sequential computation. We apply the model to algorithmic tasks such as 15-digit integer addition and sequence memorization, where it is able to significantly outperform the standard LSTM. We then give results for two empirical tasks. We find that 2D Grid LSTM achieves 1.47 bits per character on the Wikipedia character prediction benchmark, which is state-of-the-art among neural approaches. In addition, we use the Grid LSTM to define a novel two-dimensional translation model, the Reencoder, and show that it outperforms a phrase-based reference system on a Chinese-to-English translation task.
    • 2016
      • Policy Distillation
      • Authors: A Rusu S Gomez Colmenarejo C Gulcehre G Desjardins J Kirkpatrick R Pascanu V Mnih K Kavukcuoglu R Hadsell
      • Published: ICLR 2016
      • Policies for complex visual tasks have been successfully learned with deep reinforcement learning, using an approach called deep Q-networks (DQN), but relatively large (task-specific) networks and extensive training are needed to achieve good performance. In this work, we present a novel method called policy distillation that can be used to extract the policy of a reinforcement learning agent and train a new network that performs at the expert level while being dramatically smaller and more efficient. Furthermore, the same method can be used to consolidate multiple task-specific policies into a single policy. We demonstrate these claims using the Atari domain and show that the multi-task distilled agent outperforms the single-task teachers as well as a jointly-trained DQN agent.
    • 2016
      • Prioritized Experience Replay
      • Authors: T Schaul J Quan I Antonoglou D Silver
      • Published: ICLR 2016
      • Experience replay lets online reinforcement learning agents remember and reuse experiences from the past. In prior work, experience transitions were uniformly sampled from a replay memory. However, this approach simply replays transitions at the same frequency that they were originally experienced, regardless of their significance. In this paper we develop a framework for prioritizing experience, so as to replay important transitions more frequently, and therefore learn more efficiently. We use prioritized experience replay in Deep Q-Networks (DQN), a reinforcement learning algorithm that achieved human-level performance across many Atari games. DQN with prioritized experience replay achieves a new state-of-the-art, outperforming DQN with uniform replay on 41 out of 49 games.
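      • A flat-array sketch of proportional prioritization with the importance-sampling correction; the paper uses a sum-tree so sampling stays O(log N), which this illustration skips.

```python
# P(i) ∝ p_i^α; IS weights w_i = (N·P(i))^(-β), normalized by max w.
import numpy as np

rng = np.random.default_rng(0)

def sample(priorities, batch_size, alpha=0.6, beta=0.4):
    p = priorities ** alpha
    probs = p / p.sum()
    idx = rng.choice(len(priorities), size=batch_size, p=probs)
    weights = (len(priorities) * probs[idx]) ** (-beta)
    return idx, weights / weights.max()             # normalize for stability

priorities = np.abs(rng.standard_normal(1000)) + 1e-6   # e.g. |TD error| + ε
idx, w = sample(priorities, batch_size=32)
# After learning on the batch, refresh: priorities[idx] = |new TD errors| + ε
```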
    • 2016
      • Neural Programmer-Interpreters
      • Authors: S Reed N de Freitas
      • Published: ICLR 2016 - Best Paper Award!
      • We propose the neural programmer-interpreter (NPI): a recurrent and compositional neural network that learns to represent and execute programs. NPI has three learnable components: a task-agnostic recurrent core, a persistent key-value program memory, and domain-specific encoders that enable a single NPI to operate in multiple perceptually diverse environments with distinct affordances. By learning to compose lower-level programs to express higher-level programs, NPI reduces sample complexity and increases generalization ability compared to sequence-to-sequence LSTMs. The program memory allows efficient learning of additional tasks by building on existing programs. NPI can also harness the environment (e.g. a scratch pad with read-write pointers) to cache intermediate results of computation, lessening the long-term memory burden on recurrent hidden units. In this work we train the NPI with fully-supervised execution traces; each program has example sequences of calls to the immediate subprograms conditioned on the input. Rather than training on a huge number of relatively weak labels, NPI learns from a small number of rich examples. We demonstrate the capability of our model to learn several types of compositional programs: addition, sorting, and canonicalizing 3D models. Furthermore, a single NPI learns to execute these programs and all 21 associated subprograms.
    • 2016
      • Reasoning about Entailment with Neural Attention
      • Authors: T Rocktäschel E Grefenstette K M Hermann T Kočiský P Blunsom
      • Published: ICLR 2016
      • Automatically recognizing entailment relations between pairs of natural language sentences has so far been the dominion of classifiers employing hand engineered features derived from natural language processing pipelines. End-to-end differentiable neural architectures have failed to approach state-of-the-art performance until very recently. In this paper, we propose a neural model that reads two sentences to determine entailment using long short-term memory units. We extend this model with a word-by-word neural attention mechanism that encourages reasoning over entailments of pairs of words and phrases. Furthermore, we present a qualitative analysis of attention weights produced by this model, demonstrating such reasoning capabilities. On a large entailment dataset this model outperforms the previous best neural model and a classifier with engineered features by a substantial margin. It is the first generic end-to-end differentiable system that achieves state-of-the-art accuracy on a textual entailment dataset.
    • 2016
      • Continuous control with deep reinforcement learning
      • Authors: T Lillicrap JJ Hunt A Pritzel N Heess T Erez Y Tassa D Silver D Wierstra
      • Published: ICLR 2016
      • We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies “end-to-end”: directly from raw pixel inputs.
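      • One load-bearing detail of the algorithm is the soft target-network update θ′ ← τθ + (1 − τ)θ′ applied after each learning step, which keeps the targets slowly tracking the learned networks. A generic sketch (τ and the modules are placeholders):

```python
import torch

@torch.no_grad()
def soft_update(target, online, tau=0.001):
    for tp, op in zip(target.parameters(), online.parameters()):
        tp.mul_(1.0 - tau).add_(tau * op)           # θ' ← τθ + (1−τ)θ'

online = torch.nn.Linear(4, 2)
target = torch.nn.Linear(4, 2)
target.load_state_dict(online.state_dict())         # start identical
soft_update(target, online)                         # call after every update
```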
    • 2016
      • MuProp: Unbiased Backpropagation For Stochastic Neural Networks
      • Authors: S Gu S Levine I Sutskever A Mnih
      • Published: ICLR 2016
      • Deep neural networks are powerful parametric models that can be trained efficiently using the backpropagation algorithm. Stochastic neural networks combine the power of large parametric functions with that of graphical models, which makes it possible to learn very complex distributions. However, as backpropagation is not directly applicable to stochastic networks that include discrete sampling operations within their computational graph, training such networks remains difficult. We present MuProp, an unbiased gradient estimator for stochastic networks, designed to make this task easier. MuProp improves on the likelihood-ratio estimator by reducing its variance using a control variate based on the first-order Taylor expansion of a mean-field network. Crucially, unlike prior attempts at using backpropagation for training stochastic networks, the resulting estimator is unbiased and well behaved. Our experiments on structured output prediction and discrete latent variable modeling demonstrate that MuProp yields consistently good performance across a range of difficult tasks.
    • 2016
      • Neural Random-Access Machines
      • Authors: K Kurach M Andrychowicz I Sutskever
      • Published: ICLR 2016
      • In this paper, we propose and investigate a new neural network architecture called Neural Random Access Machine. It can manipulate and dereference pointers to an external variable-size random-access memory. The model is trained from pure input-output examples using backpropagation. We evaluate the new model on a number of simple algorithmic tasks whose solutions require pointer manipulation and dereferencing. Our results show that the proposed model can learn to solve algorithmic tasks of this type and is capable of operating on simple data structures like linked-lists and binary trees. For easier tasks, the learned solutions generalize to sequences of arbitrary length. Moreover, memory access during inference can be done in a constant time under some assumptions.
    • 2016
      • A note on the evaluation of generative models
      • Authors: L Theis A van den Oord M Bethge
      • Published: ICLR 2016
      • Probabilistic generative models can be used for compression, denoising, inpainting, texture synthesis, semi-supervised learning, unsupervised feature learning, and other tasks. Given this wide range of applications, it is not surprising that a lot of heterogeneity exists in the way these models are formulated, trained, and evaluated. As a consequence, direct comparison between models is often difficult. This article reviews mostly known but often underappreciated properties relating to the evaluation and interpretation of generative models with a focus on image models. In particular, we show that three of the currently most commonly used criteria---average log-likelihood, Parzen window estimates, and visual fidelity of samples---are largely independent of each other when the data is high-dimensional. Good performance with respect to one criterion therefore need not imply good performance with respect to the other criteria. Our results show that extrapolation from one criterion to another is not warranted and generative models need to be evaluated directly with respect to the application(s) they were intended for. In addition, we provide examples demonstrating that Parzen window estimates should generally be avoided.
    • 2016
      • Order Matters: Sequence to sequence for sets
      • Authors: O Vinyals S Bengio M Kudlur
      • Published: ICLR 2016
      • Sequences have become first class citizens in supervised learning thanks to the resurgence of recurrent neural networks. Many complex tasks that require mapping from or to a sequence of observations can now be formulated with the sequence-to-sequence (seq2seq) framework which employs the chain rule to efficiently represent the joint probability of sequences. In many cases, however, variable sized inputs and/or outputs might not be naturally expressed as sequences. For instance, it is not clear how to input a set of numbers into a model where the task is to sort them; similarly, we do not know how to organize outputs when they correspond to random variables and the task is to model their unknown joint probability. In this paper, we first show using various examples that the order in which we organize input and/or output data matters significantly when learning an underlying model. We then discuss an extension of the seq2seq framework that goes beyond sequences and handles input sets in a principled way. In addition, we propose a loss which, by searching over possible orders during training, deals with the lack of structure of output sets. We show empirical evidence of our claims regarding ordering, and on the modifications to the seq2seq framework on benchmark language modeling and parsing tasks, as well as two artificial tasks -- sorting numbers and estimating the joint probability of unknown graphical models.
    • 2016
      • Multi-task Sequence to Sequence Learning
      • Authors: M Luong Q Le I Sutskever O Vinyals L Kaiser
      • Published: ICLR 2016
      • Sequence to sequence learning has recently emerged as a new paradigm in supervised learning. To date, most of its applications focused on only one task and not much work explored this framework for multiple tasks. This paper examines three settings for multi-task sequence to sequence learning: (a) the one-to-many setting - where the encoder is shared between several tasks such as machine translation and syntactic parsing, (b) the many-to-one setting - useful when only the decoder can be shared, as in the case of translation and image caption generation, and (c) the many-to-many setting - where multiple encoders and decoders are shared, which is the case with unsupervised objectives and translation. Our results show that training on a small amount of parsing and image caption data can improve the translation quality between English and German by up to 1.5 BLEU points over strong single-task baselines on the WMT benchmarks. Additionally, we reveal interesting properties of the two unsupervised learning objectives, autoencoder and skip-thought, in the context of multi-task learning: autoencoder helps less in terms of perplexities but more on BLEU scores compared to skip-thought.
    • 2016
      • A Test of Relative Similarity for Model Selection in Generative Models
      • Authors: E Belilovsky W Bounliphone M Blaschko I Antonoglou A Gretton
      • Published: ICLR 2016
      • Probabilistic generative models provide a powerful framework for representing data that avoids the expense of manual annotation typically needed by discriminative approaches. Model selection in this generative setting can be challenging, however, particularly when likelihoods are not easily accessible. To address this issue, we introduce a statistical test of relative similarity, which is used to determine which of two models generates samples that are significantly closer to a real-world reference dataset of interest. We use as our test statistic the difference in maximum mean discrepancies (MMDs) between the reference dataset and each model dataset, and derive a powerful, low-variance test based on the joint asymptotic distribution of the MMDs between each reference-model pair. In experiments on deep generative models, including the variational auto-encoder and generative moment matching network, the tests provide a meaningful ranking of model performance as a function of parameter and training settings.
    • 2016
      • ACDC: A Structured Efficient Linear Layer
      • Authors: M Moczulski M Denil J Appleyard N de Freitas
      • Published: ICLR 2016
      • The linear layer is one of the most pervasive modules in deep learning representations. However, it requires O(N²) parameters and O(N²) operations. These costs can be prohibitive in mobile applications or prevent scaling in many domains. Here, we introduce a deep, differentiable, fully-connected neural network module composed of diagonal matrices of parameters, A and D, and the discrete cosine transform C. The core module, structured as ACDC⁻¹, has O(N) parameters and incurs O(N log N) operations. We present theoretical results showing how deep cascades of ACDC layers approximate linear layers. ACDC is, however, a stand-alone module and can be used in combination with any other types of module. In our experiments, we show that it can indeed be successfully interleaved with ReLU modules in convolutional neural networks for image recognition. Our experiments also study critical factors in the training of these structured modules, including initialization and depth. Finally, this paper also points out avenues for implementing the complex version of ACDC using photonic devices.
    • 2016
      • Learning Efficient Algorithms with Hierarchical Attentive Memory
      • Authors: M Andrychowicz K Kurach
      • Published: ArXiv 2016
      • In this paper, we propose and investigate a novel memory architecture for neural networks called Hierarchical Attentive Memory (HAM). It is based on a binary tree with leaves corresponding to memory cells. This allows HAM to perform memory access in O(log n) complexity, which is a significant improvement over the standard attention mechanism that requires O(n) operations, where n is the size of the memory. We show that an LSTM network augmented with HAM can learn algorithms for problems like merging, sorting or binary searching from pure input-output examples. In particular, it learns to sort n numbers in time O(n log n) and generalizes well to input sequences much longer than the ones seen during the training. We also show that HAM can be trained to act like classic data structures: a stack, a FIFO queue and a priority queue.
    • 2016
      • Increasing the Action Gap: New Operators for Reinforcement Learning
      • Authors: M G Bellemare G Ostrovski A Guez P S Thomas R Munos
      • Published: AAAI 2016
      • This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this local consistency leads to an increase in the action gap at each state; increasing this gap, we argue, mitigates the undesirable effects of approximation and estimation errors on the induced greedy policies. This operator can also be applied to discretized continuous space and time problems, and we provide empirical results evidencing superior performance in this context. Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator. As corollaries we provide a proof of optimality for Baird's advantage learning algorithm and derive other gap-increasing operators with interesting properties. We conclude with an empirical study on 60 Atari 2600 games illustrating the strong potential of these new operators.
    • 2016
      • Deep Reinforcement Learning with Double Q-learning
      • Authors: H van Hasselt A Guez D Silver
      • Published: AAAI 2016
      • The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be prevented. In this paper, we answer all these questions affirmatively. In particular, we first show that the recent DQN algorithm, which combines Q-learning with a deep neural network, suffers from substantial overestimations in some games in the Atari 2600 domain. We then show that the idea behind the Double Q-learning algorithm, which was introduced in a tabular setting, can be generalized to work with large-scale function approximation. We propose a specific adaptation to the DQN algorithm and show that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
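      • The proposed Double DQN target fits in a few lines: the online network selects the next action, and the target network evaluates it. A sketch over a batch of transitions, with the two networks' next-state Q-values assumed to be precomputed arrays.
```python
import numpy as np

def double_dqn_targets(rewards, q_online_next, q_target_next,
                       terminal, gamma=0.99):
    """Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).
    Action selection (online net) is decoupled from action evaluation
    (target net) to curb the overestimation bias of the max operator."""
    best_actions = np.argmax(q_online_next, axis=1)
    evaluated = q_target_next[np.arange(len(rewards)), best_actions]
    return rewards + gamma * np.where(terminal, 0.0, evaluated)

rng = np.random.default_rng(0)
q_online_next = rng.normal(size=(4, 3))   # assumed precomputed network outputs
q_target_next = rng.normal(size=(4, 3))
y = double_dqn_targets(rewards=np.ones(4), q_online_next=q_online_next,
                       q_target_next=q_target_next,
                       terminal=np.array([False, False, False, True]))
print(y)
```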
    • 2015
      • Human Level Control Through Deep Reinforcement Learning
      • Authors: V Mnih K Kavukcuoglu D Silver A Rusu J Veness M Bellemare A Graves M Riedmiller A Fidjeland G Ostrovski S Petersen C Beattie A Sadik I Antonoglou H King D Kumaran D Wierstra S Legg D Hassabis
      • Published: Nature 2015
      • The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
    • 2015
      • Spatial Transformer Networks
      • Authors: M Jaderberg K Simonyan A Zisserman K Kavukcuoglu
      • Published: NIPS 2015
      • Convolutional Neural Networks define an exceptionally powerful class of models, but are still limited by the lack of ability to be spatially invariant to the input data in a computationally and parameter efficient manner. In this work we introduce a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network. This differentiable module can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimisation process. We show that the use of spatial transformers results in models which learn invariance to translation, scale, rotation and more generic warping, resulting in state-of-the-art performance on several benchmarks, and for a number of classes of transformations.
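      • A minimal PyTorch sketch of the transformer's grid generator and sampler; torch.nn.functional provides both pieces. The fixed translation matrix below is a stand-in for the 2x3 affine parameters that a localisation network would normally predict from the input.
```python
import torch
import torch.nn.functional as F

def spatial_transform(feature_map, theta):
    """Differentiably warp a feature map with a per-example 2x3 affine
    matrix theta: generate a sampling grid, then bilinearly sample."""
    grid = F.affine_grid(theta, feature_map.size(), align_corners=False)
    return F.grid_sample(feature_map, grid, align_corners=False)

x = torch.randn(1, 3, 32, 32)
# Identity transform plus a small horizontal translation; in a spatial
# transformer, theta would come from a localisation network applied to x.
theta = torch.tensor([[[1.0, 0.0, 0.2],
                       [0.0, 1.0, 0.0]]])
y = spatial_transform(x, theta)
```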
    • 2015
      • Teaching Machines to Read and Comprehend
      • Authors: KM Hermann T Kočiský E Grefenstette L Espeholt W Kay M Suleyman P Blunsom
      • Published: NIPS 2015
      • Teaching machines to read natural language documents remains an elusive challenge. Machine reading systems can be tested on their ability to answer questions posed on the contents of documents that they have seen, but until now large scale training and test datasets have been missing for this type of evaluation. In this work we define a new methodology that resolves this bottleneck and provides large scale supervised reading comprehension data. This allows us to develop a class of attention based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure.
    • 2015
      • DRAW: A Recurrent Neural Network For Image Generation
      • Authors: K Gregor I Danihelka A Graves D Wierstra
      • Published: ICML 2015
      • This paper introduces the Deep Recurrent Attentive Writer (DRAW) neural network architecture for image generation. DRAW networks combine a novel spatial attention mechanism that mimics the foveation of the human eye, with a sequential variational auto-encoding framework that allows for the iterative construction of complex images. The system substantially improves on the state of the art for generative models on MNIST, and, when trained on the Street View House Numbers dataset, it generates images that cannot be distinguished from real data with the naked eye.
    • 2015
      • Universal Value Function Approximators
      • Authors: T Schaul D Horgan K Gregor D Silver
      • Published: ICML 2015
      • Value functions are a core component of reinforcement learning systems. The main idea is to construct a single function approximator V(s; θ) that estimates the long-term reward from any state s, using parameters θ. In this paper we introduce universal value function approximators (UVFAs) V(s, g; θ) that generalise not just over states s but also over goals g. We develop an efficient technique for supervised learning of UVFAs, by factoring observed values into separate embedding vectors for state and goal, and then learning a mapping from s and g to these factored embedding vectors. We show how this technique may be incorporated into a reinforcement learning algorithm that updates the UVFA solely from observed rewards. Finally, we demonstrate that a UVFA can successfully generalise to previously unseen goals.
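      • Stage one of the two-stage learning scheme can be sketched directly: factor a table of observed values into state and goal embeddings so that V(s, g; θ) ≈ φ(s)·ψ(g). The synthetic low-rank value table below is an illustrative assumption; the paper then learns networks mapping s and g to their embeddings.
```python
import numpy as np

# Hypothetical table of observed returns for (state, goal) pairs.
rng = np.random.default_rng(0)
true_phi = rng.normal(size=(50, 4))   # 50 states with rank-4 structure
true_psi = rng.normal(size=(30, 4))   # 30 goals
values = true_phi @ true_psi.T

# Factor the value table into state embeddings phi and goal embeddings psi
# (here via truncated SVD); learning s -> phi(s) and g -> psi(g) follows.
U, S, Vt = np.linalg.svd(values, full_matrices=False)
rank = 4
phi = U[:, :rank] * S[:rank]          # state embeddings
psi = Vt[:rank].T                     # goal embeddings

# V(s, g; theta) ~ phi(s) . psi(g)
s, g = 7, 3
print(values[s, g], phi[s] @ psi[g])  # the two should closely agree
```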
    • 2015
      • Hippocampal place cells construct reward related sequences through unexplored space
      • Authors: H Ólafsdóttir C Barry A Saleem D Hassabis H Spiers
      • Published: eLife 2015
      • Dominant theories of hippocampal function propose that place cell representations are formed during an animal's first encounter with a novel environment and are subsequently replayed during off-line states to support consolidation and future behaviour. Here we report that viewing the delivery of food to an unvisited portion of an environment leads to off-line pre-activation of place cell sequences corresponding to that space. Such ‘preplay’ was not observed for an unrewarded but otherwise similar portion of the environment. These results suggest that a hippocampal representation of a visible, yet unexplored environment can be formed if the environment is of motivational relevance to the animal. We hypothesise such goal-biased preplay may support preparation for future experiences in novel environments.
    • 2015
      • Natural Neural Networks
      • Authors: G Desjardins K Simonyan R Pascanu K Kavukcuoglu
      • Published: NIPS 2015
      • We present a novel reparametrization for deep neural networks which is shown to improve conditioning of the Fisher Information matrix. The reparametrization is simple and consists in an explicit whitening operation for each hidden layer of the network. Importantly, the whitening operation is a reparametrization which preserves the function implemented by the network, while improving the conditioning of the gradient. The resulting Projected Natural Gradient Descent algorithm generalizes the recently introduced batch normalisation scheme and is closely related to the online learning algorithm of Mirror Descent. We highlight the efficiency and scalability of our method on both unsupervised and supervised learning settings, including on the challenging Imagenet dataset.
    • 2015
      • Variational Inference with Normalizing Flows
      • Authors: D Rezende S Mohamed
      • Published: ICML 2015
      • The choice of the approximate posterior distribution is one of the core problems in variational inference. Most applications of variational inference employ simple families of posterior approximations in order to allow for efficient inference, focusing on mean-field or other simple structured approximations. This restriction has a significant impact on the quality of inferences made using variational methods. We introduce a new approach for specifying flexible, arbitrarily complex and scalable approximate posterior distributions. Our approximations are distributions constructed through a normalizing flow, whereby a simple initial density is transformed into a more complex one by applying a sequence of invertible transformations until a desired level of complexity is attained. We use this view of normalizing flows to develop categories of finite and infinitesimal flows and provide a unified view of approaches for constructing rich posterior approximations. We demonstrate that the theoretical advantages of having posteriors that better match the true posterior, combined with the scalability of amortized variational approaches, provide a clear improvement in performance and applicability of variational inference.
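      • A sketch of a single planar flow step, one of the finite flows the paper develops: f(z) = z + u·tanh(wᵀz + b), whose Jacobian log-determinant is available in closed form, so the transformed density stays tractable. The parameters below are arbitrary stand-ins rather than learned values.
```python
import numpy as np

def planar_flow(z, u, w, b):
    """f(z) = z + u * tanh(w.z + b); log|det J| = log|1 + u.psi(z)| with
    psi(z) = (1 - tanh^2(w.z + b)) w, the change-of-variables correction."""
    wz_b = z @ w + b
    f = z + np.outer(np.tanh(wz_b), u)
    psi = np.outer(1.0 - np.tanh(wz_b) ** 2, w)
    log_det = np.log(np.abs(1.0 + psi @ u))
    return f, log_det

rng = np.random.default_rng(0)
z = rng.normal(size=(5, 2))                       # base samples, q0 = N(0, I)
f, log_det = planar_flow(z, u=np.array([0.5, -0.3]),
                         w=np.array([1.0, 1.0]), b=0.0)
log_q0 = -0.5 * (z ** 2).sum(1) - np.log(2 * np.pi)
log_q = log_q0 - log_det                          # density of the flowed samples
```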
    • 2015
      • Learning Continuous Control Policies by Stochastic Value Gradients
      • Authors: N Heess G Wayne D Silver T Lillicrap Y Tassa T Erez
      • Published: NIPS 2015
      • We present a unified framework for learning continuous control policies using backpropagation. It supports stochastic control by treating stochasticity in the Bellman equation as a deterministic function of exogenous noise. The product is a spectrum of general policy gradient algorithms that range from model-free methods with value functions to model-based methods without value functions. We use learned models but only require observations from the environment instead of observations from model-predicted trajectories, minimizing the impact of compounded model errors. We apply these algorithms first to a toy stochastic control problem and then to several physics-based control problems in simulation. One of these variants, SVG(1), shows the effectiveness of learning models, value functions, and policies simultaneously in continuous domains.
    • 2015
      • Move Evaluation in Go Using Deep Convolutional Neural Networks
      • Authors: C Maddison A Huang I Sutskever D Silver
      • Published: ICLR 2015
      • The game of Go is more challenging than other board games, due to the difficulty of constructing a position or move evaluation function. In this paper we investigate whether deep convolutional networks can be used to directly represent and learn this knowledge. We train a large 12-layer convolutional neural network by supervised learning from a database of human professional games. The network correctly predicts the expert move in 55% of positions, equalling the accuracy of a 6 dan human player. When the trained convolutional network was used directly to play games of Go, without any search, it beat the traditional-search program GnuGo in 97% of games, and matched the performance of a state-of-the-art Monte-Carlo tree search that simulates two million positions per move.
    • 2015
      • Gradient Estimation Using Stochastic Computation Graphs
      • Authors: J Schulman N Heess T Weber P Abbeel
      • Published: NIPS 2015
      • In a variety of problems originating in supervised, unsupervised, and reinforcement learning, the loss function is defined by an expectation over a collection of random variables, which might be part of a probabilistic model or the external world. Estimating the gradient of this loss function, using samples, lies at the core of gradient-based learning algorithms for these problems. We introduce the formalism of stochastic computation graphs (directed acyclic graphs that include both deterministic functions and conditional probability distributions) and describe how to easily and automatically derive an unbiased estimator of the loss function’s gradient. The resulting algorithm for computing the gradient estimator is a simple modification of the standard backpropagation algorithm. The generic scheme we propose unifies estimators derived in a variety of prior work, along with variance-reduction techniques therein, and should enable researchers to develop increasingly intricate models involving a combination of stochastic and deterministic operations, enabling, for example, attention, memory, and control actions.
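      • The two gradient estimators the formalism unifies can be compared on a one-node graph. For x ~ N(θ, σ²) and loss f(x), the score-function estimator applies at any stochastic node, while the pathwise estimator applies when the node can be rewritten as a deterministic function of exogenous noise. The toy numbers below are illustrative only.
```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: (x - 2.0) ** 2    # loss under the expectation
theta, sigma, n = 0.5, 1.0, 100_000

# Score-function (REINFORCE) estimator at the stochastic node:
# grad = E[f(x) * d/dtheta log p(x; theta)], x ~ N(theta, sigma^2).
x = rng.normal(theta, sigma, n)
score = (x - theta) / sigma ** 2
grad_sf = np.mean(f(x) * score)

# Pathwise estimator, rewriting the node as a deterministic function of
# exogenous noise: x = theta + sigma * eps with eps ~ N(0, 1).
eps = rng.normal(0.0, 1.0, n)
x = theta + sigma * eps
grad_pw = np.mean(2.0 * (x - 2.0))   # d f / d theta via the chain rule

print(grad_sf, grad_pw)   # both approximate d/dtheta E[f] = 2*(theta - 2)
```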
    • 2015
      • Approximate Hubel-Wiesel Modules and the Data Structures of Neural Computation
      • Authors: J Z Leibo J Cornebise S Gómez D Hassabis
      • Published: arXiv 2015
      • This paper describes a framework for modeling the interface between perception and memory on the algorithmic level of analysis. It is consistent with phenomena associated with many different brain regions. These include view-dependence (and invariance) effects in visual psychophysics [1, 2] and inferotemporal cortex physiology [3, 4], as well as episodic memory recall interference effects associated with the medial temporal lobe [5, 6]. The perspective developed here relies on a novel interpretation of Hubel and Wiesel’s conjecture for how receptive fields tuned to complex objects, and invariant to details, could be achieved [7]. It complements existing accounts of two-speed learning systems in neocortex and hippocampus (e.g., [8, 9]) while significantly expanding their scope to encompass a unified view of the entire pathway from V1 to hippocampus.
    • 2015
      • Learning to Transduce with Unbounded Memory
      • Authors: E Grefenstette KM Hermann M Suleyman P Blunsom
      • Published: NIPS 2015
      • Recently, strong results have been demonstrated by Deep Recurrent Neural Networks on natural language transduction problems. In this paper we explore the representational power of such models using synthetic grammars designed to exhibit phenomena similar to those found in real transduction problems, such as machine translation. These experiments lead us to propose new memory-based recurrent networks that implement continuously differentiable analogues of the traditional data structures Stacks, Queues, and DeQues. We show that these architectures exhibit superior generalisation performance to Deep RNNs on our transduction experiments and often learn the underlying generating algorithm.
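      • A numpy sketch of the continuous stack, the simplest of the three structures: each vector is stored with a strength, a pop removes up to u units of strength from the top down, a push appends strength d, and a read returns the topmost unit of strength as a weighted sum. This loop-based version mirrors the paper's max/min recurrences but omits the controller that emits v, d and u.
```python
import numpy as np

class NeuralStack:
    """Continuous stack: values V with strengths s; push strength d and
    pop strength u in [0, 1] make push/pop soft rather than discrete."""
    def __init__(self, dim):
        self.V = np.zeros((0, dim))   # stored vectors, index 0 = bottom
        self.s = np.zeros(0)          # their strengths

    def step(self, v, d, u):
        # Pop: remove up to u of strength from the top downwards.
        s, remaining = self.s.copy(), u
        for i in range(len(s) - 1, -1, -1):
            removed = min(s[i], remaining)
            s[i] -= removed
            remaining -= removed
        # Push v with strength d.
        self.V = np.vstack([self.V, v])
        self.s = np.append(s, d)
        # Read: the topmost unit of strength, as a weighted sum.
        read, budget = np.zeros(self.V.shape[1]), 1.0
        for i in range(len(self.s) - 1, -1, -1):
            w = min(self.s[i], budget)
            read += w * self.V[i]
            budget -= w
        return read

stack = NeuralStack(2)
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(stack.step(a, d=1.0, u=0.0))       # push a -> reads a
print(stack.step(b, d=1.0, u=0.0))       # push b -> reads b
print(stack.step(0 * b, d=0.0, u=1.0))   # pop    -> reads a again
```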
    • 2015
      • Multiple Object Recognition with Visual Attention
      • Authors: J Lei Ba V Mnih K Kavukcuoglu
      • Published: ICLR 2015
      • We present an attention-based model for recognizing multiple objects in images. The proposed model is a deep recurrent neural network trained with reinforcement learning to attend to the most relevant regions of the input image. We show that the model learns to both localize and recognize multiple objects despite being given only class labels during training. We evaluate the model on the challenging task of transcribing house number sequences from Google Street View images and show that it is both more accurate than the state-of-the-art convolutional networks and uses fewer parameters and less computation.
    • 2015
      • Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning
      • Authors: S Mohamed D Rezende
      • Published: NIPS 2015
      • The mutual information is a core statistical quantity that has applications in all areas of machine learning, whether this is in training of density models over multiple data modalities, in maximising the efficiency of noisy transmission channels, or when learning behaviour policies for exploration by artificial agents. Most learning algorithms that involve optimisation of the mutual information rely on the Blahut-Arimoto algorithm, an enumerative algorithm with exponential complexity that is not suitable for modern machine learning applications. This paper provides a new approach for scalable optimisation of the mutual information by merging techniques from variational inference and deep learning. We develop our approach by focusing on the problem of intrinsically-motivated learning, where the mutual information forms the definition of a well-known internal drive known as empowerment. Using a variational lower bound on the mutual information, combined with convolutional networks for handling visual input streams, we develop a stochastic optimisation algorithm that allows for scalable information maximisation and empowerment-based reasoning directly from pixels to actions.
    • 2015
      • Online Learning of k-CNF Boolean Functions
      • Authors: J Veness M Hutter L Orseau M Bellemare
      • Published: IJCAI 2015
      • This paper revisits the problem of learning a k-CNF Boolean function from examples, for fixed k, in the context of online learning under the logarithmic loss. We give a Bayesian interpretation to one of Valiant’s classic PAC learning algorithms, which we then build upon to derive three efficient, online, probabilistic, supervised learning algorithms for predicting the output of an unknown k-CNF Boolean function. We analyze the loss of our methods, and show that the cumulative log-loss can be upper bounded by a polynomial function of the size of each example.
    • 2015
      • Weight Uncertainty in Neural Networks
      • Authors: C Blundell J Cornebise K Kavukcuoglu D Wierstra
      • Published: ICML 2015
      • We introduce a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop. It regularises the weights by minimising a compression cost, known as the variational free energy or the expected lower bound on the marginal likelihood. We show that this principled kind of regularisation yields comparable performance to dropout on MNIST classification. We then demonstrate how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems, and how this weight uncertainty can be used to drive the exploration-exploitation trade-off in reinforcement learning.
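      • The heart of Bayes by Backprop is a reparameterised weight sample plus the variational free energy. The sketch below assumes a single Gaussian prior, so the complexity cost has a closed form; the paper also uses a scale-mixture prior, for which the KL term would be estimated by sampling.
```python
import torch

def sample_weights(mu, rho):
    """Draw w = mu + sigma * eps with sigma = log(1 + exp(rho)), keeping the
    sample differentiable w.r.t. the variational parameters (mu, rho)."""
    sigma = torch.log1p(torch.exp(rho))
    eps = torch.randn_like(mu)
    return mu + sigma * eps, sigma

def free_energy(neg_log_likelihood, mu, sigma, prior_sigma=1.0):
    """Variational free energy = KL[q(w | mu, sigma) || N(0, prior_sigma^2)]
    (closed form for Gaussians) + the expected negative log-likelihood."""
    kl = (torch.log(prior_sigma / sigma)
          + (sigma ** 2 + mu ** 2) / (2 * prior_sigma ** 2) - 0.5).sum()
    return kl + neg_log_likelihood

mu = torch.zeros(10, requires_grad=True)
rho = torch.full((10,), -3.0, requires_grad=True)
w, sigma = sample_weights(mu, rho)
loss = free_energy(((w - 1.0) ** 2).sum(), mu, sigma)  # stand-in data term
loss.backward()                                        # gradients reach mu, rho
```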
    • 2015
      • Very Deep Convolutional Networks for Large-Scale Image Recognition
      • Authors: K Simonyan A Zisserman
      • Published: ICLR 2015
      • In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3 × 3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16–19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
    • 2015
      • Fictitious Self-Play in Extensive-Form Games
      • Authors: J Heinrich M Lanctot D Silver
      • Published: ICML 2015
      • Fictitious play is a popular game-theoretic model of learning in games. However, it has received little attention in practical applications to large problems. This paper introduces two variants of fictitious play that are implemented in behavioural strategies of an extensive-form game. The first variant is a full-width process that is realization equivalent to its normal-form counterpart and therefore inherits its convergence guarantees. However, its computational requirements are linear in time and space rather than exponential. The second variant, Fictitious Self-Play, is a machine learning framework that implements fictitious play in a sample-based fashion. Experiments in imperfect-information poker games compare our approaches and demonstrate their convergence to approximate Nash equilibria.
    • 2015
      • Toward Minimax Off-policy Value Estimation
      • Authors: L Li R Munos C Szepesvári
      • Published: AISTATS 2015
      • This paper studies the off-policy evaluation problem, where one aims to estimate the value of a target policy based on a sample of observations collected by another policy. We first consider the single-state, or multi-armed bandit case, establish a finite-time minimax risk lower bound, and analyze the risk of three standard estimators. For the so-called regression estimator, we show that while it is asymptotically optimal, for small sample sizes it may perform suboptimally compared to an ideal oracle up to a multiplicative factor that depends on the number of actions. We also show that the other two popular estimators can be arbitrarily worse than the optimal, even in the limit of infinitely many data points. The performance of the estimators is studied in synthetic and real problems, illustrating the methods' strengths and weaknesses. We also discuss the implications of these results for off-policy evaluation problems in contextual bandits and fixed-horizon Markov decision processes.
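      • Two of the analysed estimators are easy to sketch in the bandit case: importance sampling, which reweights logged rewards by target-to-behaviour probability ratios, and the regression ("direct") estimator, which fits per-action reward means and averages them under the target policy. The toy instance below is an assumption for illustration.
```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, n = 5, 10_000
behaviour = np.full(n_actions, 1.0 / n_actions)      # logging policy
target = np.array([0.6, 0.1, 0.1, 0.1, 0.1])         # policy to evaluate
true_means = np.array([1.0, 0.5, 0.2, 0.0, -0.5])

actions = rng.choice(n_actions, size=n, p=behaviour)
rewards = true_means[actions] + rng.normal(0, 1, n)

# Importance-sampling estimator: reweight the logged rewards.
is_est = np.mean(rewards * target[actions] / behaviour[actions])

# Regression estimator: fit per-action means, average under the target.
mu_hat = np.array([rewards[actions == a].mean() for a in range(n_actions)])
reg_est = target @ mu_hat

print(true_means @ target, is_est, reg_est)   # true value vs. two estimates
```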
    • 2015
      • Massively Parallel Methods for Deep Reinforcement Learning
      • Authors: A Nair P Srinivasan S Blackwell C Alcicek R Fearon A De Maria V Panneershelvam M Suleyman C Beattie S Petersen S Legg V Mnih K Kavukcuoglu D Silver
      • Published: ICML 2015 Deep Learning Workshop
      • We present the first massively distributed architecture for deep reinforcement learning. This architecture uses four main components: parallel actors that generate new behaviour; parallel learners that are trained from stored experience; a distributed neural network to represent the value function or behaviour policy; and a distributed store of experience. We used our architecture to implement the Deep Q-Network algorithm (DQN) (Mnih et al., 2013). Our distributed algorithm was applied to 49 Atari 2600 games from the Arcade Learning Environment, using identical hyperparameters. Our performance surpassed non-distributed DQN in 41 of the 49 games and also reduced the wall-time required to achieve these results by an order of magnitude on most games.
    • 2015
      • Deep Structured Output Learning for Unconstrained Text Recognition
      • Authors: M Jaderberg K Simonyan A Vedaldi A Zisserman
      • Published: ICLR 2015
      • We develop a representation suitable for the unconstrained recognition of words in natural images, where unconstrained means that there is no fixed lexicon and words have unknown length. To this end we propose a convolutional neural network (CNN) based architecture which incorporates a Conditional Random Field (CRF) graphical model, taking the whole word image as a single input. The unaries of the CRF are provided by a CNN that predicts characters at each position of the output, while higher order terms are provided by another CNN that detects the presence of N-grams. We show that this entire model (CRF, character predictor, N-gram predictor) can be jointly optimised by back-propagating the structured output loss, essentially requiring the system to perform multi-task learning, and training requires only synthetically generated data. The resulting model is a more accurate system on standard real-world text recognition benchmarks than character prediction alone, setting a benchmark for systems that have not been trained on a particular lexicon. In addition, our model achieves state-of-the-art accuracy in lexicon-constrained scenarios, without being specifically modelled for constrained recognition. To test the generalisation of our model, we also perform experiments with random alpha-numeric strings to evaluate the method when no visual language model is applicable.
    • 2015
      • Compress and Control
      • Authors: J Veness M Bellemare M Hutter A Chua G Desjardins
      • Published: AAAI 2015
      • This paper describes a new information-theoretic policy evaluation technique for reinforcement learning. This technique converts any compression or density model into a corresponding estimate of value. Under appropriate stationarity and ergodicity conditions, we show that the use of a sufficiently powerful model gives rise to a consistent value function estimator. We also study the behavior of this technique when applied to various Atari 2600 video games, where the use of suboptimal modeling techniques is unavoidable. We consider three fundamentally different models, all too limited to perfectly model the dynamics of the system. Remarkably, we find that our technique provides sufficiently accurate value estimates for effective on-policy control. We conclude with a suggestive study highlighting the potential of our technique to scale to large problems.
    • 2015
      • Smooth UCT Search in Computer Poker
      • Authors: J Heinrich D Silver
      • Published: IJCAI 2015
      • Self-play Monte Carlo Tree Search (MCTS) has been successful in many perfect-information two player games. Although these methods have been extended to imperfect-information games, so far they have not achieved the same level of practical success or theoretical convergence guarantees as competing methods. In this paper we introduce Smooth UCT, a variant of the established Upper Confidence Bounds Applied to Trees (UCT) algorithm. Smooth UCT agents mix in their average policy during self-play and the resulting planning process resembles game-theoretic fictitious play. When applied to Kuhn and Leduc poker, Smooth UCT approached a Nash equilibrium, whereas UCT diverged. In addition, Smooth UCT outperformed UCT in Limit Texas Hold’em and won 3 silver medals in the 2014 Annual Computer Poker Competition.
    • 2015
      • Fast gradient descent for drifting least squares regression, with application to bandits
      • Authors: N Korda P L A R Munos
      • Published: AAAI 2015
      • Online learning algorithms often require recomputing least squares regression estimates of parameters. We study improving the computational complexity of such algorithms by using stochastic gradient descent (SGD) type schemes in place of classic regression solvers. We show that SGD schemes efficiently track the true solutions of the regression problems, even in the presence of a drift. This finding, coupled with an O(d) improvement in complexity, where d is the dimension of the data, makes these schemes attractive for implementation in big data settings. In the case when strong convexity in the regression problem is guaranteed, we provide bounds on the error both in expectation and high probability (the latter is often needed to provide theoretical guarantees for higher level algorithms), despite the drifting least squares solution. As an example of this case we prove that the regret performance of an SGD version of the PEGE linear bandit algorithm is worse than that of PEGE itself only by a factor of O(log⁴ n). When strong convexity of the regression problem cannot be guaranteed, we investigate using an adaptive regularisation. We make an empirical study of an adaptively regularised, SGD version of LinUCB in a news article recommendation application, which uses the large scale news recommendation dataset from Yahoo! front page. These experiments show a large gain in computational complexity and a consistently low tracking error.
    • 2015
      • Count-Based Frequency Estimation using Bounded Memory
      • Authors: M Bellemare
      • Published: IJCAI 2015
      • Count-based estimators are a fundamental building block of a number of powerful sequential prediction algorithms, including Context Tree Weighting and Prediction by Partial Matching. Keeping exact counts, however, typically results in a high memory overhead. In particular, when dealing with large alphabets the memory requirements of count-based estimators often become prohibitive. In this paper we propose three novel ideas for approximating count-based estimators using bounded memory. Our first contribution, of independent interest, is an extension of reservoir sampling for sampling distinct symbols from a stream of unknown length, which we call K-distinct reservoir sampling. We combine this sampling scheme with a state-of-the-art count-based estimator for memoryless sources, the Sparse Adaptive Dirichlet (SAD) estimator. The resulting algorithm, the Budget SAD, naturally guarantees a limit on its memory usage. We finally demonstrate the broader use of K-distinct reservoir sampling in nonparametric estimation by using it to restrict the branching factor of the Context Tree Weighting algorithm. We demonstrate the usefulness of our algorithms with empirical results on two sequential, large-alphabet prediction problems.
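      • For reference, here is the standard reservoir sampling scheme that the paper's K-distinct variant builds on: a uniform sample of k items from a stream of unknown length in O(k) memory. The variant for sampling distinct symbols differs and is not reproduced here.
```python
import random

def reservoir_sample(stream, k, rng=random.Random(0)):
    """Classic Algorithm R: keep the first k items, then replace a random
    reservoir slot with item i with probability k / (i + 1), yielding a
    uniform k-sample from a stream whose length is unknown in advance."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)      # uniform over [0, i], inclusive
            if j < k:
                reservoir[j] = item
    return reservoir

print(reservoir_sample(range(1_000_000), k=5))
```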
    • 2015
      • Dependency Recurrent Neural Language Models for Sentence Completion
      • Authors: P Mirowski A Vlachos
      • Published: ACL 2015
      • Recent work on language modelling has shifted focus from count-based models to neural models. In these works, the words in each sentence are always considered in a left-to-right order. In this paper we show how we can improve the performance of the recurrent neural network (RNN) language model by incorporating the syntactic dependencies of a sentence, which have the effect of bringing relevant contexts closer to the word being predicted. We evaluate our approach on the Microsoft Research Sentence Completion Challenge and show that the dependency RNN proposed improves over the RNN by about 10 points in accuracy. Furthermore, we achieve results comparable with the state-of-the-art models on this task.
    • 2014
      • Neural Turing Machines
      • Authors: A Graves G Wayne I Danihelka
      • Published: arXiv 2014
      • We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples.
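      • The attentional read at the core of the architecture can be sketched as content-based addressing: cosine similarity between a controller-emitted key and every memory row, sharpened by a strength β and normalised into a soft read weighting. The NTM additionally interpolates this with location-based shifts, omitted here.
```python
import numpy as np

def content_addressing(memory, key, beta):
    """NTM-style content-based weights: cosine similarity between the key
    and each memory row, scaled by beta, then softmax-normalised."""
    sim = memory @ key / (np.linalg.norm(memory, axis=1)
                          * np.linalg.norm(key) + 1e-8)
    logits = beta * sim
    w = np.exp(logits - logits.max())
    return w / w.sum()                 # attention over memory rows

rng = np.random.default_rng(0)
memory = rng.normal(size=(16, 8))      # 16 memory rows of width 8
key = memory[3] + 0.1 * rng.normal(size=8)
w = content_addressing(memory, key, beta=5.0)
read_vector = w @ memory               # soft read, concentrated near row 3
```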
    • 2014
      • Recurrent Models of Visual Attention
      • Authors: V Mnih N Heess A Graves K Kavukcuoglu
      • Published: NIPS 2014
      • Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.
    • 2014
      • Semi-Supervised Learning with Deep Generative Models
      • Authors: D Kingma D Rezende S Mohamed M Welling
      • Published: NIPS 2014
      • The ever-increasing size of modern data sets combined with the difficulty of obtaining label information has made semi-supervised learning one of the problems of significant practical importance in modern data analysis. We revisit the approach to semi-supervised learning with generative models and develop new models that allow for effective generalisation from small labelled data sets to large unlabelled ones. Generative approaches have thus far been either inflexible, inefficient or non-scalable. We show that deep generative models and approximate Bayesian inference exploiting recent advances in variational methods can be used to provide significant improvements, making generative approaches highly competitive for semi-supervised learning.
    • 2014
      • Bayes-Adaptive Simulation-based Search with Value Function Approximation
      • Authors: A Guez N Heess D Silver P Dayan
      • Published: NIPS 2014
      • Bayes-adaptive planning offers a principled solution to the exploration-exploitation trade-off under model uncertainty. It finds the optimal policy in belief space, which explicitly accounts for the expected effect on future rewards of reductions in uncertainty. However, the Bayes-adaptive solution is typically intractable in domains with large or continuous state spaces. We present a tractable method for approximating the Bayes-adaptive solution by combining simulation-based search with a novel value function approximation technique that generalises over belief space. Our method outperforms prior approaches in both discrete bandit tasks and simple continuous navigation and control tasks.
    • 2014
      • Deep AutoRegressive Networks
      • Authors: K Gregor I Danihelka A Mnih C Blundell D Wierstra
      • Published: ICML 2014
      • We introduce a deep, generative autoencoder capable of learning hierarchies of distributed representations from data. Successive deep stochastic hidden layers are equipped with autoregressive connections, which enable the model to be sampled from quickly and exactly via ancestral sampling. We derive an efficient approximate parameter estimation method based on the minimum description length (MDL) principle, which can be seen as maximising a variational lower bound on the log-likelihood, with a feedforward neural network implementing approximate inference. We demonstrate state-of-the-art generative performance on a number of classic data sets, including several UCI data sets, MNIST and Atari 2600 games.
    • 2014
      • Stochastic Backpropagation and Approximate Inference in Deep Generative Models
      • Authors: D Rezende S Mohamed D Wierstra
      • Published: ICML 2014
      • We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound. We develop stochastic backpropagation rules for gradient backpropagation through stochastic variables and derive an algorithm that allows for joint optimisation of the parameters of both the generative and recognition models. We demonstrate on several real-world data sets that by using stochastic backpropagation and variational inference, we obtain models that are able to generate realistic samples of data, allow for accurate imputations of missing data, and provide a useful tool for high-dimensional data visualisation.
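      • The stochastic backpropagation rule for a Gaussian latent layer reduces to the reparameterisation z = μ + σ·ε, which lets automatic differentiation carry gradients through the sampling step into the recognition model's outputs. The loss below is a stand-in for the paper's variational bound.
```python
import torch

def gaussian_layer_sample(mu, log_sigma):
    """Stochastic backpropagation through a Gaussian latent layer:
    z = mu + sigma * eps keeps gradients flowing to (mu, log_sigma)
    despite the sampling step."""
    eps = torch.randn_like(mu)
    return mu + torch.exp(log_sigma) * eps

mu = torch.zeros(8, requires_grad=True)        # recognition-model outputs
log_sigma = torch.zeros(8, requires_grad=True)
z = gaussian_layer_sample(mu, log_sigma)
loss = (z ** 2).sum()          # stand-in for -log p(x|z) plus the KL term
loss.backward()                # gradients reach mu and log_sigma
```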
    • 2014
      • Towards End-to-End Speech Recognition with Recurrent Neural Networks
      • Authors: A Graves N Jaitly
      • Published: ICML 2014
      • This paper presents a speech recognition system able to transcribe audio spectrograms with character sequences without requiring an intermediate phonetic representation. The system is based on a combination of the deep bidirectional LSTM recurrent neural network architecture and the Connectionist Temporal Classification objective function. A modification to the objective function is introduced, making it possible to train the network to minimise the expectation of an arbitrary transcription loss function. This allows a direct optimisation of the word error rate, even in the absence of a lexicon or language model. The complete system achieves a word error rate of 27.3% on the Wall Street Journal corpus with no prior linguistic information, 21.9% with only a lexicon of allowed words, and 8.2% with a trigram language model. Combining the network with a baseline system further reduces the error rate to 6.7%.
    • 2014
      • Skip Context Tree Switching
      • Authors: M Bellemare J Veness E Talvitie
      • Published: ICML 2014
      • Context Tree Weighting (CTW) is a powerful probabilistic sequence prediction technique that efficiently performs Bayesian model averaging over the class of all prediction suffix trees of bounded depth. In this paper we show how to generalize this technique to the class of k-skip prediction suffix trees. Contrary to regular prediction suffix trees, k-skip prediction suffix trees are permitted to ignore up to k contiguous portions of the context. This allows for significant improvements in predictive accuracy when irrelevant variables are present, a case which often occurs within record-aligned data and images. We provide a regret-based analysis of our approach, and empirically evaluate it on the Calgary corpus and a set of Atari 2600 screen prediction tasks.
    • 2014
      • Neural Variational Inference and Learning in Belief Networks
      • Authors: A Mnih K Gregor
      • Published: ICML 2014
      • Highly expressive directed latent variable models, such as sigmoid belief networks, are difficult to train on large datasets because exact inference in them is intractable and none of the approximate inference methods that have been applied to them scale well. We propose a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior. The model and this inference network are trained jointly by maximizing a variational lower bound on the log-likelihood. Although the naive estimator of the inference model gradient is too high-variance to be useful, we make it practical by applying several straightforward model-independent variance reduction techniques. Applying our approach to training sigmoid belief networks and deep autoregressive networks, we show that it outperforms the wake-sleep algorithm on MNIST and achieves state-of-the-art results on the Reuters RCV1 document dataset.
    • 2014
      • Deterministic Policy Gradient Algorithms
      • Authors: D Silver G Lever N Heess T Degris D Wierstra M Riedmiller
      • Published: ICML 2014
      • In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. We demonstrate that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.
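      • The deterministic policy gradient has a form simple enough to demonstrate numerically: ∇θJ = E_s[∇θ μθ(s) · ∇a Q(s, a)|a=μθ(s)]. The linear policy and hand-coded critic gradient below are toy assumptions; the paper's off-policy actor-critic learns Q as well.
```python
import numpy as np

# Deterministic policy gradient on a toy problem: linear policy
# mu_theta(s) = theta . s, critic Q(s, a) = -(a - sum(s))^2 / 2 whose
# action gradient dQ/da = -(a - sum(s)) is hand-coded.
rng = np.random.default_rng(0)
theta = np.zeros(3)
Q_grad_a = lambda s, a: -(a - s.sum())

for _ in range(2000):
    s = rng.normal(size=3)
    a = theta @ s                     # deterministic action, no sampling
    # Chain rule: d mu / d theta = s, so the DPG ascent step is s * dQ/da.
    theta += 0.05 * s * Q_grad_a(s, a)

print(theta)   # approaches [1, 1, 1], the policy maximising Q
```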
    • 2014
      • Unit Tests for Stochastic Optimization
      • Authors: T Schaul I Antonoglou D Silver
      • Published: ICLR 2014
      • Optimization by stochastic gradient descent is an important component of many large-scale machine learning algorithms. A wide variety of such optimization algorithms have been devised; however, it is unclear whether these algorithms are robust and widely applicable across many different optimization landscapes. In this paper we develop a collection of unit tests for stochastic optimization. Each unit test rapidly evaluates an optimization algorithm on a small-scale, isolated, and well-understood difficulty, rather than in real-world scenarios where many such issues are entangled. Passing these unit tests is not sufficient, but absolutely necessary for any algorithms with claims to generality or robustness. We give initial quantitative and qualitative results on numerous established algorithms. The testing framework is open-source, extensible, and easy to apply to new algorithms.
    • 2014
      • Online Learning of k-CNF Boolean Functions
      • Authors: J Veness M Hutter
      • Published: arXiv 2014
      • This paper revisits the problem of learning a k-CNF Boolean function from examples in the context of online learning under the logarithmic loss. In doing so, we give a Bayesian interpretation to one of Valiant's celebrated PAC learning algorithms, which we then build upon to derive two efficient, online, probabilistic, supervised learning algorithms for predicting the output of an unknown k-CNF Boolean function. We analyze the loss of our methods, and show that the cumulative log-loss can be upper bounded, ignoring logarithmic factors, by a polynomial function of the size of each example.
    • 2013
      • Learning word embeddings efficiently with noise-contrastive estimation
      • Authors: A Mnih K Kavukcuoglu
      • Published: NIPS 2013
      • We propose a simple and scalable approach to learning word embeddings, based on training lightweight log-bilinear language models with noise-contrastive estimation. Replacing the expensive normalisation over the vocabulary with a binary classification task that discriminates data words from noise samples reduces training time by more than an order of magnitude, while producing embeddings that perform as well as or better than those of more complex models on word similarity tasks.
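      • A sketch of the noise-contrastive objective underlying this approach: with k noise samples per data word drawn from a known noise distribution, density estimation becomes logistic classification between data and noise, so no normalisation over the vocabulary is needed. The inputs are assumed to be precomputed unnormalised model scores and noise log-probabilities.
```python
import numpy as np

def nce_loss(score_data, score_noise, log_noise_data, log_noise_noise, k):
    """Noise-contrastive estimation: P(data | w) = sigmoid(s(w) - log k - log q(w)),
    where s is the unnormalised model log-score and q the noise distribution.
    The loss is the negative log-likelihood of the binary labels."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    p_data = sigmoid(score_data - np.log(k) - log_noise_data)
    p_noise = sigmoid(score_noise - np.log(k) - log_noise_noise)
    return -(np.log(p_data).mean() + k * np.log(1.0 - p_noise).mean())

rng = np.random.default_rng(0)
k = 5
loss = nce_loss(score_data=rng.normal(1.0, 1.0, 100),     # scores of data words
                score_noise=rng.normal(-1.0, 1.0, 500),   # scores of k*100 noise words
                log_noise_data=np.full(100, -np.log(50)), # toy uniform noise dist.
                log_noise_noise=np.full(500, -np.log(50)),
                k=k)
print(loss)
```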
    • 2013
      • Bayesian Hierarchical Community Discovery
      • Authors: C Blundell Y W Teh
      • Published: NIPS 2013
      • We propose an efficient Bayesian nonparametric model for discovering hierarchical community structure in social networks. Our model is a tree-structured mixture of potentially exponentially many stochastic blockmodels. We describe a family of greedy agglomerative model selection algorithms that take just one pass through the data to learn a fully probabilistic, hierarchical community model. In the worst case, our algorithms scale quadratically in the number of vertices of the network, but independently of the number of nested communities. In practice, the run time of our algorithms is two orders of magnitude faster than that of the Infinite Relational Model, while achieving comparable or better accuracy.
    • 2013
      • Playing Atari with Deep Reinforcement Learning
      • Authors: V Mnih K Kavukcuoglu D Silver A Graves I Antonoglou D Wierstra M Riedmiller
      • Published: NIPS 2013 Deep Learning Workshop
      • We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
    • 2012
      • The Future of Memory: Remembering, Imagining, and the Brain
      • Authors: D Schacter D Addis D Hassabis V Martin N Spreng K Szpunar
      • Published: Neuron 76, 677-694
      • During the past few years, there has been a dramatic increase in research examining the role of memory in imagination and future thinking. This work has revealed striking similarities between remembering the past and imagining or simulating the future, including the finding that a common brain network underlies both memory and imagination. Here, we discuss a number of key points that have emerged during recent years, focusing in particular on the importance of distinguishing between temporal and nontemporal factors in analyses of memory and imagination, the nature of differences between remembering the past and imagining the future, the identification of component processes that comprise the default network supporting memory based simulations, and the finding that this network can couple flexibly with other networks to support complex goal-directed simulations. This growing area of research has broadened our conception of memory by highlighting the many ways in which memory supports adaptive functioning.
    • 2012
      • Is the brain a good model for machine intelligence?
      • Authors: D Hassabis R Brooks D Bray A Shashua
      • Published: Nature 482, 462-463
      • To celebrate the centenary of the year of Alan Turing's birth, four scientists and entrepreneurs assess the divide between neuroscience and computing.
    • 2011
      • An Approximation of the Universal Intelligence Measure
      • Authors: S Legg J Veness
      • Published: Solomonoff Memorial Conference Melbourne, Australia, 2011
      • The Universal Intelligence Measure is a recently proposed formal definition of intelligence. It is mathematically specified, extremely general, and captures the essence of many informal definitions of intelligence. It is based on Hutter's Universal Artificial Intelligence theory, an extension of Ray Solomonoff's pioneering work on universal induction. Since the Universal Intelligence Measure is only asymptotically computable, building a practical intelligence test from it is not straightforward. This paper studies the practical issues involved in developing a real-world UIM-based performance metric. Based on our investigation, we develop a prototype implementation which we use to evaluate a number of different artificial agents.