Matches in SemOpenAlex for { <https://semopenalex.org/work/W2089948480> ?p ?o ?g. }
Showing items 1 to 61 of 61, with 100 items per page.
- W2089948480 endingPage "R685" @default.
- W2089948480 startingPage "R682" @default.
- W2089948480 abstract "Neuroscientists study the brain at many different levels, from molecular to psychological. Despite progress on many of these levels, there is a disappointing lack of coherence in neuroscience research: we have yet to find an overarching theoretical framework for understanding what the brain does. Vision is one of the most intensively studied areas of brain function. Yet, even in this field, there are wide disagreements about the goals of cortical processing. In the second half of the twentieth century, there were two important attempts to provide a theoretical framework for understanding vision, by David Marr and James Gibson. There are now grounds for optimism that these two broad approaches can be brought together to provide a biologically plausible and yet computationally tractable framework for understanding and imitating human vision. I shall focus on attempts to describe vision at the level Marr called ‘computational theory’. Marr, 1982Marr D Vision: a computational investigation into the human representation and processing of visual information. W.H. Freeman and company, 1982Google Scholar emphasised that vision was nothing more than an information-processing task. Any such task, he argued, could be described on three levels: (i) computational theory; (ii) specific algorithms; and (iii) physical implementation. The three levels correspond roughly to: (i) defining the problem and setting out how, in principle, it can be solved; (ii) designing a detailed simulation of the process; and (iii) building a working system that will carry it out (see Box 1). The important point is that the levels can be considered independently. As a result, it ought to be possible to mimic the algorithms underlying biological vision in robots: the only difference would be in how they are implemented physically. This concept of independent levels of explanation remains a mantra of vision research. Marr attempted to set out a computational theory for vision as a whole. He suggested that visual processing passes through a series of stages, each corresponding to a different representation, from retinal image to ‘3D model’ representation of objects. One problem with this account is that information needs to be passed continually from one coordinate frame to another. There is increasing interest in models that avoid coordinate transformations of this kind, rather using information stored in retinal coordinates for tasks such as object recognition or navigation (for example Mallot, 2000Mallot H.A Computational vision: information processing in perception and visual behaviour. 2nd edn. MIT Press, 2000Google Scholar). Like Marr, Gibson had a powerful influence on vision research in the last century. Marr himself wrote that “in perception, perhaps the nearest anyone came to the level of computational theory was Gibson”. Gibson promoted an ‘ecological’ approach to studying vision, by which he meant that vision should be understood first and foremost as a tool that enables animals to achieve the basic tasks required for life: avoid obstacles, identify food or predators, approach a goal and so on (for example Gibson, 1979Gibson, J.J. (1979). The ecological approach to visual perception (Houghton Mifflin).Google Scholar). This viewpoint has gained increasing influence. An emphasis on survival of the organism is a more promising basis for a computational theory of vision than Marr's assertion that vision is “knowing what is where by looking”. 
Where Gibson infuriated his contemporaries was in his musings about the brain mechanisms that might generate these 'ecological' behaviours (Ullman, 1980). Even when he avoided any mention of brain mechanisms and stuck to what Marr would call the algorithmic level, his proposals were often loose or unclear (see Box 2). Despite the criticisms, there has been continued interest in exploring the kind of 'rule-based' behavioural strategies that Gibson advocated. A recurring theme behind all these strategies is the idea that, out of the myriad potential visual signals that could be derived from a moving retinal image, one or two aspects of the information are especially relevant for controlling a particular motor behaviour. Recently, for example, there has been interest in recording the head and eye movements of people carrying out real-world tasks, such as driving. It is clear from these studies how a sequence of simple visuomotor rules or sub-tasks could be linked together to achieve a higher-order goal. Take the task of making a cup of tea, for which Land et al. (1999) recorded the entire sequence of head and eye movements. The sub-tasks, such as bringing the hand towards the kettle lid, tend to be straightforward visually guided routines when examined individually. The complexity of behaviours may therefore evolve in two ways: (i) by increasing the range of different sensory parameters available for controlling the motor system; and (ii) by storing increasingly long sequences of sub-tasks that, when strung together, achieve higher-order goals (a toy sketch of this idea appears after the abstract below).

The approaches advocated by Marr and Gibson are not mutually exclusive. Advances in computer vision may help to bring the two together. Mathematical rigour and computational theory are brought to bear here on tasks that, increasingly, must be carried out in unpredictable, 'natural' environments. One current research theme that exemplifies this process of resolution is Bayesian inference. Bayesian inference is sometimes couched in fearsome mathematical terms, but the basic idea is both straightforward and highly relevant to understanding animal behaviour. The brain receives signals from afferent (sensory) fibres. On the basis of these, and of the information it has stored previously, the brain must generate a response (ultimately, a motor response). A reasonable model for this process is that one response is picked out of a list of possibilities by choosing the most appropriate in the organism's current context (for example, the most probably rewarded or the least probably punished). It is here that Bayes' formula is useful. Bayes pointed out that the probability of state S being the case (such as 'there is a kettle over to my right') given information I (here, the sensory information the brain receives) is directly proportional to the product of two quantities that can, in principle, be estimated in advance and hence, in the context of the brain, stored in memory. The first quantity is the 'prior' probability of state S occurring, P(S).
This makes sense intuitively: if you are forced to guess what the current state of the world is and you have no evidence (or highly inconclusive evidence) at the moment, you should guess a likely rather than an unlikely state (these prior probabilities being determined on the basis of previous experience). For example, if the kettle was on your right the last time you looked, it is a reasonable assumption that the fuzzy grey shape on the periphery of your vision is (still) the kettle. The second quantity incorporates the actual data, I, and gives an indication of how conclusive it is. It is the probability of receiving evidence I given that the current state of the world really is S, normalised by the total probability of getting information I (summed over all possible states). Again, this makes sense intuitively. For example, fuzzy grey shapes are common in peripheral vision and do not always arise from kettles, so the evidence on its own is inconclusive. Fixating the kettle is a good way to improve the evidence: the higher-resolution image is richer (and rarer) and more specific to kettles. In general, if sensory input I is both rare (P(I) is low) and also characteristic of state S (P(I|S) is high), then information I is good evidence that the world is in state S. Put more succinctly: Bayes' rule can be derived from the assertion that the joint probability of S and I is equal to that of I and S. If the two joint probabilities are expressed in terms of conditional events, this becomes P(S|I) P(I) = P(I|S) P(S), from which the expression P(S|I) = P(I|S) P(S) / P(I) is obtained (a worked numerical example appears after the abstract below). Many perceptual phenomena can be explained parsimoniously using a Bayesian approach (see Box 3 and a review by Knill and Richards, 1996).

Bayesian inference fits well with all of Marr's levels of description. It is a useful tool in describing a problem at the level of computational theory, making explicit what is to be computed and the constraints that are to be used to derive output from input. It can be the basis of specific, working models or algorithms, and it can be implemented in a number of ways, such as in neural networks. Following Marr's notion of independence between levels, theories of neural architecture in the brain that might carry out this kind of inference can be developed to deal with the generic quantities P(S), P(I) and P(I|S), without reference to specific stimuli (reviewed by Barlow, 2001). At the same time, Bayesian approaches fit well into the evolutionary or ecological perspective that Gibson advocated. A simple organism, with a simple behavioural repertoire, needs only to divide information about its state with respect to the world into a small number of categories. It can use its motor system to move between these categories (this is, after all, the only way it can know that a motor movement has been successful). A more complex behavioural repertoire requires a greater number of states to be discriminated reliably. This means that sensory systems must evolve to help an organism discriminate between the contexts in which it will generate different motor outputs. At various stages in a task, the sensory parameters that are most helpful in discriminating (and hence controlling) movements will be quite different (see the section on Gibson).
This leads to a view of the cortex as a pool from which evidence can be drawn. From moment to moment, the neurons with the most relevant information may be located in quite different parts of the cortex, according to the demands of the task. Proposals about how neural signals are combined to give rise to visual percepts include some that do not provide a description of the solution at the level of a computational theory: an example is the idea that synchronised oscillations in the firing rate of neurons in different parts of the brain could account for perceptual phenomena (Singer, 1998). As they stand, such theories have little explanatory power. A general weakness of these and many other theories is the failure to consider how representations could be built up over time. There is a tendency to assume that neural responses could somehow be combined to generate a vivid reconstruction of the scene in an instant. Not only is this a daunting prospect, but the computational problem is amplified as time is brought into the equation. Having reconstructed the world, heaven forbid that the observer now move their eyes or their head! That would entail a new reconstruction and a new problem of relating it to the one created a moment ago. More promising approaches focus on the small, discrete goals of one epoch (for example, a period of fixation) and how these could be combined, like the pieces of a jigsaw, into a richer representation (for example Ballard et al., 1997; Land et al., 1999; Rensink, 2002).

This view brings Marr and Gibson's ideas together in another way. Gibson emphasised the role of vision as a tool for action. One of the things that makes human vision special is our ability to carry out tasks involving long sequences of movements, each one simple if considered in isolation, to achieve our goals. The processes involved in building up a vivid, detailed visual representation are perhaps best seen as a by-product of that ability, taking time and being divisible into purposeful steps. But as Marr emphasised, whatever the processes turn out to be, emulating them computationally is the best way to understand them fully. Supported by the Royal Society. I am grateful for comments from Bruce Cumming, Tam Curnow, Miles Hansard and Anya Hurlbert." @default.
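The abstract's point about stored sequences of visually guided sub-tasks can be made concrete with a toy sketch. The following Python fragment is my own illustration, not code from Land et al. (1999): the sub-task targets and actions are invented, and the point is only the structure, i.e. a higher-order goal stored as an ordered list of simple routines, each of which uses vision for just the one movement it controls.

```python
# Toy sketch: a higher-order goal ("make tea") stored as a sequence of
# simple, visually guided sub-task routines. All targets and actions are
# invented for illustration.
from typing import Callable, List

def fixate(target: str) -> None:
    print(f"fixate {target}")        # orient gaze to the sub-task's target

def make_subtask(target: str, action: str) -> Callable[[], None]:
    def routine() -> None:
        fixate(target)               # vision supplies the one or two
        print(f"{action} {target}")  # parameters this movement needs
    return routine

# The stored higher-order goal: an ordered list of sub-task routines.
make_tea: List[Callable[[], None]] = [
    make_subtask("kettle lid", "grasp"),
    make_subtask("tap", "fill kettle at"),
    make_subtask("switch", "press"),
    make_subtask("teapot", "pour into"),
]

for subtask in make_tea:
    subtask()  # stringing sub-tasks together achieves the higher-order goal
```

Each routine is trivial in isolation, matching the observation that individual sub-tasks are straightforward when examined on their own; complexity grows by lengthening the stored sequence or enriching the sensory parameters each routine can draw on.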
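The Bayes' rule passage above also lends itself to a worked numerical example. The sketch below applies P(S|I) = P(I|S) P(S) / P(I) to the kettle scenario; all probabilities are invented for illustration (the article gives no numbers), and it shows how a common, unspecific peripheral cue barely shifts the prior while a rare, kettle-specific foveal image makes the posterior near-certain.

```python
# Minimal Bayesian inference sketch for the kettle example.
# All probability values are assumptions chosen for illustration.

states = ["kettle on my right", "no kettle on my right"]

# Prior P(S): the kettle was there the last time we looked.
prior = {"kettle on my right": 0.8, "no kettle on my right": 0.2}

# Likelihood P(I|S) for I = "fuzzy grey shape in the periphery":
# fuzzy grey shapes are common whether or not a kettle is present.
likelihood_peripheral = {"kettle on my right": 0.6, "no kettle on my right": 0.4}

# Likelihood for I = "high-resolution foveal image of a kettle":
# rarer overall, and far more specific to the kettle state.
likelihood_foveal = {"kettle on my right": 0.7, "no kettle on my right": 0.01}

def posterior(prior, likelihood):
    """Bayes' rule: P(S|I) = P(I|S) P(S) / P(I), where
    P(I) = sum over all states of P(I|S) P(S)."""
    p_i = sum(likelihood[s] * prior[s] for s in prior)          # P(I)
    return {s: likelihood[s] * prior[s] / p_i for s in prior}   # P(S|I)

print(posterior(prior, likelihood_peripheral))
# ~0.857 for the kettle: weak, common evidence barely moves the prior.
print(posterior(prior, likelihood_foveal))
# ~0.996 for the kettle: fixation yields rare, kettle-specific evidence.
```

The normalising term P(I) is low for the foveal image, and P(I|S) is high, which is exactly the "rare and characteristic" condition the abstract identifies as good evidence that the world is in state S.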
- W2089948480 created "2016-06-24" @default.
- W2089948480 creator A5056352811 @default.
- W2089948480 date "2002-10-01" @default.
- W2089948480 modified "2023-10-16" @default.
- W2089948480 title "Computational theories of vision" @default.
- W2089948480 cites W2003373333 @default.
- W2089948480 cites W2016711711 @default.
- W2089948480 cites W2029989595 @default.
- W2089948480 cites W2032621648 @default.
- W2089948480 cites W2064229486 @default.
- W2089948480 cites W2106321929 @default.
- W2089948480 cites W2119030642 @default.
- W2089948480 cites W2125838369 @default.
- W2089948480 cites W2130055919 @default.
- W2089948480 cites W2148596731 @default.
- W2089948480 cites W2154653719 @default.
- W2089948480 cites W4213285154 @default.
- W2089948480 doi "https://doi.org/10.1016/s0960-9822(02)01204-6" @default.
- W2089948480 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/12401182" @default.
- W2089948480 hasPublicationYear "2002" @default.
- W2089948480 type Work @default.
- W2089948480 sameAs 2089948480 @default.
- W2089948480 citedByCount "6" @default.
- W2089948480 countsByYear W20899484802015 @default.
- W2089948480 countsByYear W20899484802022 @default.
- W2089948480 crossrefType "journal-article" @default.
- W2089948480 hasAuthorship W2089948480A5056352811 @default.
- W2089948480 hasBestOaLocation W20899484801 @default.
- W2089948480 hasConcept C15744967 @default.
- W2089948480 hasConcept C188147891 @default.
- W2089948480 hasConcept C70721500 @default.
- W2089948480 hasConcept C78458016 @default.
- W2089948480 hasConcept C86803240 @default.
- W2089948480 hasConceptScore W2089948480C15744967 @default.
- W2089948480 hasConceptScore W2089948480C188147891 @default.
- W2089948480 hasConceptScore W2089948480C70721500 @default.
- W2089948480 hasConceptScore W2089948480C78458016 @default.
- W2089948480 hasConceptScore W2089948480C86803240 @default.
- W2089948480 hasIssue "20" @default.
- W2089948480 hasLocation W20899484801 @default.
- W2089948480 hasLocation W20899484802 @default.
- W2089948480 hasOpenAccess W2089948480 @default.
- W2089948480 hasPrimaryLocation W20899484801 @default.
- W2089948480 hasRelatedWork W1828955125 @default.
- W2089948480 hasRelatedWork W2034736453 @default.
- W2089948480 hasRelatedWork W2044499740 @default.
- W2089948480 hasRelatedWork W2061542922 @default.
- W2089948480 hasRelatedWork W2064901328 @default.
- W2089948480 hasRelatedWork W2096678084 @default.
- W2089948480 hasRelatedWork W2190176143 @default.
- W2089948480 hasRelatedWork W2319374022 @default.
- W2089948480 hasRelatedWork W3048727301 @default.
- W2089948480 hasRelatedWork W3193780050 @default.
- W2089948480 hasVolume "12" @default.
- W2089948480 isParatext "false" @default.
- W2089948480 isRetracted "false" @default.
- W2089948480 magId "2089948480" @default.
- W2089948480 workType "article" @default.