The complexity of the visual system allows species across classes to navigate their environment and adapt effectively to its challenges, which facilitates their survival. One of the main modeling approaches to understanding the visual system breaks the system into modules whose inner mechanisms are thought to be easier to investigate and understand. This paradigm implicitly views the organization of the visual system like that of a “Swiss Army knife,” a toolkit of special-purpose “instruments” or modules (Yufik, 2002). This paper aims to show how a more modern and dynamic “Swiss Army knife,” described as an elegant array of multipurpose tools with optimal functional design, may be used as a good metaphor for the organization of the complex visual system. I will explore how the principles of evolution highly constrained the design of each of the modules, just as the knife had to minimize the required space while being as useful as possible. Similarly, based on our understanding of the organization of the retina, the primary visual cortex, and the distribution of the optic nerve, this paper aims to show how the integration of the “visual modules” can successfully reproduce complex patterns of visual cognition. In other words, several modules can be used in strategic combination to achieve a task effectively.
One of the main innovations in the Swiss Army (SA) knife occurred in 1896, when the inventor Karl Elsener was able to put blades on each side of the handle after five years of work. This innovation allowed the knife to have twice as many features (see “The SA Knife”). As time passed, the highly constrained handle slowly began to accommodate even more gadgets, producing what we know today as the famous SA knife. Interestingly, the evolution of the visual system follows similar design limitations, starting from the retina and continuing through higher computational modules. As a general principle, evolutionary selection constrained cell volume and maximized the number of tasks that the brain modules could complete. For instance, consider the retina, which is highly constrained by metabolic rate and cellular volume (Sterling, 2004). Despite this limitation, it must still extract fine detail under low light and poor contrast conditions with high accuracy and speed. Furthermore, the organization of the retina had to account for differences between day and night, which would require more circuitry due to the differences in cognitive processing (Sterling, 2004). One line of evidence for the space constraint comes from the organization of the retina in animals living in environments with a broad spectrum of light intensities. Two types of photoreceptors with different levels of sensitivity, namely rods and cones, were included in the knife to provide both day and night vision. Such modules had to reach a compromise that would maximize survival under both conditions. The evolutionary solution was quite elegant: the cones densely occupy the fovea to enhance spatial resolution during the day, while the rods in the periphery outnumber the cones to maximize photon sensitivity at night.
Further evidence of the high demand for space comes from the suboptimally small size of the “outer segments” of photoreceptors. This small size may reduce the efficiency of photon capture, but it increases processing speed, which may play an important role in survival (Sterling, 2004).
Another aspect of the organization of the retina that follows the SA knife model is that the various “retinal modules” successfully reproduce the relevant information of stimuli once they are integrated. For instance, by having three types of cones, each of which responds maximally to a particular range of wavelengths, one can reproduce the relevant colors using only three types of receptors. Thus, each blade has a particular function in vision, but once they are used in conjunction, we perceive a much broader spectrum of colors. Ganglion cells at the next processing level of the retina follow a similar design. Each of their receptive fields aggregates input from photoreceptors and reduces irrelevant information using center-surround fields and lateral inhibition. This information can then be interpreted at higher levels of processing to form a cohesive visual cognition without any significant loss. In contrast to Yu’s postulate (2007), the different cone cells do serve a unique purpose, but what makes them so powerful is their dynamism in combining the input and using it as a basis function that matches the relevant information from the environment. From this, it follows that the design of the retina follows a SA approach that maximizes efficiency while minimizing space, while still maintaining a complete toolkit to capture the essential visual information for higher processing levels.
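As a minimal sketch of how a center-surround receptive field can discard uninformative input, the toy model below applies a one-dimensional difference-of-Gaussians filter to a single light-dark edge. The filter widths, the kernel radius, and the stimulus are illustrative assumptions, not measured retinal parameters, and the function names are hypothetical.

```python
import math

def gaussian_kernel(sigma, radius):
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def center_surround_kernel(sigma_center=1.0, sigma_surround=3.0, radius=9):
    """Difference of Gaussians: a narrow excitatory center minus a broad
    inhibitory surround. Both parts sum to 1, so the kernel sums to ~0."""
    center = gaussian_kernel(sigma_center, radius)
    surround = gaussian_kernel(sigma_surround, radius)
    return [c - s for c, s in zip(center, surround)]

def respond(signal, kernel):
    """Apply the kernel at every position where it fits inside the signal."""
    radius = len(kernel) // 2
    return [sum(k * signal[i + j - radius] for j, k in enumerate(kernel))
            for i in range(radius, len(signal) - radius)]

# Assumed stimulus: a dark region followed by a bright region (one edge).
stimulus = [0.0] * 30 + [1.0] * 30
responses = respond(stimulus, center_surround_kernel())

# Position (within `responses`) of the strongest response, by magnitude.
peak_index = max(range(len(responses)), key=lambda i: abs(responses[i]))
```

In this toy model the response is essentially zero wherever the illumination is uniform, whether dark or bright, and departs from zero only in the neighborhood of the edge. This is one way to picture how the retinal “blade” passes on contrast rather than absolute intensity.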
Next, information from the retina is distributed through the optic nerve to several subcortical structures. Most of the terminations of the nerve have been associated with well-defined functions. For instance, three such projections include the suprachiasmatic nucleus (SCN), the body’s circadian clock; the accessory optic system, involved in the control of eye movement; and the pretectum, involved in visual reflexes (Swanson, 2003). One of the projections of the nerve goes to the motor and secondary visual cortices through the superior colliculus (SC). Neurons in this structure are activated based on the location corresponding to the stimulus (Stein, 1998). This shows another elegant evolutionary solution to the space constraint: an internal representation of sensory space that is based on events rather than modalities, since cues from the same event are likely to originate from the same location (Stein, 1998). The superior colliculus also shows that visual system modules must work cooperatively for vision to occur, just as a SA knife must have a complete set of blades to be fully functional, with the blades intricately connected by springs. This process involves a complex and dynamic balance: even though a combination of blades can sometimes seem to substitute for a lost gadget, this is not always the case, which contrasts directly with Yu’s point that all processing is interchangeable. For instance, when a module related to the SC, the anterior ectosylvian sulcus (AES), is temporarily deactivated, SC neurons can no longer integrate the individual inputs to produce an enhanced response (Stein, 1998). A final projection of the optic nerve goes to the lateral geniculate nucleus (LGN) in the thalamus, where visual information is layered to represent the left and right visual fields rather than the source of the information. This shows that the modules in the visual system are not necessarily concerned with the source of the information per se, but rather with the information itself.
Finally, we can expect the evolution of the optic nerves to have followed a design that minimizes the number of fibers or “wires”. The benefit of such an approach is evident starting from the retina, where a larger optic nerve would result in a larger blind spot and fewer photons being detected. Following a SA knife approach does not imply that each part of the brain must be connected to the signal, as suggested by Yu (2007). Information in the optic nerve has already passed through the “retinal gadget,” and one would expect that no computation is repeated in the visual system. For instance, when opening a present, one may cut the gift wrap with the scissors and then use a blade to open the plastic, without the blade having to repeat the work accomplished by the scissors. In contrast to what was claimed by Yu, SA design does not necessarily mean that the process must be slow, since modules can potentially be used in parallel. Similarly, Hebbian-like rules (“fire together, wire together”) can be derived. If, whenever you take out the corkscrew, you usually follow by taking out the midsize knife, we could expect a more sophisticated spring mechanism that would allow the midsize knife to come out more easily after the corkscrew has been used. Thus, it follows that the distribution of the optic nerve and related nuclei exhibits SA design in modularity as well as in space optimization.
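The Hebbian analogy can be made concrete with a small sketch. It assumes the plain Hebbian update, in which a connection weight grows in proportion to the product of the activity on both sides; the learning rate, the usage log, and the function name are hypothetical and chosen only for illustration.

```python
def hebbian_update(weight, pre, post, learning_rate=0.1):
    """Plain Hebbian rule: strengthen the link in proportion to joint activity."""
    return weight + learning_rate * pre * post

# Assumed usage log: each pair records whether the first gadget (pre) and
# the midsize knife (post) were used on a given occasion (1 = used).
usage_log = [(1, 1), (1, 1), (0, 1), (1, 0), (1, 1)]

weight = 0.0   # strength of the "spring" linking the two gadgets
history = []   # weight after each occasion
for pre, post in usage_log:
    weight = hebbian_update(weight, pre, post)
    history.append(weight)
```

The weight increases only on the occasions when both gadgets are used together and stays unchanged otherwise. Note that this plain form of the rule grows without bound, so models that use it usually add some form of normalization or decay, which is omitted here for brevity.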
Information from the LGN is then sent to the primary visual cortex (V1), whose organization continues to show SA knife design. One of the modules of the visual cortex could be the ocular dominance columns, which Hubel (1981) describes as a “machine for combining input from the two eyes,” possibly for stereopsis. Another feature that can potentially be regarded as modular is the receptive field. Receptive fields were originally described as responding maximally to stimuli of a certain location, orientation, and even length (Hubel, 1981). However, the flexibility of the SA knife allows such modules to be defined in terms of processing streams or “computational strategies” (DeYoe & Van Essen, 1988) instead of actual structures. Another example of this design model comes from cross-modal plasticity. It has been observed that loss of vision does not lead to inactivation of V1 but rather to a reorganization of cortical functions. For instance, this was the case in blind individuals, who used this region while engaging in a Braille reading task (Burton, 2003). The primary visual cortex was not exempt from the space constraints that apply to other regions. As Hubel (1981) states, “in one or two square mm there seems to exist all the machinery necessary to look after everything the visual cortex is responsible for.” Yu’s argument (2007) that variable-size receptive fields (RFs) and their plasticity imply by definition that the brain cannot follow SA design does not follow logically. Each of the blades of the knife can be quite dynamic, for instance, by adapting the depth of a cut. Thus, we can expect that the modules in V1 would adapt to a larger RF whenever needed, or even receive input from outside their traditional RF. Furthermore, even if the highest spatial grain has been reached in V1 (Yu, 2007), this does not mean that computation is completed at that level, since much of vision involves the integration of temporal information as well.
Until now, the “SA Knife” model of the visual system may have seemed static, with each module having an independent and well-defined function and little interaction with other modules. However, the visual system is constantly adapting to new challenges in our environment and is highly dynamic: for instance, the visual cortex still undergoes experience-dependent synaptogenesis throughout adulthood (Gilbert, 1996). This interaction among modules is so essential that Yufik (2002) believes that the integration of initially separate modules was “the turning point of human evolution” due to the much higher adaptive efficiency it provided. For instance, consider that in V1 the strength and density of connections are highly dynamic, as is the size of receptive fields (Gilbert, 1996). As argued by Yu, it is possible that the perception of basic attributes cannot be explained using simple one-to-one hierarchical pathways. However, the level of dynamism in the visual system may simply require a more dynamic and looser definition of modules. For instance, it may be the case that the modules are best represented by computational strategies instead of actual physical structures, as proposed by DeYoe & Van Essen (1988). Finally, the modules may even respond very differently depending on the situation. For instance, Vallar (1998) suggested that the relative contribution from each hemisphere may depend heavily on the type of representation involved.
Even after extensive scientific investigation, Olshausen & Field (2005) believe that we understand less than 20% of how V1 operates under normal circumstances. However, what we do know is that the visual system is constantly adapting to new challenges in our environment with high adaptability and dynamism. The SA knife is a good metaphor for the organization of the entire visual system, from photon reception to complex object recognition, because it captures the guiding principles of constrained design and dynamic interaction of the gadgets. Newer Swiss Army knives are being modernized to include gadgets such as a USB memory drive. Similarly, the visual system continues to adapt to newer and more complex patterns of visual information to facilitate our survival and evolutionary fitness by using complex and well-designed modules for visual cognition.