4. People are Complicated!

Complexity is your enemy. Any fool can make something complicated. It is hard to make something simple.

– Richard Branson

The key aspect of user experience is its focus on making a system more humane. This means tailoring the system to the user, and not expecting the user to happily conform to the system. We can only achieve this by understanding how humans sense and perceive the outside environment (the system or interface), process and store this knowledge, make use of it either immediately or in the future, and, based on past and current knowledge, exert control over that external environment (again, in this case, the system or interface).

4.1 UX is Everywhere

User experiences occur in many contexts and over many domains. This variety sometimes makes it difficult to ‘pigeonhole’ UX as one specific thing; as we have discussed, UX is the broad term used to describe the experience of humans with computers at both the interface and system level.

Indeed, the ACM and IEEE Curriculum ‘Subjects Coverage’ lists Human Factors as occurring in the following domains:

  • AL/Algorithmic Strategies
  • NC/Networked Applications
  • HC/Foundations
  • HC/Building GUI Interfaces
  • HC/User Centred Software Evaluation
  • HC/User Centred Software Development
  • HC/GUI Design
  • HC/GUI Programming
  • HC/Multimedia And Multimodal Systems
  • HC/Collaboration And Communication
  • HC/Interaction Design For New Environments
  • GV/Virtual Reality
  • IM/Hypermedia
  • IM/Multimedia Systems
  • IS/Fundamental Issues
  • NC/Multimedia Technologies
  • SE/Software Verification Validation
  • CN/Modelling And Simulation
  • SP/Social Context
  • SP/Analytical Tools
  • SP/Professional Ethics
  • SP/Philosophical Frameworks

Subject Key: Algorithms and Complexity (AL); Net-Centric Computing (NC); Human–Computer Interaction (HC); Graphics and Visual Computing (GV); Intelligent Systems (IS); Information Management (IM); Software Engineering (SE); Computational Science (CN); Social and Professional Issues (SP).

**Figure: Microsoft Active Tiles.** Microsoft Active Tiles, More Humane? ---Image Credit: Microsoft.

However, more broadly this is often referred to as human factors, a term that suggests an additional, non-computational aspect within the environment affecting the system or interaction indirectly. This leads on to the discipline of ergonomics, also known as human factors, defined as ‘the scientific discipline concerned with the understanding of interactions among humans and other elements of a system, and the profession that applies theory, principles, data and methods to design in order to optimise human well-being and overall system performance.’ At an engineering level, the domain may be known as user experience engineering.

However, in all cases, the key aspect of Human Factors and its application as user experience engineering is the focus on making a system more humane (see Figure: Microsoft Active Tiles). There is no better understanding of this than that possessed by ‘Jef Raskin’, creator of the Macintosh GUI for Apple Computer; inventor of SwyftWare via the Canon Cat; and author of ‘The Humane Interface’ [Raskin, 2000]:

“Humans are variously skilled and part of assuring the accessibility of technology consists of seeing that an individual’s skills match up well with the requirements for operating the technology. There are two components to this; training the human to accommodate the needs of the technology and designing the technology to meet the needs of the human. The better we do the latter, the less we need of the former. One of the non-trivial tasks given to a designer of human–machine interfaces is to minimise the need for training.

Because computer-based technology is relatively new, we have concentrated primarily on the learnability aspects of interface design, but the efficiency of use once learning has occurred and automaticity achieved has not received its due attention. Also, we have focused largely on the ergonomic problems of users, sometimes not asking if the software is causing ‘Cognetic’ problems. In the area of accessibility, efficiency and Cognetics can be of primary concern.

For example, users who must operate a keyboard with a pointer held in their mouths benefit from specially designed keyboards and well-shaped pointers. However well-made the pointer, however refined the keyboard layout, and however comfortable the physical environment we have made for this user, if the software requires more keystrokes than necessary, we are not delivering an optimal interface for that user. When we study interface design, we usually think in terms of accommodating higher mental activities, the human capabilities of conscious thought and ratiocination.

Working with these areas of thought brings us to questions of culture and learning and the problems of localising and customising interface designs. These efforts are essential, but it is almost paradoxical that most interface designs fail first to assure that the interfaces are compatible with the universal traits of the human nervous system, in particular, those traits that are sub-cortical and that we share with other animals. These characteristics are independent of culture and learning, and often are unaffected by disabilities.

Most interfaces, whether designed to accommodate accessibility issues or not, fail to satisfy the more general and lower-level needs of the human nervous system. In the future, designers should make sure that an interface satisfies the universal properties of the human brain as a first step to assuring usability at cognitive levels.”

In the real world, this means that the UX specialist can be employed in many different jobs and many different guises; being involved in the collection of data to form an understanding of how the system or interface should work. Regardless of the focus of the individual project, the UX specialist is concerned primarily with people and the way in which they interact with the computational system.

**Figure: PET studies of glucose metabolism.** PET studies of glucose metabolism to map the human brain's response in performing different tasks. ---Image Credit: M. E. Phelps, PNAS, 2000, 97, 9226.

4.2 People & Computers

Understanding how humans take in information, understand and learn from this information, and use it to guide their control of the outside world, is key to understanding their experiences [Weinschenk, 2011] (see PET scans in Figure: PET studies of glucose metabolism). Here we are not concerned directly with the anatomical, physiological, or psychological aspects of these processes (there are many in-depth treatises that address these areas), but instead look at them at the point where user meets software or device. In this case we can think, simplistically, that humans sense and perceive the outside environment (the system or interface), process and store this knowledge, making use of it either immediately or in the future, and, based on both past and current knowledge, exert control over an external environment (again, in this case, the system or interface) [Bear et al., 2007].

4.2.1 Perceiving Sensory Information

Receiving sensory information can occur along several different channels. These channels relate to the acts of: seeing (visual channel), hearing (auditory channel), smelling (olfactory channel), and touching (somatic/haptic channel). This means that each of these channels could, in effect, be used to transmit information regarding the state of the system or interface. It is important, then, to understand a few basic principles of these possible pathways for the transmission of information from the computer to the human.

**Figure: Basketball.** Basketball Awareness Test by Daniel Simons and Christopher Chabris. ---Image Credit: 1999 Daniel J. Simons; http://www.theinvisiblegorilla.com.

However, before we begin, remember back to our previous discussion – concerning ‘Figure: Basketball’ – examining Daniel Simons and Christopher Chabris’ experiments, which required you to count passes of a basketball. This phenomenon is just one example of the complexities involved in understanding users, their perception, and their attention. The phenomenon exhibited is known as ‘inattention blindness’, ‘attention blindness’, or ‘perception blindness’ and relates to our expectations, perception, and locus of attention. Simply, inattention blindness describes our failure to notice something that is in plain view and has been seen. This normally occurs when we are not expecting the stimulus to occur – why would a gorilla be moonwalking in a basketball match? If we do not think it should be there, our brains compensate for the perceptual input by not cognitively registering that stimulus. There are plenty of other tests that also show this phenomenon, and there are several different explanations for why it occurs. As you’ve just read, I prefer the ‘expectation’ explanation, but others suggest: conspicuity, in that stimuli may be inconspicuous and therefore not seen as important; mental workload, in that we may be too busy focused on another task to notice a stimulus; or capacity, which is really the locus-of-attention explanation, as we shall see later on. In reality, I expect that there is some combination of each explanation at play, but the point is that the phenomenon itself is not what we would expect, and we don’t exactly know why or how it occurs. Keep this level of uncertainty in mind as you read more and as you start your work in the UX domain.

4.2.1.1 Visual Interaction

Visual interaction design is determined by the arrangement of elements, or details, on the interface, which can either facilitate or impede a user's progress through the available resources. This matters because the amount of information within the interface that reaches our eyes is far greater than our brain can process, and it is for this reason that the visual channel and, therefore, visual attention are key to good design. Selective visual attention is a complex action composed of conscious and subconscious processes in the brain that are used to find and focus on relevant information quickly and efficiently. There are two general visual attention processes, bottom-up and top-down, which determine where humans next locate their attention. Bottom-up models of visual attention suggest that low-level salient features, such as contrast, size, shape, colour, and brightness, correlate well with visual interest. For example, a red apple (a source of nutrition) is more visually salient, and, therefore, attractive, than the green leaves surrounding it. Top-down models, on the other hand, explain visual search driven by semantics, or knowledge about the environment: when asked to describe the emotion of a person in a picture, for instance, people will automatically look to the person's face.
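To make the bottom-up idea concrete, here is a minimal, illustrative sketch of a luminance-contrast saliency map: regions whose brightness differs strongly from their local neighbourhood score higher. Real saliency models combine many more features (colour, orientation, scale); the image values and neighbourhood size here are invented for illustration.

```python
# A crude bottom-up saliency sketch: centre-surround luminance contrast.
# Illustrative only; real models combine colour, orientation and scale.
import numpy as np
from scipy.ndimage import uniform_filter

def luminance_saliency(gray_image, neighbourhood=15):
    """Return a saliency map in [0, 1] for a 2-D grayscale image."""
    img = gray_image.astype(float)
    local_mean = uniform_filter(img, size=neighbourhood)   # blurred neighbourhood
    saliency = np.abs(img - local_mean)                    # centre-surround contrast
    return saliency / max(saliency.max(), 1e-9)

# A bright 'apple-like' patch on a darker background: the most salient points
# lie on or around the patch, not on the uniform background.
scene = np.full((100, 100), 0.3)
scene[40:60, 40:60] = 0.9
salient = luminance_saliency(scene)
print(np.unravel_index(salient.argmax(), salient.shape))
```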

Both forms of visual attention inevitably play a part in helping people to orientate, navigate and understand the interface. Bottom-up processing allows people to quickly detect items, such as bold text and images, which help to explain how the interface is organised. It also helps people to group the information into ‘sections’, such as blocks of text, headings, and menus. Top-down processing enables people to interpret the information using prior knowledge and heuristics. For example, people may look for menus at the top and sides of the interface, and main interface components in the middle.

It can be seen from eye-tracking studies that users focus attention sequentially on different parts of the interface, and computational models have been successfully employed in computer graphics to segment images into the regions that the user is most likely to focus upon. These models are based upon a knowledge of human visual behaviour and an understanding of the image in question. Indeed, studies exist which record users' eye movements during specific interactive tasks, to find out which features within an interface design were visited, in which order, and where ‘gaze hotspots’ were found. In these data, we can see an association between the interface components and eye-gaze, but not one as simple as ‘the user looked at the most visually obvious interface features’. Indeed, sometimes large, bold text next to an attention-grabbing feature, such as an image, was fixated upon. We can deduce that, for instance, the information in some text may not itself draw a user's attention but ideally has some feature nearby that does. This idea is supported by other studies that try to create interface design metrics that predict whether an interface is visually complex or not. These studies relate interface design to complexity, explaining that the way an interaction is perceived depends on the way the interface itself is designed and which components are used.

Through an understanding of the visual channel we can see that there are levels of granularity associated with visual design. This granularity allows us to understand a large visual rendering by segmenting it into smaller, more manageable pieces or components. Our attention moves between these components based on how our visual attention relates to them, and it is this knowledge of how the visual channel works, captured as design best practice, which allows designers to build interfaces that users find interesting and easy to access. The visual narrative the designer builds for the observer is implicitly created by the visual appearance (and, therefore, attraction) of each visual component. It is the observer's focus, or, as we shall see, the locus of attention, which enables this visual narrative to be told.

4.2.1.2 The Auditory Channel

The auditory channel is the second most used channel for information input and is highly interrelated with the visual channel; however, auditory input has some often overlooked advantages. For instance, reactions to auditory stimuli have been shown to be faster, in some cases, than reactions to visual stimuli. Secondly, the use of auditory information can go some way towards reducing the amount of visual information presented on screen. In some regard, this reduces possible information overload from both interface components and aspects of the interaction currently being performed. The reduction in visual requirements means that attention can be freed for stimuli that are best handled visually, and means that auditory stimulation can be tailored to occur within the part of the interaction to which it is best suited. Indeed, the auditory channel is often under-used in relation to standard interface components. In some cases, this is due to the need for an absence of noise pollution within the environment. Constant or intermittent sound can often be frustrating and distracting to computer users within the same general environment; this is not the case with visual stimuli, which can be specifically targeted at the principal user. Finally, and in some cases most importantly, sound can move the user's attention and focus it on a specific spatial location. Due to the nature of the human auditory processing system, studies have found that sounds using multiple frequencies, similar to white noise, are most effective in attracting attention. This is because human speech and communication occur using multiple frequencies, whereas single frequencies with transformations, such as sirens or audible tones, may jar the senses but are more difficult to locate spatially.

Obviously, sound can be used for many different and complementary aspects of the user interface. However, the HCI specialist must consider the nature of the sound and its purpose. In addition, interfaces that rely only on sound, with no additional visual cues, may mean the interface becomes inflexible in environments where the visual display is either obscured or not present at all. Sound can be used to transmit speech and non-speech. Speech would normally be delivered via text-to-speech synthesis systems; in this way, the spoken word can be flexible, and the language of the text can be changed to facilitate internationalisation. One of the more interesting aspects of non-speech audio is that of auditory icons and ‘Earcons’. Many systems currently have some kind of auditory icon; for instance, the deletion of files from a specific directory is often accompanied by a sound used to convey the deletion, such as paper being scrunched. Earcons are slightly more complicated, as they involve the transmission of non-verbal audio messages to provide information to the user regarding some type of computer object, operation, or interaction. Earcons use a more traditional musical approach than auditory icons and are often constructed from individual short rhythmic sequences that are combined in different ways. In this case, the auditory component must be learned, as there is no intuitive link between the Earcon and what it represents.
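As a concrete illustration of those ‘short rhythmic sequences’, the sketch below synthesises a hypothetical three-note earcon as raw audio samples. The notes, durations, and the assumption that this motif signals a deletion are arbitrary choices for illustration; as the text notes, any such mapping has to be learned by the user.

```python
# Sketch of an earcon: a short rhythmic sequence of pure tones.
import numpy as np

SAMPLE_RATE = 44_100  # samples per second

def tone(freq_hz, duration_s, amplitude=0.3):
    """Generate a single sine-wave tone as a NumPy array of samples."""
    t = np.linspace(0, duration_s, int(SAMPLE_RATE * duration_s), endpoint=False)
    return amplitude * np.sin(2 * np.pi * freq_hz * t)

def earcon(notes):
    """Concatenate (frequency, duration) pairs into one waveform."""
    return np.concatenate([tone(f, d) for f, d in notes])

# A descending three-note motif, used (hypothetically) to signal file deletion.
delete_earcon = earcon([(660, 0.12), (523, 0.12), (392, 0.25)])
# delete_earcon can now be written to a WAV file or sent to an audio API.
```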

4.2.1.3 Somatic

The term ‘Somatic’ covers all types of physical contact experienced in an environment, whether it be feeling the texture of a surface or the impact of a force. Haptics describes how the user experiences force or cutaneous feedback when they touch objects or other users. Haptic interaction can serve as both a means of providing input and receiving output, and has been defined as ‘The sensibility of the individual to the world adjacent to his body by use of his body’. Haptics is driven by tactile stimuli created by a diverse sensory system, comprising receptors covering the skin and processing centres in the brain, to produce sensory modalities such as touch and temperature. When a sensory neuron is triggered by a specific stimulus, a signal passes to an area in the brain that allows the processed stimulus to be felt at the correct location. It can, therefore, be seen that the use of the haptic channel for both control and feedback can be important, especially as an aid to other sensory input or output. Indeed, haptics and tactility have the advantage of making interaction seem more real, which accounts for the high use of haptic devices in virtual and immersive environments.

4.2.1.4 The Olfactory System

The olfactory system enables us to perceive smells. It may, therefore, come as a surprise to many studying UX that there has been some work investigating the use of smell as a form of sensory input directly related to interaction (notably from [Brewster et al., 2006]). While this is a much under-investigated area, certain kinds of interface can use smell to convey an extra supporting component to the general, and better understood, stimuli of sound and vision. One of the major benefits of smell is its close link with memory; in this case, smell can be used to assist the user in finding locations that they have already visited, or indeed in recognising a location they have already been to within the interactive environment. Indeed, it seems that smell and taste [Narumi et al., 2011] are particularly effective when associated with image recognition, and can exist in the form of an abstract smell, or a smell that represents a particular cue. It is, however, unlikely that, as a UX’er, you will use smell in any large way in the systems you are designing, although it may be useful to keep smell in mind if you are having particular problems with users forgetting aspects of previously learnt interaction.

4.2.2 Thinking and Learning

The next part of the cycle is how we process, retain, and reuse the information that is being transmitted via the sensory channels [Ashcraft, 1998]. In this case, we can see that there are aspects of attention, memory and learning, that support the exploration and navigation of the interface and the components within that interface. In reality, these aspects interrelate and affect each other in more complex ways than we, as yet, understand.

Attention or the locus of attention is simply the place, the site, or the area at which a user is currently focused. This may be in the domain of unconscious or conscious thought, it may be in the auditory domain when listening to speech and conversation, or it may be in the visual domain while studying a painting. However, this single area of attention is the key to understanding how our comprehension is serialised and the effect that interface components and their granularity have when we perceive sensory concepts within a resource; creating a narrative, either consciously or unconsciously.

Cognitive psychologists already understand that the execution of simultaneous tasks is difficult, especially if these are conscious tasks. Repeating tasks at a high frequency means that they become automatic, and this automaticity means that they move from a conscious action to an unconscious action. However, the locus of attention for that person is not on the automatic task but on the task that is conscious and ‘running’ in parallel. Despite scattered results suggesting (for example) that, under some circumstances, humans can have two simultaneous loci of spatial attention, the actual measured task performance of human beings on computers suggests that, in the context of interaction, there is one single locus of attention. Therefore, we can understand that for our purposes there is only one single locus of attention, and it is this attention that moves over interactive resources based on the presentation of that resource, the task at hand, and the experience of the user.

While the locus of attention can be applied to conscious and unconscious tasks, it is directed by perception fed from the senses. This is important for interface cognition and the components of the interface under observation. As we shall touch on later, the locus of attention can be expressed as fixation data from eye-tracking work, and this locus of attention is serialisable. The main problem with this view is that the locus of attention is specific to the user and the task at hand, as well as the visual resource. In this case, it is difficult to predict how the locus of attention will manifest for every user and every task, and, therefore, the order in which the interface components will be serialised. We can see how the locus of attention is implicitly dealt with by current technologies and techniques and, by comparison to gaze data and fixation times, better understand how the serialisation of visual components can be based on the ‘mean time to fixation’.
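As a minimal sketch of what that serialisation might look like in practice, the snippet below orders hypothetical interface components by their mean time to first fixation; the component names and timings are invented, and a real study would of course use recorded gaze data.

```python
# Order interface components by mean time to first fixation (hypothetical data).
from statistics import mean

# time-to-first-fixation samples in milliseconds, one list per component
fixation_times = {
    "logo":        [180, 220, 200],
    "main_menu":   [450, 390, 510],
    "hero_image":  [120, 150, 140],
    "footer_text": [2100, 1800, 2500],
}

# Serialise components in the order users typically reach them.
serial_order = sorted(fixation_times, key=lambda c: mean(fixation_times[c]))
print(serial_order)  # ['hero_image', 'logo', 'main_menu', 'footer_text']
```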

It would be comforting to believe that we had a well-defined understanding of how memory and learning work in practice. While we do have some idea, much of our knowledge regarding memory and learning, and how these fit into the general neurology of the brain, is still lacking. This said, we do have some understanding of the kinds of aspects of memory and learning that affect human-computer interaction (see Figure: Participants with Minimal Online Experience). In general, sensory memories move via attention into short-term, or working, memory; the more we revisit this working memory, the more likely we are to be able to move it into long-term memory.

**Figure: Participants with Minimal Online Experience.** Participants with minimal online experience displayed brain activation patterns very similar to those seen in the group of savvy Internet users - after just a brief period of time. ---Image Credit: Faith Brynie, Psychology Today (Brain Sense).

In reality, areas of the brain are associated with each of the sensory inputs: iconic memory for visual stimuli; echoic memory for auditory stimuli; and haptic memory for tactile stimuli. However, visual stimuli prove to be the most supported of the senses from a neurological perspective, with over fifty areas of the brain devoted to processing vision, and only one devoted to hearing. In general, then, the UX specialist should understand that we learn and commit to memory by repetition. This repetition may be via rehearsal, via foreknowledge, or via an understanding of something that we have done before; in all cases, however, repetition is key. Indeed, this repetition is closely linked to the ability to learn and discover an interface first hand, as opposed to second hand via the help system. This makes the ability for users to self-explore the interface, and the interactive aspects of the system, paramount. Many users I talk with state that they explore incrementally and don’t refer much to the manuals. With this in mind, interfaces and systems should be designed to be as familiar as possible, dovetailing into the processes of the specific user as opposed to some generalised concept of the person, and should support aspects of repeatability and familiarity, specifically for this purpose.

Exploration can be thought of as the whole experience of moving from one interface component to another, regardless of whether the destination is known at the start of navigation or the traversal is initially aimless. Movement and exploration of the interface also involve orientation, interface design, purpose, and mobility, the latter defined as the ability to move freely, easily, and confidently around the interface. In this context, a successful exploration of the interactive elements is one in which the desired location is easily reached or discovered. Conventionally, exploration can be separated into two aspects: navigation and orientation. Orientation can be thought of as knowledge of the basic spatial relationships between components within the interface. It is used as a term to suggest a comprehension of an interactive environment, or of components that relate to exploration within the environment. How a person is oriented is crucial to successful interaction. Information about position, direction, desired location, and route are all bound up with the concept of orientation. Navigation, in contrast, suggests an ability to move within the local environment. This navigation can be either by pre-planning, using help or previous knowledge, or by navigating ‘on-the-fly’, in which case a knowledge of immediate components and barriers is required. Navigation and orientation are key within the UX domain because they enable the user to easily understand, and interact with, the interface. However, in some ways more importantly, they enable the interface to be learnt by a self-teaching process. Facilitating exploration through the use of an easily navigable interface means that users will intuitively understand the actions that are required of them and that the need for help documentation is reduced.

4.2.3 Explicit and Implicit Communication

Information can be transmitted in a number of different ways, and these ways may be either implicit (covert) or explicit (overt). In this case, let us refer to information transmission as communication, in that information is communicated to the user and the user's requirements are then communicated back to the computer via the input and control mechanisms. Explicit communication is often well understood and centres on the visual, or auditory, transmission of text and images (or sounds) for consumption by the user. Implicit communication is, however, a little more difficult to define. Here, I refer to those aspects of the visual or auditory user experience that are in some ways intangible, by which I mean aspects such as aesthetic or emotional responses [Pelachaud, 2012] to aspects of the communication.

Explicit Communication. Communication and complexity are closely related, in that complexity directly affects the ease of communication between the user and the interface or system. Communication mainly occurs via the visual layout of the interface and the textual labels that must be read by the user to understand the interactive communication directed at them. People read text using jerky eye movements (called ‘saccades’), which stop and fixate on a keyword for around 250 milliseconds. These fixations vary, lasting longer for more complex text, and are mostly forward fixations, with regressive (backward) fixations occurring only 10-15 percent of the time, when reading becomes more difficult. People reading at speed, scanning just for appropriate information, tend to fixate less often and for a shorter time. However, they can only remember the gist of the information they have read and are not able to give a comprehensive discourse on the information encountered (see Figure: Visualising Words). This means that comprehensive descriptions of interface features are not used in the decision-making process of a user who is quickly scanning an interface. Indeed, cognitive overload is a critical problem when navigating large interactive resources. This overload is increased if the interaction is non-linear and may switch context unexpectedly. Preview through summaries is key to improving the cognition of users within complex interfaces, but complex prompts can also overload the reader with extraneous information.
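A tiny, illustrative sketch of the regressive-fixation idea: given a hypothetical sequence of fixation x-positions from a left-to-right reading record, it counts how often the eye jumps backwards.

```python
# Count regressive (backward) fixations in a left-to-right reading record.
# The fixation positions are invented for illustration.
fixation_x = [12, 55, 98, 140, 120, 175, 230, 210, 260]

regressions = sum(1 for prev, curr in zip(fixation_x, fixation_x[1:]) if curr < prev)
print(f"{regressions / (len(fixation_x) - 1):.0%} regressive fixations")  # 25%
```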

**Figure: Visualising Words.** Visualising Words. ---Image Credit: Marcus E. Raichle, Department of Radiology, Washington University School of Medicine, St. Louis, Missouri.

In addition to the problems of providing too much textual information, the complexity of that information must also be assessed. The Flesch/Flesch–Kincaid Readability Tests are designed to indicate comprehension difficulty when reading a passage of contemporary academic English. Invented by Flesch and enhanced by Kincaid, the Flesch–Kincaid Grade Level Test analyses and rates text on a U.S. grade-school level based on the average number of syllables per word and words per sentence. For example, a score of 8.0 means that an eighth grader would understand the text. Keeping this in mind, the HCI specialist should try to make interface labels and prompts as understandable as possible. However, this may also mean that the use of jargon is acceptable in situations where a specialist tool is required, or when a tool is being developed for a specialist audience.
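The Grade Level formula itself is simple: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below implements it with a rough vowel-group syllable counter; treat it as an illustration of the calculation rather than a validated readability tool.

```python
# Flesch-Kincaid Grade Level with a rough, heuristic syllable counter.
import re

def count_syllables(word):
    """Approximate syllables as the number of vowel groups (minimum 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

prompt = "Click Save to keep your changes. Click Cancel to discard them."
print(round(flesch_kincaid_grade(prompt), 1))  # a low grade level for a simple prompt
```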

Implicit Communication. Aesthetics is commonly defined as the appreciation of the beautiful or pleasing, but this terminology is still controversial. Visual aesthetics is ‘the science of how things are known via the senses’, which refers to user perception and cognition. Specifically, the phrase ‘aesthetically pleasing’ is commonly used to describe interfaces that are perceived by users as clean, clear, organised, beautiful, and interesting. These terms are just a sample of the many different terms used to describe aesthetics within interaction design. Human-computer interaction work mostly emphasises performance criteria, such as time to learn, error rate, and time to complete a task, and pays less attention to aesthetics. However, UX work has tried to expand this narrow (‘Reductionist’) view of experience by trying to understand how aesthetics and emotion can affect a viewer's perception - but the relationship between the aesthetic presentation of an interface and the user's interaction is still not well understood.

The latest scientific findings indicate that emotions play an essential role in decision making, perception, learning, and more; that is, they influence the very mechanisms of rational thinking. Not only too much but also too little emotion can impair decision making [Picard, 1997]. This is now referred to as ‘Affective Computing’, in which biometric instruments such as Galvanic Skin Response (GSR), Gaze Analysis and Heart Rate Monitoring are used to determine a user's emotional state via their physiological changes. Further, affective computing also seeks ways to change those states, or to exert control over technology based upon them.
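As a heavily simplified sketch of the affective-computing loop, the following infers a coarse arousal level from GSR and heart rate relative to a resting baseline. The signals, weighting, and thresholds are illustrative assumptions only; real affective systems use far richer models of physiological data.

```python
# Infer a coarse arousal level from (assumed) biometric readings.
def arousal_level(gsr_microsiemens, heart_rate_bpm, baseline_gsr, baseline_hr):
    gsr_delta = (gsr_microsiemens - baseline_gsr) / baseline_gsr
    hr_delta = (heart_rate_bpm - baseline_hr) / baseline_hr
    score = 0.6 * gsr_delta + 0.4 * hr_delta   # arbitrary weighting
    if score > 0.25:
        return "high"    # e.g. frustration or excitement: the UI might adapt
    if score > 0.10:
        return "raised"
    return "calm"

print(arousal_level(8.4, 92, baseline_gsr=6.0, baseline_hr=70))  # "high"
```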

4.3 Input and Control

The final piece of the puzzle, when discussing UX, is the ability to enter information for the computer to work with. In general, this is accomplished by use of the GUI working in combination with the keyboard and the mouse. However, this is not the entire story; there are many kinds of entry devices which can be used for information input, selection, and target acquisition [Brand, 1988]. In some cases, specialist devices such as head-operated mice have been used, in others gaze and blink detection; however, in most cases, the mouse and keyboard will be the de facto combination. As a UX engineer, you should also be aware of the different types of devices that are available and their relationship to each other. In this way, you will be able to make accurate and suitable choices when specifying non-standard systems.

Text Entry via the keyboard is the predominant means of data input for standard computer systems. Keyboards range from the standard QWERTY keyboard, which is inefficient and encourages fatigue, through the Dvorak Simplified Keyboard, ergonomically designed using character frequency, to the Maltron keyboard, designed to alleviate carpal tunnel syndrome and repetitive strain injury. There are, however, other forms of text entry, ranging from the T9 keypad, found on many mobile phones and small devices with only nine keys, through to soft keyboards, found as part of touch-screen systems or as assistive technologies for users with motor impairments for whom a physical keyboard cannot be controlled. You may encounter many such methods of text entry, or indeed be required to design some yourself. If the latter is the case, you should make sure that your designs are based on an understanding of the physiological and cognitive systems at work.

The Written Word, cursive script, was seen for a long period as the most natural way of entering text into a computer system because it relied on a skill already learnt. The main bar to text entry via the written word was that handwriting recognition was computationally intensive and, therefore, underdeveloped. One of the first handwriting recognition systems to gain popularity was that employed by the Apple Newton, which could understand written English characters; however, accuracy was a problem. Alternatives to handwriting recognition arrived in the form of pseudo-handwriting, such as the Graffiti system, which was essentially a single-stroke shorthand handwriting system. Increasing computational power and more sophisticated recognition algorithms have since led to a resurgence in handwriting's use, in the form of systems such as Inkwell. Indeed, with the advent of systems such as the Logitech io2 Digital Writing System, handwriting recognition has become a computer-paper hybrid.

Pointing Devices, for drawing and target acquisition, are represented in most cases by the ubiquitous mouse. Indeed, the mouse is the most effective way of acquiring a target, with the exception of the touch-screen. There are, however, other systems that may be more suitable in certain environments. These include the conventional trackpad, the trackball, and the joystick. Indeed, variations on the joystick have been created for mobile devices that behave similarly to their desktop counterparts but have more limited mobility. Likewise, the trackball has a miniaturised version present on the Blackberry range of mobile PDAs, but its adoption into widespread mobile use has been limited. Finally, the touch screen is seeing a resurgence in the tablet computing sector, where the computational device is conventionally held close enough for easy touch-screen activation, either by the operator's finger or a stylus.

**Figure: Phantom Desktop Haptic Device.** Phantom Desktop Haptic Device. ---Image Credit: Phantom.

Haptic Interaction is not widely used beyond the realm of immersive or collaborative environments. In order to interact haptically with a virtual world, the user holds the ‘end-effector’ of the device. Currently, haptic devices support only point contact, so touching a virtual object with the end-effector is like touching it using a pen or stick. As the user moves the end-effector around, its co-ordinates are sent to the virtual environment simulator (for example see Figure: Phantom Desktop Haptic Device). If the end-effector collides with a haptically-enabled object, the device will render forces to simulate the object’s surface. Impedance control means that a force is applied only when required: if the end-effector penetrates the surface of an object, for example, forces will be generated to push the end-effector out, to create the impression that it is touching the surface of the object. In admittance control, the displacement of the end-effector is calculated according to the virtual medium through which it is moving, and the forces exerted on it by the user. Moving through air requires a very high control gain from force input to displacement output, whereas touching a hard surface requires a very low control gain. Admittance control requires far more intensive processing than impedance control but makes it much easier to simulate surfaces with low friction, or objects with mass.
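A minimal sketch of impedance-style rendering for a flat virtual surface: no force in free space, and a spring-like force, proportional to penetration depth, pushing the end-effector back out once it passes through the surface. The spring model and stiffness value are illustrative assumptions rather than the behaviour of any particular device.

```python
# Impedance-style haptic rendering for a flat virtual surface at z = 0.
import numpy as np

SURFACE_HEIGHT = 0.0   # the virtual surface is the plane z = 0 (metres)
STIFFNESS = 800.0      # N/m, an assumed stiffness for the virtual surface

def render_force(end_effector_pos):
    """Return the 3-D force (N) to command from the haptic device."""
    x, y, z = end_effector_pos
    penetration = SURFACE_HEIGHT - z            # positive once below the surface
    if penetration <= 0.0:
        return np.zeros(3)                      # free space: no force
    # Push back out along the surface normal, proportional to penetration depth.
    return np.array([0.0, 0.0, STIFFNESS * penetration])

print(render_force((0.1, 0.2, 0.05)))    # above the surface -> [0. 0. 0.]
print(render_force((0.1, 0.2, -0.002)))  # 2 mm inside the surface -> [0. 0. 1.6]
```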

Speech Input is increasing in popularity due to its ability to handle naturally spoken phrases with an acceptable level of accuracy. There are, however, only a few speech recognition engines that have gained enough maturity to be used effectively. The most popular of these is the Dragon speech engine, used in the Dragon NaturallySpeaking and Mac Dictate software applications. Speech input can be very effective when used in constrained environments or devices, or when other forms of input are slow or inappropriate. For instance, text input by an unskilled typist can be time-consuming, whereas general dictation can be reasonably fast; and, with the introduction of applications such as Apple's ‘Siri’, speech input is becoming increasingly widespread.

Touch Interfaces have become more prevalent in recent times. These interfaces originally began with single-touch graphics pads (such as those built commercially by Wacom) as well as standard touch-screens that removed the need for a conventional pointing device. The touch interface has become progressively more accepted, especially for handheld and mobile devices, where the distance from the user to the screen is much smaller than for desktop systems. Further, touch-pads that mimic a standard pointing device, but within a smaller footprint, became commonplace in laptop computer systems; but until the late 2000s all these systems were only substitutes for a standard mouse pointing device. The real advance came with the introduction of gestural touch and multi-touch systems, which enabled the control of mobile devices to become much more intuitive/familiar, with the use of swipes and stretches of the document or application within the viewport. These multi-touch gestural systems have now become standard in most smartphones and tablet computing devices, and their use has been normalised by providing easy APIs within operating systems such as iOS. These gestural touch systems are markedly different from gesture recognition, whereby a device is moved within a three-dimensional space. In reality, touchscreen gestures are about the two-dimensional movement of the user's fingers, whereas gesture recognition is about the position and orientation of a device within three-dimensional space (often using accelerometers, optical sensing, gyroscopic systems, and/or Hall effect cells).
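As a small sketch of how a gestural touch system might distinguish a pinch from a stretch, the following compares the distance between two touch points across successive frames; the coordinates and threshold are invented for illustration.

```python
# Classify a two-finger gesture by how the inter-touch distance changes.
import math

def pinch_or_stretch(prev_touches, curr_touches, threshold=0.05):
    """Return 'stretch', 'pinch' or None from two pairs of (x, y) touch points."""
    def distance(touches):
        (x1, y1), (x2, y2) = touches
        return math.hypot(x2 - x1, y2 - y1)

    before, after = distance(prev_touches), distance(curr_touches)
    change = (after - before) / before
    if change > threshold:
        return "stretch"   # fingers moving apart: zoom in
    if change < -threshold:
        return "pinch"     # fingers moving together: zoom out
    return None

print(pinch_or_stretch([(100, 100), (200, 200)], [(80, 80), (220, 220)]))  # stretch
```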

Gesture Recognition was originally seen as an area mostly for research and academic investigation. However, two products have changed this view in recent times. The first of these is the popular games console, the Nintendo Wii, which uses a game controller as a handheld pointing device and detects movement in three dimensions. This movement detection enables the system to understand certain gestures, thereby making game-play more interactive and intuitive. Because it can recognise naturalistic, familiar, real-world gestures and motions, it has become popular with people who would not normally consider themselves to be games console users. The second major commercial success of gesture recognition is Apple's iOS devices, which can also detect movement in three dimensions and use similar recognition to the Nintendo games console to interact with the user. The main difference is that iOS uses gesture and movement recognition to more effectively support general computational and communication-based tasks.

**Figure: Head Operated Mouse.** Head Operated Mouse. ---Image Credit: SmartNAV 4.

There are a number of specialist input devices that have not gained general popularity but may nevertheless enable the UX specialist to overcome problems not solvable using other input devices. These include technologies such as the Head Operated Mouse (see Figure: Head Operated Mouse), which enables users without full body movement to control a screen pointer, and which can be coupled with blink detection to enable target acquisition. Simple binary switches are also used for selecting text entry from scanning soft keypads, using deliberate key-presses or, in the case of systems such as the Hands-free Mouse Control System (HaMCoS), the monitoring of any muscle contraction. Gaze detection is also popular in some cases, particularly for the in-flight control of military aircraft. Indeed, force feedback has also been used as part of military control systems but has taken on a wider use where some haptic awareness is required, such as remote surgical procedures. In this case, force feedback mice and force feedback joysticks have been utilised to make the remote sensory experience more real to the user. Light pens gave rise to pen computing but have now themselves become less used, especially with the advent of the touch-screen. Finally, immersive systems, or parts of those immersive systems, have enjoyed some limited popularity. These systems originally included body suits, immersive helmets, and gloves. However, the most popular aspect used singularly has been the interactive glove, which provides tactile feedback coupled with pointing and target acquisition.

4.4 Summary

Understanding how we use our senses: seeing (visual channel), hearing (auditory channel), smelling (olfactory channel), and touching (haptic channel), enables us to understand how to communicate information regarding the state of the system via the interface. Knowledge of the natural processes of the mind: attention, memory and learning, exploration and navigation, affective communication and complexity, and aesthetics, enables an understanding of how the information we transmit can better fit the user's mental processes. Finally, understanding how input and control are enacted can enable us to choose from the many kinds of entry devices which can be used for information input, selection, and target acquisition. However, we must remember that there are many different kinds of people, and these people have many different types of sensory or physical requirements which must be taken into account at all stages of the construction process.

4.4.1 Optional Further Reading / Resources

  • [M. H. Ashcraft.] Fundamentals of cognition. Longman, New York, 1998.
  • [S. Brand.] The Media Lab: inventing the future at MIT. Penguin Books, New York, N.Y., U.S.A., 1988.
  • [A. Huberman] Andrew Huberman. YouTube. (2013, April 21). Retrieved August 10, 2022, from https://www.youtube.com/channel/UC2D2CMWXMOVWx7giW1n3LIg
  • [C. Pelachaud.] Emotion-oriented systems. ISTE, London, 2012.
  • [R. W. Picard.] Affective computing. MIT Press, Cambridge, Mass., 1997.
  • [J. Raskin.] The humane interface: new directions for designing interactive systems. Addison-Wesley, Reading, Mass., 2000.
  • [S. Weinschenk.] 100 things every designer needs to know about people. Voices that matter. New Riders, Berkeley, CA, 2011.