Voice Browsing Approach to E-business Access: a Blind's Perspective

Information accessibility on the World Wide Web (WWW) remains a complex issue for blind users, as the majority of websites are crowded with content, both non-visual (audio) and visual (video and images). Accessibility measures, in the form of usable and blind-friendly websites, should be made available to blind users. One area that can be improved is the e-Business segment. Easily accessible e-Business platforms could further promote entrepreneurship through the development of blind-friendly features such as voice. In the current study, we investigated the existing difficulties in accessing web content (such as inconsistent webpage structure, weak voice recognition engines and the use of long sentences as commands) and suggest a different approach, the numeric hotlist, to make the web more accessible, specifically for blind users.


Introduction
Most web applications attract a large number of users, yet the efforts to make them blind-friendly remain insufficient. The core shortcomings of these applications are: a) accessibility standards (provided by the Web Content Accessibility Guidelines (WCAG), under the Web Accessibility Initiative (WAI)) are not followed; and b) speech recognition engines are inefficient at recognizing voice commands accurately. Waibel & Lee (1990) pioneered voice technology systems that enabled tasks such as bank balance inquiries, flight schedule lookups and phone call transfers. Such voice technology advancements were especially useful in the field of medicine, as later acknowledged by Bajgoric (2006). According to Carter & Markel (2001), the following are the major organizations that address web accessibility issues:
• W3C's (World Wide Web Consortium) Web Accessibility Initiative is the mother lode of web accessibility information.
• The Center for Applied Special Technology (CAST) sponsors Bobby, a program to check the accessibility level of a website.
• The Trace Center is a research organization at the University of Wisconsin-Madison.
• The National Center for Accessible Media (NCAM) researches, develops and tests methods of integrating web access technologies.
• AWARE (Accessible Web Authoring Resources and Education) educates developers to focus on web accessibility.
• Microsoft's Enable also encourages developers to produce accessible products.
A consistent structure built with headings and lists would help address these browsing issues. We propose using a Numeric Hotlist for navigating a webpage by voice, which would help blind users interact with the system more naturally. The impact of voice-enabled systems has yet to be fully seen, but speech-enabled mobile phones, navigational systems and automobiles might make a distinct contribution to blind people's lives. Although the future will certainly bring many technological advancements, the pace of development in this particular area (voice-enabled systems) remains slow.
The paper is structured as follows. Section 2 provides a brief literature review of existing voice technology and its weaknesses. An overview of voice browsing is given in Section 3. Section 4 presents analysis and suggestions for enhancing the existing models with the help of voice technologies (such as the Microsoft Speech Software Development Kit). Section 5 gives a future outlook for web semantics, with special emphasis on blind users.

Problem Definition
The term web accessibility refers to easy access by all. More specifically, the Web Accessibility Initiative, a group managed by the World Wide Web Consortium (W3C), describes web accessibility as creating and developing the web in a way that allows blind and older people not only to access the web but also to contribute to it in the same way as any other person (Talib, Shuqin, Abrar & Shafiq, 2009). WAI has defined standards for web accessibility to keep the web evolving in a single direction. These standards cover every type of disability that affects access to the web, whether visual, auditory, physical, cognitive, neurological or speech-related.

Existing Web
The Web Content Accessibility Guidelines (WCAG 2.0) provide a foundation for developing web applications by offering the necessary web accessibility support, but unfortunately the practical use of these guidelines is inadequate. Most websites do not meet even the basic standards provided by WCAG 2.0, which makes them inaccessible to blind users. WCAG 2.0 listed seven common barriers to web accessibility for blind users:
• images without alternative text
• image map hot spots without alternative text
• misleading use of structural elements on a webpage
• uncaptioned audio or un-described video
• lack of alternative information for users who cannot access frames or scripts
• tables that are difficult to decipher when linearized
• sites with poor color contrast
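The first two barriers in the list above can be checked mechanically. The following is a minimal sketch in Python, using only the standard library, that reports `<img>` and `<area>` elements with no `alt` attribute. The names `MissingAltChecker` and `find_missing_alt` are hypothetical and chosen for illustration; the sketch assumes that an empty `alt=""` marks a decorative image and is therefore not reported.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Collects <img> and <area> tags that have no alt attribute."""

    def __init__(self):
        super().__init__()
        self.problems = []  # list of (tag, line number) tuples

    def handle_starttag(self, tag, attrs):
        if tag in ("img", "area"):
            # Assumption: alt="" is a deliberate marker for decorative
            # images, so only a missing (or valueless) alt is reported.
            if dict(attrs).get("alt") is None:
                line, _offset = self.getpos()
                self.problems.append((tag, line))


def find_missing_alt(html):
    """Return (tag, line) pairs for images and hot spots lacking alt text."""
    checker = MissingAltChecker()
    checker.feed(html)
    return checker.problems
```

A check of this kind covers only two of the seven barriers; the remaining ones (for example, misleading structure or poor contrast) need different analysis.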
Web developers should understand the practical difficulties faced by blind users. Asakawa (2005) and Fukuda, Saito, Takagi & Asakawa (2005) believed that web developers try to comply with the web accessibility guidelines without understanding the pragmatic requirements of blind users. The structure of web pages is complex, and simple accessibility rules are not followed, which leaves web content inaccessible to blind and elderly users (Rana, Reynolds, Cirstea & Entecott, 2007). In China alone, approximately 12.33 million people have some form of visual impairment. About 314 million people are visually impaired worldwide, and 45 million of them are blind (World Health Organization, 2009). These figures make clear how large the population of blind and visually impaired users is.
In our opinion, web accessibility is important not only for blind or elderly people; it is also a concern for people who are new to computers or the web. On this issue, the accessibility agency Nomensa (2006) tested leading websites in five sectors (travel, retail, banking, government and media) across 20 countries and reported that almost 97% of them do not meet basic accessibility standards. This finding is not only striking but also shows a lack of responsibility on the part of web developers who ignore the needs of a whole group of (blind) people and thereby contradict the idea of web access for all.

Technology
The WWW is becoming the ultimate source of information, so access to the web is increasingly vital; for web developers, the most complex development task is addressing the needs of blind users. Several solutions have been proposed to overcome this difficulty, one of which is the screen reader (Charles, Hemphill & Thrift, 1995). Unfortunately, screen readers can only provide the text of an application and its contents, and it is difficult to deliver the same interaction through audio (by converting text into speech). Screen readers and voice browsers (another proposed solution) usually read the document from the top left to the bottom right corner, which makes it difficult for blind users to reach a specific point on the page, especially when they are at the bottom. This is because the structure of documents is inconsistent and varies from site to site, as the guidelines are not followed (Matlay, 2004). Furthermore, Buzzi, C, Buzzi, M, Leporini & Akhter (2009) described several issues involved in interaction via a screen reader, such as:
• Information overload, which makes browsing time-consuming
• Repetition of information before the required item is reached
• Poor structure, which makes finding the relevant information frustrating
• Row-wise reading of tables, which presents the information out of order
• Links, buttons and content titles that are not self-explanatory
• Difficulty working with form control elements
• Missing alternative descriptions for visual and non-visual content

Voice Browsing
A browser is a software application typically used to display the contents of an (HTML) webpage, giving quick access to its links, buttons, text and pictures. Asakawa & Itoh (1998) emphasized that blind people can easily access the web by using non-visual browsers such as a voice browser. Furthermore, Christian, Kules, Shneiderman & Youssef (2000) described a voice browser as one that can at least render web pages in audio format or interpret speech input for navigation. Supported by technology, an ideal browser is capable of two-way communication with the user through speech. It not only allows the user to listen to what is on the screen, but also allows specific commands (e.g. next page, skip, list active links on the page, submit form) to be given entirely by voice.
Figure 1 shows how the Voice Browser works: the input voice passes through the voice recognition process and invokes an action in the browser only if it matches one of the voice commands (vocabulary) available in the voice recognition engine. However, the integrated use of scripts such as JavaScript and visual interfaces such as Flash makes it even more difficult for voice browsers to present the web page audibly to blind users. Gupta and Kaiser (2005) observed that ads, non-descriptive link text (e.g. 'click here', 'see more') and inaccessible forms on web pages are a major problem that prevents accessibility developers from developing a generic approach to extracting the useful content from a webpage. Theofanos and Redish (2003) blamed the excessive use of graphics and image links for preventing a solid solution for blind users. Moreover, Rudall & Mann (2006) found that long input commands are more complicated and prone to errors. The inconsistency of web structure remains an obstacle to a universal solution, as it is impossible to change the whole web.
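The matching step described for Figure 1 can be illustrated with a short Python sketch. It is not the recognition engine itself: it assumes the engine has already turned speech into text, and only shows that an action fires when, and only when, the recognized text is in the registered vocabulary. The class `VoiceCommandDispatcher` and the helper `normalize` are hypothetical names introduced for illustration.

```python
def normalize(utterance):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(utterance.lower().split())


class VoiceCommandDispatcher:
    """Maps recognized phrases to browser actions (hypothetical sketch)."""

    def __init__(self):
        self._actions = {}

    def register(self, phrase, action):
        """Add a phrase to the vocabulary together with its action."""
        self._actions[normalize(phrase)] = action

    def dispatch(self, recognized_text):
        """Run the matching action, or return None if the phrase is unknown."""
        action = self._actions.get(normalize(recognized_text))
        if action is None:
            return None  # not in the vocabulary: the browser does nothing
        return action()
```

For example, after `dispatcher.register("next page", go_next)`, the call `dispatcher.dispatch("Next  page")` runs `go_next`, while an unregistered phrase returns `None` without invoking anything.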

Web Architecture
Heading tags such as <H1> and <H2> are frequently not used by web developers; instead, navigational aids are used to organize the content. However, navigating through these aids is not a trouble-free task for blind users. To address this, the structure of the web can be built with a semantic approach that interprets the visual content (by scanning) and turns it into appropriately sequenced audio. Optical Character Recognition technology can be used to convert an image if it represents text, and works more accurately on printed text. This would not only help blind users navigate properly but also address issues such as time constraints, user annoyance and informal structure (Kouroupetroglou, Salampasis & Manitsaris, 2006).
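Where heading tags are present, the outline they define can be extracted and read out in sequence. The following Python sketch, using only the standard library, collects each heading with its level; the class name `HeadingOutline` is hypothetical, and the sketch assumes headings are not nested inside one another.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Builds a list of (level, text) pairs from <h1>..<h6> elements."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <H1> arrives as "h1".
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            text = " ".join("".join(self._text).split())
            self.outline.append((self._level, text))
            self._level = None
```

A page without heading tags yields an empty outline, which is exactly the situation in which a voice browser has no structure to announce.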

Related Work
Charles, Hemphill & Thrift (1995) offered the idea of a flexible vocabulary and dynamic grammar interface by designing a speech user agent. The user can interact with the system through commands or a speakable hotlist to control the browsing experience, for example to scroll down or go back, as shown in Figure 2. In the speakable hotlist, a grammar can be associated with a URL; for example, 'Weather in Wuhan' should open http://www.bbc.com/weather/wuhan. The concept of smart pages was introduced to implement this: a grammar is defined for each link, and many alternative grammars can be introduced for a given link. Thus, an idea that started as a way to make the existing web useful ended in a list-dependent approach. Ramakrishnan, Stent & Yang (2004) identified the actual problem with voice browsers (and screen readers): they are now fully capable of reading the text on the screen (or the alternative text for images), but are unable to convey the logical structure and semantics of the content in a web document. In our opinion, a dynamic conversion approach is required instead of attempting to change the whole existing web, which is not possible. A universal solution is therefore required, using existing browsers (or a new standard browser), that can convert the existing web into a format usable by blind users.
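The association between spoken phrases and URLs in a speakable hotlist can be sketched as a simple lookup in Python. This is an illustration of the idea only, not the design of Charles, Hemphill & Thrift (1995): real grammars are richer than fixed phrases, and the class `SpeakableHotlist` and the alternative phrase 'Wuhan weather' are assumptions introduced here.

```python
def _normalize(phrase):
    """Lowercase and collapse whitespace before comparing phrases."""
    return " ".join(phrase.lower().split())


class SpeakableHotlist:
    """Associates one or more spoken phrases with a URL (hypothetical sketch)."""

    def __init__(self):
        self._phrase_to_url = {}

    def add(self, url, *phrases):
        """Register several alternative phrases for the same link."""
        for phrase in phrases:
            self._phrase_to_url[_normalize(phrase)] = url

    def resolve(self, utterance):
        """Return the URL for a spoken phrase, or None if it is not listed."""
        return self._phrase_to_url.get(_normalize(utterance))


hotlist = SpeakableHotlist()
hotlist.add(
    "http://www.bbc.com/weather/wuhan",
    "Weather in Wuhan",
    "Wuhan weather",  # assumed alternative phrasing for the same link
)
```

The sketch also makes the limitation visible: only phrases placed in the list beforehand can be resolved, which is the list dependence noted above.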

Analysis and Findings
To improve voice accessibility, a voice recognition engine is needed. In the current scenario we opted to use Microsoft's Speech Software Development Kit (MSSDK) v5.1, which provides a good opportunity to develop high-quality speech applications and offers a wide range of voice functions. However, MSSDK does not accept a long list of voice commands at any one time: the limit is around 400-500 commands for a numeric list. For "natural text commands" (or word commands), the capacity is fewer than about 70 commands. The program can be set to Dictation Mode on user request, where the user can add and/or alter the entered commands word by word, as described in Table 1. A reasonable speech input option should be offered, so that the user can comfortably enter data through speech and navigate between contents. Voice output is also one of the biggest setbacks, as the voice quality is sometimes poor and hard to understand; the good news is that there are 'Voices' available on the market that are very similar to normal speech and can be used for Voice Browser output.
From the above, we can see that a command should not be longer than 3 words to be recognized successfully. If each command consists of a single word, nearly 60-85 commands can be offered; otherwise only a set of 30-50 commands can be offered (see Table 1). This indicates that only a limited set of commands can be offered for recognition at any one time.
The proper and efficient use of numeric commands (hotlist) in Dictation Mode is quite helpful but is a large task to achieve. Dynamic grammar and improved voice recognition in dictation mode are among the open options to be explored in this field. Web accessibility through speech can be greatly improved if all applications, algorithms and web structures are developed in compliance with the WCAG guidelines. There is a need for tools that help developers build web pages and generate content according to the guidelines, making it more convenient for blind users to access the web. In an attempt to enhance accessibility features for the web, the Voice Browser also offers prospects for increasing the accessibility of Windows, its applications and its commands. Opening new horizons for voice access to operating systems and web-based technologies is indeed a large task, which requires more research, corroboration and support from the whole scientific community.
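The constraints above suggest filtering and paging the command vocabulary before it is handed to the recognizer. The Python sketch below is one way to do this under stated assumptions: the three-word limit comes from the text, while the page size of 50 is an assumption taken from the upper end of the quoted 30-50 range, and the function name `plan_vocabulary` is hypothetical.

```python
MAX_WORDS_PER_COMMAND = 3  # limit stated in the text
PAGE_SIZE = 50             # assumption: upper end of the quoted 30-50 range


def plan_vocabulary(commands, max_words=MAX_WORDS_PER_COMMAND, page_size=PAGE_SIZE):
    """Split commands into pages that fit the engine; return (pages, rejected).

    Commands longer than max_words are rejected rather than truncated,
    because shortening them silently would change what the user must say.
    """
    accepted, rejected = [], []
    for command in commands:
        if len(command.split()) <= max_words:
            accepted.append(command)
        else:
            rejected.append(command)
    pages = [accepted[i:i + page_size] for i in range(0, len(accepted), page_size)]
    return pages, rejected
```

Only one page would be active for recognition at a time, with a command such as 'more' (itself an assumption) switching to the next page.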
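A numeric hotlist can be sketched in Python as a numbering of the links found on a page, so that the user speaks a short number instead of a long phrase. The function names below are hypothetical, the example links are invented for illustration, and the sketch assumes the recognizer returns the spoken number as a digit string such as "2".

```python
def build_numeric_hotlist(links):
    """Number (text, url) pairs from 1 upward, keyed by the spoken digit string."""
    return {str(number): link for number, link in enumerate(links, start=1)}


def announce(hotlist):
    """Return the lines a voice browser could read out to the user."""
    return [f"{number}: {text}" for number, (text, _url) in hotlist.items()]


def follow(hotlist, spoken):
    """Return the URL for a spoken number, or None if the number is not listed."""
    entry = hotlist.get(spoken.strip())
    return entry[1] if entry is not None else None


# Invented example links, for illustration only.
page_links = [
    ("Home", "http://example.com/"),
    ("Products", "http://example.com/products"),
    ("Contact", "http://example.com/contact"),
]
numeric_hotlist = build_numeric_hotlist(page_links)
```

Because every command is a single short token, the vocabulary stays within the larger numeric limit reported for the engine rather than the smaller limit for word commands.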

Conclusion
Voice accessibility has great potential as a marketing tool. By providing opportunities for blind people to take part in online business activities, it might increase not only a company's financial worth but also its reputation for social responsibility. Improvements in speech recognition would help to enhance the capabilities of voice browsers, so that more generic and effective solutions for broadening formal e-Business access could be obtained, specifically for blind users. The focus of the current effort can be summarized as: introducing numeric hierarchies (in the same way as HTML heading tags) by including dynamic voice commands; introducing the whole range of document object model elements to offer comprehensive voice-accessible browsing solutions; reading data from pictures (and graphics) using Optical Character Recognition (OCR) techniques and offering it through speech; reading webpage data and associating related data by introducing semantic relations; reading data from objects and scripts (e.g. Flash, JavaScript, VB Script) and converting it to an understandable (speech) format; extracting and converting pop-ups, alerts and banners into useful data; searching for specific text within elements; and finally handling windows through voice commands. Furthermore, the noise level, the microphone being used and the computer hardware must be taken into account to obtain good-quality voice recognition. A help command available at any point during navigation between web content can significantly reduce the level of complexity for blind users. Global expansion in e-Business volume will open new prospects for blind people to take part in it, either as consumers or as entrepreneurs, which will reveal niche markets and create new opportunities with enormous potential in a previously untargeted e-Business area.

Figure 1. Conceptual components and their interaction diagram

Figure 2. Adopted from Surfing the web by voice (Charles et al, 1995)

Table 1. Command list in dictation mode for voice browser