
Courtesy of Mandy Mobley, Lynn Qu, Eric Sit, Jessica Wong. Used with permission.


DOCUMENT INFORMATION

Basic information

Title: Structure, Practice and Innovation in EE/CS
Authors: Mandy Mobley, Lynn Qu, Eric Sit, Jessica Wong
School: Massachusetts Institute of Technology
Field: Electrical Engineering and Innovation in EE/CS
Type: paper
Year of publication: 1998
City: Newton
Pages: 47
File size: 318.5 KB

Contents

  • 1.1 The Mission
  • 2.1 Introduction
  • 2.2 Types of Problems
  • 2.3 Traditional approaches to SR
  • 2.4 Hidden Markov Models-Theory
  • 2.5 Markov Models in Linguistics
  • 2.6 HMM Ph.D. Dissertation at Carnegie Mellon
  • 2.7 Emergence of HMMs
    • 2.7.1 Benchmarking
    • 2.7.2 Paradigm Shift
  • 3.1 The Bakers
  • 3.2 The DRAGON system
  • 3.3 IBM
  • 3.4 Verbex (Exxon)
  • 4.1 Paul Bamberg
  • 4.2 Larry Gillick
  • 4.3 Early projects and technologies
  • 5.1 Joel Gould
  • 5.2 Technical Difficulties
    • 5.2.1 Processor Power and Memory
    • 5.2.2 Insufficient Speech Recognition Models
  • 5.3 Social Difficulties
    • 5.3.1 Baby Dragon
    • 5.3.2 Military Spelling versus Natural Spelling
    • 5.3.3 Re-engineering the engineers
    • 5.3.4 Keeping It Away from the Competitors
    • 5.3.5 Making It Natural
    • 5.3.6 Change in Market
    • 5.3.7 Change in Marketing Channels
    • 5.3.8 Change in Corporate Attitude
  • 5.4 Success
  • 6.1 IBM
    • 6.1.1 ViaVoice
  • 6.2 Lernout and Hauspie
    • 6.2.1 Alliance With Microsoft
  • 6.3 Microsoft
  • 6.4 Others
    • 6.4.1 Philips
    • 6.4.2 Future Competitors
  • 7.1 PC Desktop Integration
  • 7.2 Handheld Devices
  • 7.3 Language Translation
  • 7.4 Asian Languages
  • 7.5 Speech Understanding

Content

The Mission

Dragon Systems, known as the Natural Speech Company™, is a prominent global provider of speech and language technology. As of August 1998, it was the seventh largest business software publisher in the United States, according to PC Data.

Our group first became interested in this company after reading an article in the September issue of MIT’s Technology Review. Written by Simson Garfinkel, it introduced Dragon Systems as

Dragon Systems, founded in 1982 by Jim and Janet Baker, was the first company to bring a continuous speech recognition product for personal computers to market, beating industry giant IBM. Notably, the company reached this milestone without venture capital or a formal business plan. Given the high failure rate of speech recognition startups, Dragon's success piqued our interest. We set out to explore the factors behind Dragon's achievements, assess its current standing in the market, and understand what it must do to keep its leadership in the industry.

To accomplish our goal, we interviewed key personnel at Dragon Systems, including a co-founder, the Vice President of Research, and the Chief Architect Engineer responsible for its popular products. We also reviewed published press materials and academic papers, such as Jim Baker's Ph.D. dissertation, to gain deeper insight into the company and the industry.

We also interviewed speech recognition experts from academia and industry to explore what success means for Dragon Systems. Our goal was not to find a definitive answer but to gather insights that would deepen our understanding of the subject. We also aimed to apply the knowledge gained in our coursework to this project, bridging the gap between theory and practice in speech recognition.

Dragon Systems, headquartered in Newton, Massachusetts, is less than 10 miles from MIT. The company occupies a striking red brick building, formerly an old rope mill, that is a notable landmark in a residential area. When we visited on a sunny December morning, Janet Baker’s assistant, Carlin Folkedal, welcomed us and guided us through the premises. Throughout the building we noticed an impressive collection of dragon memorabilia, reflecting a passion of founders Jim and Janet, who have been collecting dragon-themed items for many years. Visitors often add to the collection by bringing dragon-themed gifts of their own.

Each conference room is named after a mythical dragon, with the fable pasted on the doors.

Japanese bamboo screens hide the gray dividers of the cubicles, while vibrant oriental fans adorn the exposed brick walls. The spacious, high-ceilinged rooms are lit by tall windows, which make them feel even larger. Dragon Systems recently acquired a smaller adjacent building and moved its research and engineering teams there; the original building now houses the finance, marketing, user support, human resources, and quality assurance departments.

The workforce ranges in age from people in their early 20s to those in their 50s. The male-to-female ratio in the research and engineering divisions is noticeably skewed, but the other divisions are more balanced.

Around the offices, most employees type on keyboards rather than dictate into microphone headsets. Folkedal attributes this largely to familiarity: "Most of us here do tend to use the keyboard more." He adds that it is mainly the Quality Assurance team that uses the microphone regularly for its work.

Comdex '98, the largest software trade show of the year, had just ended and had been a successful showcase for Dragon's latest products. After months of preparation, the engineering team was celebrating the positive reception of its demonstrations in Las Vegas. With that work behind them, the team had time to help us recount the evolution of Dragon Systems.

This paper is organized as follows. It begins with the history of speech recognition, followed by the Bakers' journey from their first involvement in the field to the founding of their company. The next part covers Dragon Systems itself, profiling two key figures and giving a brief overview of the company's history. The discussion then turns to Dragon's flagship product, NaturallySpeaking, and its impact on the market, and then to the current and future challenges Dragon Systems faces. In conclusion, we reflect on what we learned about Dragon Systems, the speech recognition industry, and what it takes to succeed in this field.

2 THE PROBLEM OF SPEECH RECOGNITION Dragon Systems

2 The Problem of Speech Recognition

Introduction

Speech recognition has been studied extensively since the 1950s, yet a universal solution remains elusive. Significant advances became possible in 1971, when the Advanced Research Projects Agency provided crucial funding.

Recognition involves converting the analog acoustic signal into digital information that a computer can process. The task resembles general pattern recognition and faces many of the same challenges.

Success at this task depends on a wide range of factors, including the quality of the input signal and the properties of the language being recognized. Some languages are harder than others: homophones, tonality, and phonemes that are difficult to distinguish all complicate recognition. According to Joel Gould, Chief Architect at Dragon Systems, French is the most challenging language to recognize and Italian the easiest, with English in between.

Types of Problems

Discrete speech recognition handles words separated by pauses. It is the older form of speech recognition, originally designed for telephony to identify spoken digits: users must pause between digits so that the system can interpret the signal. This separation reduces recognition to matching a single signal with a single word. Today, the technology lets phone directory users say city names or select menu options without a touch-tone phone.

Continuous speech recognition must handle words that run together without pauses, as in natural conversation. People learning a foreign language find this difficult because they lack the familiarity and training to pick out word boundaries effortlessly. Speech recognition systems, which lack the rich contextual knowledge humans bring to listening, face the same difficulty as a novice in a new language.

Continuous speech is by far the harder problem of the two, which many thought had no attainable solution until the next century.
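Discrete recognition depends on the pauses between words. The sketch below is a minimal illustration, assuming per-frame energy values have already been computed from the signal: it splits an utterance at runs of low-energy frames so that each segment can be handed to a single-word matcher. The function name, the energy threshold, and the minimum pause length are hypothetical parameters chosen for illustration, not taken from any Dragon or telephony system.

```python
from typing import List, Optional, Tuple


def split_on_pauses(energy: List[float], threshold: float,
                    min_pause: int) -> List[Tuple[int, int]]:
    """Split a discrete-speech utterance into word segments.

    `energy` holds one (assumed precomputed) energy value per frame. A frame
    below `threshold` counts as silence, and a run of at least `min_pause`
    silent frames ends the current word. Returns half-open (start, end)
    frame ranges, one per spoken word.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first frame of the word in progress
    silent_run = 0               # silent frames seen since the last speech frame
    for i, e in enumerate(energy):
        if e >= threshold:
            if start is None:
                start = i
            silent_run = 0       # short dips inside a word are ignored
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause:
                # The word ended just before this run of silence.
                segments.append((start, i - silent_run + 1))
                start, silent_run = None, 0
    if start is not None:        # utterance ended mid-word or in a short pause
        segments.append((start, len(energy) - silent_run))
    return segments


if __name__ == "__main__":
    # Two "digits" separated by a three-frame pause.
    frames = [0.0, 5.0, 6.0, 0.0, 0.0, 0.0, 7.0, 8.0, 0.0]
    print(split_on_pauses(frames, threshold=1.0, min_pause=2))  # [(1, 3), (6, 8)]
```

Continuous speech offers no such pauses, which is why this shortcut does not carry over to it.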

Traditional approaches to SR

The earliest solutions to the recognition problem used Artificial Intelligence techniques to extract information about spoken words from acoustic signals.

Template-based: Data files contain “prototypical” voice patterns of individual words. The problem of recognition is then finding the best match for the signal among the possible “templates.” This is an effective method for very constrained recognition tasks with small vocabularies.

3 D. R. Reddy, Speech Recognition by Machine: A Review

Knowledge-based: Knowledge-based recognition systems add linguistic rules, which mimic human speech processing more closely than analysis of the acoustic signal alone. This lets knowledge-based recognizers tackle a broader range of recognition problems than template-based systems.

Stochastic: Stochastic, or probabilistic, approaches have gained wide popularity in the last decade. Hidden Markov Models are among the most widely used of these techniques. They are more flexible than template-based models and integrate various knowledge sources more effectively than knowledge-based methods.

Connectionist: The newest players in SR, connectionist approaches such as neural nets, are still controversial. Like stochastic methods, they require training to refine their strategies; in addition, they model the interactions among many computing units, aiming to emulate the computations performed by the human nervous system. They represent a possible future paradigm for speech recognition (SR).

Artificial Intelligence (AI) approaches long dominated the field, while stochastic methods were regarded as strictly mathematical. In the 1970s, rule-based recognizers such as KEAL and HEARSAY achieved respectable performance. Over the past twenty years, however, speech recognition technology has shifted decisively toward Hidden Markov Model (HMM) approaches.
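The template-based approach can be illustrated with a small nearest-template matcher. Because the same word is spoken at different speeds, the sketch below aligns the input with each stored template using dynamic time warping (DTW), a standard technique for template matching that the source does not name. The feature frames and the toy templates are hypothetical; real systems would use many-dimensional acoustic features.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one acoustic feature vector per frame (assumed)


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(signal: List[Frame], template: List[Frame]) -> float:
    """Cost of the best time alignment between `signal` and `template`."""
    n, m = len(signal), len(template)
    inf = float("inf")
    # cost[i][j]: best cumulative cost of aligning signal[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(signal[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance signal only
                                 cost[i][j - 1],      # advance template only
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(signal: List[Frame], templates: Dict[str, List[Frame]]) -> str:
    """Return the vocabulary word whose template best matches the signal."""
    return min(templates, key=lambda word: dtw_distance(signal, templates[word]))


if __name__ == "__main__":
    templates = {"yes": [[1.0], [2.0], [3.0]], "no": [[3.0], [1.0], [3.0]]}
    print(recognize([[1.0], [1.0], [2.0], [3.0]], templates))  # yes
```

Every template must be compared against the input, which is one reason the method suits only small vocabularies.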

Hidden Markov Models-Theory

The Markov Model, developed by the Russian mathematician Andrei Markov in the late 19th and early 20th centuries, is a mathematical framework for describing a process that moves between discrete states. The model defines a set of states and the probabilities of transitioning between them, subject to the Markov property: the probability of moving from one state to another depends only on the current state, not on the sequence of states that came before it. For instance, the transition probability p23 depends only on the process currently being in state 2, regardless of how it got there.

Figure 1: Simple Markov Model with four states. Each transition pij corresponds to the probability of moving from state i to state j in one step.
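The Markov property makes the probability of a whole state sequence easy to compute: it is the probability of the first state times the one-step transition probabilities along the path. The sketch below assumes hypothetical transition values for a four-state chain (Figure 1 gives no numbers); states 1 to 4 are stored at indices 0 to 3, so the text's p23 is `P[1][2]`.

```python
from typing import List, Sequence

# Hypothetical one-step transition probabilities for a four-state chain.
# P[i][j] is the probability of moving from state i+1 to state j+1; rows sum to 1.
P: List[List[float]] = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.4, 0.6, 0.0],
    [0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.0, 1.0],
]


def path_probability(states: Sequence[int], P: List[List[float]],
                     initial: Sequence[float]) -> float:
    """Probability of visiting `states` in order under the Markov property."""
    prob = initial[states[0]]
    for current, nxt in zip(states, states[1:]):
        # Only the current state matters, not how the chain reached it.
        prob *= P[current][nxt]
    return prob


if __name__ == "__main__":
    start_in_state_1 = [1.0, 0.0, 0.0, 0.0]
    # Path 1 -> 2 -> 3 -> 4: 1.0 * p12 * p23 * p34 = 0.5 * 0.6 * 0.3
    print(path_probability([0, 1, 2, 3], P, start_in_state_1))
```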


In the 1950s and 1960s, the Markov Model was discussed as a possible method for parsing speech from phonemes. However, Institute Professor Noam Chomsky argued that the Markov Model had inherent flaws that prevented it from recognizing grammatical English sentences.

Chomsky discredits the Markov-based system as a model for speech with the following arguments:

1. The Markov model does not separate the clearly grammatical from the clearly ungrammatical.

2. Successive improvements in the Markov model will not change its status with respect to (1).

3. There are types of sentences in the language which the Markov model cannot generate.

4. It is impossible to collect the data necessary to build a Markov model.

Experiments conducted and compiled by Damerau and others did show that a recognizer of grammatical English sentences using a Markov Model was in fact feasible, though too slow to be practical.


Hidden Markov Models (HMMs) are effective for speech recognition because they integrate many kinds of knowledge about the speech problem in a single framework. Their performance has been shown to surpass that of many AI approaches, which has made them the preferred choice in the field.
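How an HMM combines knowledge sources can be made concrete with the forward algorithm, which scores an observation sequence by summing over every possible hidden state path. In this minimal sketch, transition probabilities stand in for structural knowledge (such as word or grammar structure) and emission probabilities for acoustic knowledge. The discrete observation symbols and all numbers are hypothetical, and this is the textbook algorithm rather than code from any system discussed here.

```python
from typing import List, Sequence


def forward_probability(obs: Sequence[int], start: List[float],
                        trans: List[List[float]],
                        emit: List[List[float]]) -> float:
    """P(obs | model), summed over all hidden state paths.

    start[s]    : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability that state s produces observation symbol o
    """
    n = len(start)
    # alpha[s]: probability of the observations so far, ending in state s
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)


if __name__ == "__main__":
    start = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.4, 0.6]]
    emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
    print(forward_probability([0, 1, 2], start, trans, emit))
```

For these toy numbers the result is about 0.03628. Because each step reuses the previous step's sums, the work grows linearly with the number of observations rather than exponentially with the number of paths.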


HMM Ph.D. Dissertation at Carnegie Mellon

In 1975, Jim Baker earned his Ph.D. from Carnegie Mellon University, combining his expertise in statistical mathematics with his interest in speech recognition (SR). He was the first to implement a recognizer built on Hidden Markov Models (HMMs), the DRAGON system. Although Damerau did not cite Baker's work in his research, Baker's practical application of the theory established him as a key figure in the development of the SR field.

Emergence of HMMs

Benchmarking

The emergence of Hidden Markov Models (HMMs) in industry was strongly influenced by regular benchmark evaluations run by the National Institute of Standards and Technology (NIST), the Department of Defense's Defense Advanced Research Projects Agency (DARPA), and the National Security Agency (NSA).

5 F. J. Damerau, Markov Models in Linguistic Theory

6 James K. Baker, Stochastic Modeling as a Means of Automatic Speech Recognition

7 http://www.nist.gov/speech


In 1984, Dr. Duane Adams, then of DARPA, approached Dr. David Pallett⁸ of NIST about implementing benchmark tests for the benefit of the DARPA speech recognition research community.⁹

Competitors in the evaluations did not share their code, but they were required to explain how their systems worked. Over time, it became evident that HMMs were outperforming other recognition approaches, and more and more systems began to incorporate them.
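The source does not say how the evaluations scored competing systems. Benchmarks of this kind commonly report word error rate: the minimum number of word substitutions, insertions, and deletions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. The sketch below computes it with a standard edit-distance table; the function name is our own.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + insertions + deletions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j]: fewest edits turning the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,                 # deletion
                          d[i][j - 1] + 1,                 # insertion
                          d[i - 1][j - 1] + substitution)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One reference word ("the") is missing from the output: 1 error / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```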

Paradigm Shift

A paradigm shift, as described by Thomas Kuhn, occurs during a scientific revolution when an established model is replaced by a new one, often triggered by a crisis. The adoption of Hidden Markov Models (HMMs) in speech recognition exemplifies such a shift: the change in SR engines was brought about by the greater modeling power of HMMs and clear demonstrations of their superiority.

Today's competitive speech recognition products use Hidden Markov Models (HMMs) as their core recognition engine. The early belief that HMMs were unsuitable for speech recognition because of their purely mathematical basis has been thoroughly disproven by their widespread adoption and success in the industry.

The move from discrete to continuous speech recognition is closely tied to the development of HMMs. Template-based methods fall short on continuous speech because they are restricted to matching single-word patterns. Knowledge-based systems can process strings of words but cannot identify the boundaries between them. HMMs, by contrast, are robust and adaptable enough for continuous speech: because word models can be chained together, the recognizer forms word hypotheses across the whole utterance rather than one isolated word at a time.

8 http://www.nist.gov/speech/pallettd.htm

9 Dr. David Pallett (e-mail), Group Manager, Spoken Natural Language Processing Group, NIST

11 Thomas Kuhn, The Structure of Scientific Revolutions

The Bakers

Jim Baker and Janet Maciver met as graduate students at Rockefeller University in New York. Jim, a mathematician focused on probability and statistics, and Janet, a biophysicist, developed a shared interest in speech recognition in the fall of 1970. They married in 1971 and have worked together as a husband-and-wife team in speech recognition ever since.

The DRAGON system

Jim Baker's Ph.D. dissertation introduces his HMM-based recognition system, DRAGON, named for the Bakers' long-standing fascination with dragons. The system comprises five programs: MAKDIC, MAKGRM, MAKNET, GETPRB, and DRAGON.

MAKDIC builds a dictionary of acoustic-phonetic knowledge, while MAKGRM constructs a finite-state grammar representation. MAKNET merges the outputs of MAKDIC and MAKGRM into a unified network, and GETPRB estimates the probabilities that link phonetic values to acoustic values.

DRAGON computes conditional probabilities to find the best match for acoustic input, thus “recognizing” speech.

MAKDIC produces a phonetic dictionary that serves as the model's vocabulary, with each word broken into phonetic components. The dictionary contains every word DRAGON can recognize and can be customized for a specific task; a sufficiently complete dictionary is essential for effective recognition.

MAKGRM adds grammatical knowledge in the form of a Context-Free Grammar (CFG). A CFG consists of variables, terminals, and rules: terminals are the letters or words of the language, variables are placeholders used while building sentences, and each rule specifies how a variable can be rewritten as a string of variables and terminals. Because a rule may refer back to its own variable, a CFG gives a concise, recursive definition of which sentences are grammatical.
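A toy context-free grammar shows the three ingredients. In this sketch, which is our own illustration and not MAKGRM's input format, each variable maps to a list of alternative right-hand sides and any symbol without a rule is a terminal. The hypothetical variable `ADJS` refers to itself, so a few rules describe arbitrarily long adjective strings; a depth limit keeps the enumeration finite.

```python
from typing import Dict, List

Rules = Dict[str, List[List[str]]]

# Hypothetical grammar: dictionary keys are variables, other symbols are terminals.
RULES: Rules = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "ADJS", "N"]],
    "ADJS": [["ADJ"], ["ADJ", "ADJS"]],   # recursive rule
    "ADJ": [["big"], ["red"]],
    "N": [["dragon"], ["computer"]],
    "VP": [["listens"], ["recognizes", "NP"]],
}


def derive(symbols: List[str], rules: Rules, depth: int) -> List[List[str]]:
    """All terminal strings derivable from `symbols` using at most `depth`
    rewrites, always expanding the leftmost variable first."""
    for i, sym in enumerate(symbols):
        if sym in rules:
            if depth == 0:
                return []            # out of budget with a variable left
            results: List[List[str]] = []
            for rhs in rules[sym]:
                rewritten = symbols[:i] + rhs + symbols[i + 1:]
                results.extend(derive(rewritten, rules, depth - 1))
            return results
    return [symbols]                 # only terminals remain: a sentence


if __name__ == "__main__":
    for sentence in derive(["S"], RULES, depth=8)[:5]:
        print(" ".join(sentence))
```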

MAKNET combines the results of MAKDIC and MAKGRM into a single Markov Model network with associated probabilities.
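One way to picture this combination is to expand each word arc of a finite-state grammar into the chain of phones given by the dictionary, yielding a single phone-level network. The sketch below is a simplified guess at that idea, not Baker's program: the data layout, the uniform probabilities over word choices, and the omission of self-loops and trained probabilities are all assumptions.

```python
from itertools import count
from typing import Dict, List, Tuple

Grammar = Dict[str, List[Tuple[str, str]]]   # state -> [(word, next_state)]
Lexicon = Dict[str, List[str]]               # word -> phone sequence
Arc = Tuple[str, str, str, float]            # (from_node, phone, to_node, probability)


def make_network(grammar: Grammar, lexicon: Lexicon) -> List[Arc]:
    """Expand a word-level grammar into a phone-level network."""
    fresh_ids = count()
    arcs: List[Arc] = []
    for state, word_arcs in grammar.items():
        for word, next_state in word_arcs:
            phones = lexicon[word]
            # Fresh intermediate nodes link consecutive phones of this word.
            inner = [f"n{next(fresh_ids)}" for _ in range(len(phones) - 1)]
            nodes = [state] + inner + [next_state]
            for k, phone in enumerate(phones):
                # Assumed: word choices leaving a state are equally likely;
                # later phones in the word follow with probability 1.
                prob = 1.0 / len(word_arcs) if k == 0 else 1.0
                arcs.append((nodes[k], phone, nodes[k + 1], prob))
    return arcs


if __name__ == "__main__":
    grammar = {"START": [("yes", "END"), ("no", "END")]}
    lexicon = {"yes": ["y", "eh", "s"], "no": ["n", "ow"]}
    for arc in make_network(grammar, lexicon):
        print(arc)
```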

GETPRB estimates the conditional probabilities of producing a set of acoustic values, given a phonetic value. These estimates are what allow the acoustic signal to be classified as its phonetic equivalent.
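In the simplest case, conditional probabilities of this kind can be obtained by counting how often each acoustic value occurs with each phonetic value in labeled training data. The sketch below assumes such phone-labeled data already exists as (phone, acoustic label) pairs, a hypothetical input; it computes relative-frequency estimates of P(acoustic value | phone) and is not GETPRB's actual procedure.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def estimate_emissions(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Relative-frequency estimate of P(acoustic label | phone)."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for phone, acoustic in pairs:
        counts[phone][acoustic] += 1
    table: Dict[str, Dict[str, float]] = {}
    for phone, acoustic_counts in counts.items():
        total = sum(acoustic_counts.values())
        table[phone] = {a: c / total for a, c in acoustic_counts.items()}
    return table


if __name__ == "__main__":
    data = [("AA", "a1"), ("AA", "a1"), ("AA", "a2"), ("S", "s1")]
    print(estimate_emissions(data))
```

Each phone's row sums to 1, so the table can serve as the emission probabilities of an HMM; with sparse data, a real system would also need smoothing.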

DRAGON combines the knowledge from the preceding components to compute the conditional probabilities used to match acoustic signals with spoken words. Unlike other models, which typically use best-first search within a classification decision tree, DRAGON considers all possible paths through the network to find the optimal one. This might seem to require more processing time, but DRAGON keeps its search space compact. Because the search covers every classification possibility, its running time is guaranteed to be bounded, growing only linearly; many other models cannot offer such a linear time bound.

12 Michael Sipser, Introduction to the Theory of Computation

Without such a bound, a recognizer cannot ensure that one computation will finish before the next is scheduled to begin. This guarantee is vital for continuous speech recognition, which aims to operate in real time.
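An exhaustive yet time-bounded search of this kind is what the Viterbi algorithm, a dynamic-programming search, provides: at each frame it keeps only the best-scoring path into every state, so every path through the network is implicitly considered while the work per frame stays fixed at about N² operations for N states. Total cost therefore grows linearly with the length of the utterance, and the time needed per frame is known in advance. We assume, without the source confirming it, that DRAGON's search can be approximated this way; the model numbers below are hypothetical.

```python
from typing import List, Sequence, Tuple


def viterbi(obs: Sequence[int], start: List[float], trans: List[List[float]],
            emit: List[List[float]]) -> Tuple[float, List[int]]:
    """Best state path for `obs` and its probability.

    Each frame does the same fixed amount of work (N * N comparisons for N
    states), so total time is linear in the number of frames.
    """
    n = len(start)
    score = [start[s] * emit[s][obs[0]] for s in range(n)]
    backpointers: List[List[int]] = []
    for o in obs[1:]:
        prev, new_score = [], []
        for s in range(n):
            # Best predecessor of state s: one surviving path per state.
            best = max(range(n), key=lambda r: score[r] * trans[r][s])
            prev.append(best)
            new_score.append(score[best] * trans[best][s] * emit[s][o])
        backpointers.append(prev)
        score = new_score
    last = max(range(n), key=lambda s: score[s])
    path = [last]
    for prev in reversed(backpointers):
        path.append(prev[path[-1]])
    path.reverse()
    return score[last], path


if __name__ == "__main__":
    start = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.4, 0.6]]
    emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
    print(viterbi([0, 1, 2], start, trans, emit))
```

Replacing the maximum over predecessors with a sum turns this into the forward algorithm, which computes the total probability of the observations instead of the single best path.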

The use of Hidden Markov Models in speech recognition sparked debate due to its innovative approach, diverging from the conventional methods employed by most speech recognition systems at that time.

IBM

Jim and Janet left CMU to work for years on HMM speech recognition at IBM in Yorktown Heights. There they made significant progress in continuous SR, but they were unhappy only doing development and never getting a product out to real users.

Verbex (Exxon)

In 1979, they looked elsewhere for a place where they could do development work on a real product. Verbex, a research branch of Exxon, was doing work on discrete speech.¹³

13 Janet Baker, interviewed by S. L. Garfinkel

4 BIRTH OF THE DRAGON Dragon Systems

In 1982, after launching its first product, a system for collecting spoken data over the phone, Exxon discontinued its work on continuous speech recognition, leaving the Bakers unemployed once more. They were not worried about finding jobs, since the computer industry had never yet experienced a recession.

Jim Baker recalls assuming that jobs would always be available, but funding for speech recognition projects is unpredictable, so there was no assurance of continuing to work on what interested them. The only way to secure a long-term commitment to their passion was to start their own company.

Jim Baker describes their attitude toward risk: if the company failed, they would simply look for other jobs. The worst outcome would be ending up where they would have been had they never tried, so the venture was worth attempting.

The Bakers launched Dragon Systems without a formal business plan or venture capital, relying instead on savings from their jobs at IBM and Exxon to fund the company for 15 to 18 months. While raising two young children and paying a substantial mortgage, they ran the business from the basement of their home in Newton, Massachusetts.

Paul Bamberg

Paul Bamberg, a long-time faculty member in Harvard University's Physics Department, claims to have taught more courses at Harvard than anyone else in history, counting summer and extension school classes. With his robust build, full graying beard, and loud voice, he defies the stereotype of an Ivy League professor. Bamberg earned his bachelor's degree in Physics at Harvard and then studied at Oxford as a Rhodes Scholar. Since returning to Harvard in 1967, he has taught a diverse array of subjects, including Pre-Med Physics, Math for Physics, and Theory of Algorithms. Besides teaching, he is a “Dragon Fellow” and one of the earliest employees of Dragon Systems.

Bamberg has always enjoyed working in industry alongside academia; he believes two thirty-hour jobs are better than a single sixty-hour job, since you only get paid for forty hours of the latter. He first met the Bakers while working in the research department at Verbex under Jim Baker. When the Bakers left, Bamberg was invited to lead the research department, but rather than take a managerial role, he eagerly joined the Bakers as a co-founder of Dragon when they reached out to him.

Bamberg played a pivotal role in key projects, particularly the original DragonDictate speech-to-text software: he wrote most of the code and used his own voice and vocabulary for training. Until recently he served as the company's Vice President of Research, although he was never interested in management. He proposed a three-year rotation of positions, which Dragon Systems subsequently adopted.


Bamberg has now handed that position to Larry Gillick and is happily heading the engineering team that produced the prototype for non-English recognizers.

Bamberg is also a leading recruiter for Dragon Systems, often through his students' test scores: he tells them that scoring above eighty is commendable, above ninety is impressive, and above ninety-five merits a personal talk after class. He also actively promotes Dragon Systems to his colleagues.

It is no wonder that the company today employs many Harvard graduates and former Harvard staff members.

Bamberg shares an anecdote from Dragon's early days that illustrates heterogeneous engineering, which Donald MacKenzie describes as "the engineering of the social as well as the physical world." Although the company was small and operated out of the Bakers' home, it attracted visits from Fortune 500 executives. Bamberg recalls with some humor how his wife dressed up as the "company maid" to impress Alan Kay of Atari, an example of the resourcefulness and creativity of those early efforts. The episode remains a memorable part of their story, even now that they partner with far more powerful companies.

Bamberg reflects on how much the company has changed as Dragon has grown. It began as a close-knit operation where everyone knew one another, and the team used to work through holidays and take extended vacations. Growth has required a more professional approach, and the company now has to operate more like a conventional business.

With his extensive academic background, Bamberg defines success more modestly than many top engineers at fast-growing companies. He recalls the pride he felt when their innovations made an impact at global trade shows, in contrast to others who would not consider themselves successful until they surpassed industry giants like IBM in speech recognition. For Bamberg and Jim Baker, the primary goal was to build an organization that would support a lifetime of engaging research, a goal they believe they have already achieved.

Larry Gillick

Larry Gillick is Vice President of Research at Dragon Systems, where he manages a team of 50 to 60 research scientists, including both full-time and temporary staff. Like many of his colleagues at Dragon Systems, he has a background in physics and mathematics.

Gillick received his Bachelor’s degree in Physics from Swarthmore College. Afterwards, he had a hard time deciding whether to pursue Molecular Biology or Statistics.

After conducting research on protein synthesis at Columbia University for two years, Gillick lost interest in the field, humorously noting, “I found it hard to keep the glass clean.” He then pursued a Ph.D. in Mathematics at MIT, where he also taught, as well as at Northeastern University.

In addition to lecturing, he also analyzed clinical trials and consulted for industry.

In late 1984, Gillick came across an intriguing advertisement from Dragon Systems in the MIT Gazette. Curious about the young company, he contacted Jim Baker to arrange a site visit. Impressed by the work he observed, Gillick recognized the potential of Baker's projects.

In January 1985, Gillick joined Dragon Systems, where he focused on developing isolated-word recognition for large vocabularies. His primary challenge was to create an efficient acoustic model that required minimal memory, especially since Dragon initially lacked the funds for powerful computing resources. This emphasis on efficiency would later become a crucial asset for the company.

After fourteen years at Dragon, Gillick now leads the research department, where his role centers on project management. He says, “I’ve become used to doing stuff by remote control,” although he admits he misses writing code himself. He still engages in research by reading papers and analyzing algorithms for new recognizer models, and he represents his department in executive meetings.

Slenderly built, with thick glasses and dark, curly hair, Gillick is a self-described “typical

In his leisure time, Gillick reads history and philosophy, plays the piano, and listens to jazz and musicals. He speaks of his fondness for math with a smile, and it shows in his office, where the whiteboard is covered with formulas and graphs. His workspace is cluttered with stacks of papers, including one intriguingly titled "Paranoia in Taiwan."

Gillick defends the increasing departmentalization of the company, arguing that despite its growth, Dragon keeps its departments permeable, which fosters a richer intellectual environment. He adds that the company's expansion has made possible roles such as a director of marketing, which Dragon could not afford when it was smaller.

Gillick believes that Dragon Systems has achieved success, though he defines it differently from Bamberg. He points to the company's pioneering work in isolated-word and continuous speech recognition, which has become integral to its culture. Maintaining that culture, however, requires ongoing effort to innovate and stay ahead in a rapidly growing market. The research department has an atmosphere similar to a graduate school, prioritizing innovation over bureaucracy. Gillick stresses that while the culture is academic, the primary focus is on developing products rather than publishing research papers.

The success of their research lies in its product orientation, which distinguishes Dragon from companies that failed because their research had no practical application. Gillick adds that employee satisfaction is equally crucial for a thriving business. "We enjoy our work a lot," he says, noting the fun involved in developing technology and the effort to keep the research environment engaging rather than burdensome.

4 BIRTH OF THE DRAGON Dragon Systems

Early projects and technologies

For the first eight years, Dragon Systems remained a small company, refraining from paying salaries and even declining potential hires to protect their employment. As revenue began to grow, the company eventually started offering modest salaries of $100 each.

In its early days, Dragon Systems concentrated on research and on licensing its technology, securing its first major client, Apricot Computers, Ltd., in 1984. This partnership produced a wireless PC with an integrated microphone that used Dragon’s speech recognition for basic command and control functions. Although Apricot faced bankruptcy shortly after the product's launch, Dragon Systems forged further alliances with industry giants: IBM, leading to the release of IBM VoiceType in 1991, and Microsoft, which incorporated Dragon’s technology into the Windows Sound System in 1992.

In 1986, Dragon received its first ARPA contract, beginning its work on large vocabulary, speaker-independent continuous speech recognition. For the first decade, government funding made up a significant part of Dragon's research budget, but eventually, as Bamberg notes, the focus shifted toward broadcast communications.

In 1986, Dragon launched VoiceScribe 1000, a 1,000-word speech recognition system for IBM PC-compatible computers. This hardware-dependent system came with a peripheral board for the IBM PC/XT or PC/AT. Critics praised it, with PCWeekly describing it as “a complete and versatile isolated-word voice-recognition system.”

The magazine praised the product for bridging the gap between practical applications and exploration, and called its documentation one of the finest introductions to voice-recognition technology. A year later, Dragon launched DragonWriter 1000, an enhanced package that promised to cut recognition errors by 50%. Both products paved the way for more advanced systems running on personal computers.

In 1990, Dragon Systems launched DragonDictate-30K, a speech recognition system with a 30,000-word vocabulary. It was the first commercially available large vocabulary speech-to-text product for general-purpose dictation on a PC, a significant milestone for the industry. The release gained immediate recognition and established Dragon Systems as a leader in speech recognition technology.

5 BENDING THE TRAJECTORY:CSR Dragon Systems

Despite its impressive technology, Dragon Dictate struggled to gain traction in the mainstream market. A key reason was that although users could enter text faster than by typing, having to pause after each spoken word diminished the overall user experience.

Users had long wanted continuous speech recognition (CSR) software that could understand natural speech. However, experts in the field believed that CSR would take considerable time to achieve; they described a "natural trajectory" for its development, as if its pace were dictated by the technology itself.

Dragon Systems did not believe in the natural trajectory of speech recognition that these experts had declared. They set out to prove the experts wrong.

Joel Gould

Joel Gould strides into the room with a bushy mustache and dark, dusty hair, exuding confidence and determination. His commanding presence is accentuated by a loud, clear voice that conveys authority and leaves no room for disagreement.

He graduated from MIT with a Bachelor's degree in Electrical Engineering in 1983 and a Master's degree in Software Engineering in 1984. After working at the Cambridge Scientific Center until its closure, he was introduced to Jim Baker by a friend in 1992.

Motivated by a desire to create highly visible products, Gould joined Dragon Systems with the goal of developing products that would stand out on store shelves, particularly at Egghead.

Joel Gould tackled the daunting challenge of continuous speech recognition, requiring him to closely monitor both technical advancements and marketing trends in the field.

However, Dragon’s quest for continuous speech recognition was not without problems, as will be described shortly.

Technical Difficulties

Processor Power and Memory

Despite the prevailing belief that continuous speech recognition would have to wait five to ten years for computers to gain enough speed and memory, the Bakers predicted that high-end desktop machines would be capable of it within a few years. Moreover, as Gould notes, Dragon Systems lacked the computational resources of larger competitors, so it focused on creating more efficient algorithms to meet its objectives within a limited time.

32 MacKenzie, Donald. Inventing Accuracy: A Historical Sociology of Nuclear Missile Guidance.


Insufficient Speech Recognition Models

Classical pattern matching techniques struggled to address the challenges of continuous speech recognition. One significant issue is that they have difficulty distinguishing between words in a continuous stream, where there are no pauses to mark word boundaries. In addition, searching for the correct match among all possible word sequences takes an exponential amount of computation time, which makes it impractical for real-time applications.

Hidden Markov Models (HMMs) had gained popularity in discrete speech recognition, and unlike traditional pattern matching methods they can be extended to continuous speech recognition. With HMMs, the recognition problem can be solved in polynomial rather than exponential time, which made HMMs a pivotal technology for continuous speech recognition. Dragon Systems was ahead in this domain: its Dragon Dictate product already used HMM technology, and the company had been developing continuous speech recognition for government applications with DARPA funding.
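To make the polynomial-time point concrete, here is a minimal Viterbi decoder sketch in Python. It assumes a toy discrete HMM: the two states, the "quiet"/"loud" observation symbols, and all probabilities are hypothetical illustrations, not Dragon's acoustic models. Dynamic programming keeps only the best path into each state at each time step, so decoding takes O(T·N²) time for T observations and N states instead of scoring all N^T possible state sequences.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, log_prob) for a discrete HMM.

    Runs in O(T * N^2) time for T observations and N states,
    instead of scoring all N**T possible state sequences.
    All probabilities must be nonzero (log(0) is undefined).
    """
    # best[t][s]: log-probability of the best path ending in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Only the best-scoring predecessor matters (dynamic programming).
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Trace back from the most likely final state.
    last = max(best[-1], key=best[-1].get)
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical two-state model, for illustration only.
states = ("silence", "speech")
start_p = {"silence": 0.8, "speech": 0.2}
trans_p = {"silence": {"silence": 0.7, "speech": 0.3},
           "speech": {"silence": 0.2, "speech": 0.8}}
emit_p = {"silence": {"quiet": 0.9, "loud": 0.1},
          "speech": {"quiet": 0.2, "loud": 0.8}}

path, log_prob = viterbi(["quiet", "loud", "loud", "quiet"],
                         states, start_p, trans_p, emit_p)
print(path)  # ['silence', 'speech', 'speech', 'silence']
```

Real recognizers apply the same recurrence over far larger state spaces built from phoneme and word models, usually with pruning, but the core saving is the same: each time step depends only on the previous step's best scores.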

Social Difficulties

Baby Dragon

Despite the success of Dragon Dictate, Dragon Systems remained a small company. To release Dragon NaturallySpeaking on time, it needed to strengthen its research department, but limited capital made this investment difficult.

The Bakers recognized this critical challenge. Jim Baker assembled a new development team focused on creating Dragon's first continuous speech recognizer, while Janet Baker negotiated a deal with Seagate Technology to acquire 25 percent of Dragon's stock, providing the funds to expand the company's engineering, marketing, and sales teams. Within a year, Dragon had the world's largest speech research team, with over fifty scientists and engineers, which was instrumental in developing Dragon NaturallySpeaking.

Despite these achievements, Dragon Systems remains a small company at risk of being overshadowed by larger competitors. Market research expert Peter Ffoulkes stresses that Dragon needs to form partnerships to strengthen its market presence, because its current resources limit its growth.

Military Spelling versus Natural Spelling

In dictation, spelling out words is often essential, yet speech recognition systems struggle to distinguish similar-sounding letters such as v, b, e, and d. People face the same difficulty when trying to understand names spelled out over the telephone.

Dragon Dictate handled the output of individual letters by using the military alphabet, which improved the accuracy of spelling. For instance, when a user dictates the sound "be," then depending on the context, “B.”, “b”, “be”, or “bee” can show up. To force the program to output “b” or “B”, the user says “Bravo” instead. See Figure 2 below.

Figure 2: The military alphabet

A=Alpha     B=Bravo     C=Charlie   D=Delta     E=Echo      F=Foxtrot

G=Golf      H=Hotel     I=India     J=Juliet    K=Kilo      L=Lima

M=Mike      N=November  O=Oscar     P=Papa      Q=Quebec    R=Romeo

S=Sierra    T=Tango     U=Uniform   V=Victor    W=Whiskey   X=X-ray

Y=Yankee    Z=Zulu
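The lookup behind military-alphabet spelling can be sketched in a few lines of Python. The dictionary and the helper names `spell_out` and `decode` are hypothetical illustrations, not Dragon Dictate's implementation; the point is that the recognizer only has to choose among 26 acoustically distinct code words rather than among confusable letter sounds such as b, d, e, and v.

```python
# Military (NATO) spelling alphabet, A through Z.
MILITARY_ALPHABET = {
    "A": "Alpha", "B": "Bravo", "C": "Charlie", "D": "Delta", "E": "Echo",
    "F": "Foxtrot", "G": "Golf", "H": "Hotel", "I": "India", "J": "Juliet",
    "K": "Kilo", "L": "Lima", "M": "Mike", "N": "November", "O": "Oscar",
    "P": "Papa", "Q": "Quebec", "R": "Romeo", "S": "Sierra", "T": "Tango",
    "U": "Uniform", "V": "Victor", "W": "Whiskey", "X": "X-ray",
    "Y": "Yankee", "Z": "Zulu",
}

# Reverse lookup: spoken code word (lowercased) -> letter.
CODE_WORD_TO_LETTER = {word.lower(): letter
                       for letter, word in MILITARY_ALPHABET.items()}


def spell_out(word: str) -> list[str]:
    """Return the code words a user would say to spell `word` (non-letters skipped)."""
    return [MILITARY_ALPHABET[ch] for ch in word.upper() if ch in MILITARY_ALPHABET]


def decode(code_words: list[str]) -> str:
    """Turn recognized code words back into uppercase letters."""
    return "".join(CODE_WORD_TO_LETTER[w.lower()] for w in code_words)


print(spell_out("Baker"))                 # ['Bravo', 'Alpha', 'Kilo', 'Echo', 'Romeo']
print(decode(["Bravo", "Echo", "Echo"]))  # BEE
```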

Many customers express discomfort with using the military alphabet for spelling, preferring a more natural approach to communicate their words effectively.

According to Gould, implementing natural spelling required Dragon Systems to redesign both the algorithm and the user interface. The shift reflected the view that the primary objective is not merely to improve spelling accuracy but to solve the customer's problem. For instance, if the software misrecognizes a customer's last name, the customer should be able to correct it by spelling it naturally.

Natural spelling presents significant challenges, but high accuracy can be attained by integrating letter recognition techniques with a focus on sequences that correspond to dictionary words.
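One way to picture combining letter recognition with dictionary constraints is the Python sketch below. It assumes, hypothetically, that the recognizer returns a probability for each candidate letter at each spoken position (drawn from a confusable set such as b, d, e), and that a plain word list serves as the dictionary. The function name, scores, and word list are illustrative assumptions, not Dragon's algorithm.

```python
import math


def best_dictionary_match(letter_scores, dictionary):
    """Pick the dictionary word that best explains a spelled-out sequence.

    letter_scores: one dict per spoken letter, mapping candidate letters to
    recognizer probabilities, e.g. {"b": 0.4, "d": 0.5, "e": 0.1}.
    Returns (word, log_score), or (None, -inf) if no word fits.
    """
    best_word, best_score = None, -math.inf
    for word in dictionary:
        if len(word) != len(letter_scores):
            continue  # the spelled length must match the word length
        score = 0.0
        for ch, candidates in zip(word, letter_scores):
            p = candidates.get(ch, 0.0)
            if p == 0.0:
                score = -math.inf  # the recognizer never proposed this letter
                break
            score += math.log(p)
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Hypothetical recognizer output for a user spelling "b-e-d".
# Picking the top letter at each position would give "ded", which is not a word.
spoken = [{"d": 0.5, "b": 0.4, "e": 0.1},
          {"e": 0.6, "b": 0.3, "d": 0.1},
          {"d": 0.45, "b": 0.35, "e": 0.2}]
words = ["bed", "bee", "bad", "dab"]

word, score = best_dictionary_match(spoken, words)
print(word)  # bed
```

Scoring each dictionary word directly, rather than enumerating every combination of candidate letters, keeps the cost proportional to the size of the word list.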

To effectively address customer issues, Dragon shifted its design philosophy, focusing on creating products that meet customer needs rather than solely addressing engineering challenges.

Re-engineering the engineers

Two years ago, major speech research houses would have dismissed the possibility of a product like Dragon NaturallySpeaking, according to Gould. He says that one of his biggest challenges in developing NaturallySpeaking was convincing Dragon's own research department that it was possible. 36

According to Gould, research is conservative by nature. For years, Dragon's research department has worked on large vocabulary, general-purpose continuous speech recognition, achieving steady advances each year. Its ultimate objective is to recognize anyone's voice accurately in real time.

However, when Dragon projected recognition performance to NaturallySpeaking's planned ship date using the historical rate of improvement, the result would not have been adequate for the product.

Gould stressed the importance of optimism in research, arguing that focusing on recognition performance for a commercial product, rather than solely on government requirements, would yield much better results than historical data suggested.

Gould assured the engineers that their goal was achievable, stressing that the product had to exceed a specific accuracy threshold. He committed to pushing the accuracy above the required percentage and had his team design additional algorithms to work around the known limitations of continuous speech recognition.

Keeping It Away from the Competitors

Dragon, a small company focused on innovation in speech recognition, recognized the critical importance of being first to the continuous speech recognition market. Competing against a major player like IBM posed significant challenges.

Gould stresses that while Dragon is an innovative company, it remains small and vulnerable to larger competitors. He warns that if Dragon pre-announces its plans, bigger rivals could easily copy its products. He cites NaturallyOrganized as an example, noting that each new product launch risks undermining Dragon's market position.

Engineers at Dragon Systems worked with Microsoft to make their continuous speech recognition product compatible with Windows. However, when Dragon Systems requested timely fixes to Microsoft’s code, Microsoft engineers questioned the urgency, believing that a fully functional continuous speech recognizer was still years or even decades away.

Dragon managed its relationship with Microsoft carefully, deliberately concealing how close it was to completing a continuous recognizer. To keep up the impression that its inquiries were unrelated to any specific product, it set up a fictitious research department email address. In truth, the people on this email list were the developers of NaturallySpeaking.

Making It Natural

Since the dawn of the modern computer era, users have interacted with computers primarily through keyboards. The mouse, demonstrated by Douglas Engelbart in 1968, became widely popular only with the rise of graphical user interfaces. In the 1980s, talking to a computer was largely a novelty, and few people used speech recognition. Dragon aimed to change this, striving to make conversing with computers natural and intuitive.

This may explain why all Dragon products after Dragon Dictate begin with the word "Naturally." The word signals Dragon’s intention to make its speech recognition products natural and ubiquitous.

Change in Market

The move to continuous speech recognition opened up a distinct market, significantly different from that for discrete recognition products. As a result, Dragon needed to tailor its offerings to the demands of this new market.

Dragon Systems designed DragonDictate primarily for the accessibility marketplace, with an emphasis on hands-free computer use. This commitment shaped the product's key characteristics, ensuring that users could operate their computers entirely without their hands.


DragonDictate for Windows was designed for flexibility so that it could be used in diverse environments. This adaptability stemmed from Dragon's uncertainty about which markets, beyond accessibility, the product would be sold into. The program was therefore user-configurable, allowing end users to tailor it to markets that may not have been anticipated during development.

Finally, in Dragon NaturallySpeaking, the focus narrowed even more. Gould says, “I like to say that

Dragon NaturallySpeaking aimed to develop a user-friendly program tailored for general text creation, prioritizing broad appeal over immediate accessibility features Future versions are expected to address accessibility concerns.

Dragon NaturallySpeaking was designed to help people, particularly professionals such as lawyers and doctors, create documents and email efficiently without needing fast typing skills. Its design reflects this focus, prioritizing speed and ease of use over complete voice control, because these users, unlike those in the accessibility market, are comfortable using a mouse to navigate the computer.

According to Gould, Dragon focused on making text creation simple. Notably, the initial version of Dragon NaturallySpeaking had no user-configurable commands, reflecting this emphasis on ease of use.

Dragon Systems maintains a positive outlook on Dragon Dictate's enduring market presence. According to Bamberg, there will always be demand for discrete speech recognition, particularly in the assistive needs sector, where Dragon Dictate is regarded as a vital tool for everyday computing tasks.

Gould, by contrast, flatly declares Dragon Dictate "dead," arguing that the speech technology in NaturallySpeaking so far surpasses it that discrete software is no longer needed. This view may reflect Gould's position as the lead architect of NaturallySpeaking and his deep knowledge of its capabilities.

Change in Marketing Channels

The shift in customer type posed a significant challenge: the marketing strategies used for Dragon Dictate targeted the wrong audience for Dragon NaturallySpeaking. The marketing approach therefore had to be overhauled to reach the new market.

DragonDictate for Windows was distributed mainly through a network of value-added resellers (VARs), each serving a small number of high-maintenance clients, including people with disabilities. The VARs tailored the product to the specific needs of their customers. DragonDictate for Windows was priced so that VARs could earn a reasonable profit despite low sales volume.

Dragon NaturallySpeaking is marketed primarily through retail channels, where competitive pressure requires a lower price. It is also offered through the VAR channel as the Professional Edition, which shares many features with DragonDictate for Windows, including configurability, so that resellers can tailor the software to specific customer needs.

41 Dragon had always intended to add this. User-configurable commands later appeared in Dragon NaturallySpeaking Professional Edition version 2.0 in the fall of 1997.


Change in Corporate Attitude

To develop Naturally Speaking, Dragon Systems had to shift from a research-focused organization to a market-driven company, according to Gould.

Dragon's research department had focused on the specific problem sets of government projects, particularly those funded by DARPA. This influence is evident in design choices for Dragon Dictate, such as the use of the military alphabet for spelling.

Dragon Dictate serves the assistive needs market, where users adopt the software out of necessity. These customers take the time to master its peculiarities, including the military alphabet, whatever design choices the engineers made. This shows how willing such users are to work through the intricacies of Dragon Dictate to meet their needs.

The market for Dragon NaturallySpeaking, a continuous speech recognition product, differs significantly from that for discrete speech recognition. While the government showed interest in continuous recognition, it was not a priority compared to discrete recognition. The assistive market also had less demand for it. Instead, the primary customers were mainstream professionals accustomed to the keyboard, who had no immediate need for speech recognition.

Dragon Systems therefore had to identify and attract customers for Dragon NaturallySpeaking, which required a significant shift in corporate strategy. The transition raised the importance of the marketing department, and Dragon evolved from a research-focused organization into one that prioritizes customer needs and experience.

Success

Dragon NaturallySpeaking made a significant impact at the 1997 COMDEX trade show, where it unexpectedly stole the spotlight. The software won PC Week’s Best of 1997 COMDEX award.

Since its release, Dragon NaturallySpeaking has won more than 44 awards worldwide. Its 1997 honors include the World Class Award for Most Promising Software Newcomer in May, PC Week’s Best of COMDEX in November, Popular Science’s Best of What’s New Award (Grand Winner in Computers and Software) in December, and PC Magazine’s Technical Excellence Award in December.

In calendar year 1998, Dragon NaturallySpeaking sales exceeded those of competitors' products in both revenue and units, according to PC Data (April 1998).

The computer industry was astonished by the small start-up Dragon Systems, which showcased groundbreaking technology at a leading trade show. Defying the expectations of speech recognition experts, Dragon Systems bent the conventional trajectory of continuous speech recognition to its advantage.

However, Dragon Systems’ stardom and early market advantage in speech recognition would soon be threatened by large, aggressive corporate competitors.

IBM

ViaVoice

Patri Pugliese, Research Operations Manager and Government Contracts Manager at Dragon, points out that IBM announced ViaVoice just before Dragon Systems unveiled Dragon NaturallySpeaking, the world's first general-purpose large vocabulary continuous speech recognition product, at a press conference.

42 IBM Software - Speech Recognition, http://www.software.ibm.com/speech/

On April 2, 1997, Dragon Systems launched its speech recognition software in both New York City and San Jose, CA, prompting IBM to accelerate its own product development. By mid-June, IBM responded to the competition by releasing version 1.0 of ViaVoice just six weeks later.

In the speech recognition software market, IBM's ViaVoice initially held an advantage over Dragon NaturallySpeaking because of its lower price: $99 versus $695 for NaturallySpeaking. This sparked a price war, and prices fell until Dragon NaturallySpeaking Standard 3.0 sold for $99.95 and IBM ViaVoice for $49.95. ViaVoice also distinguished itself by allowing dictation directly into various word processors and spreadsheets, which, as Michael Caton noted in a January 21, 1998 PC Week article, gave users a notable productivity boost. In response, Dragon launched Dragon Point & Speak in early 1998, a budget product priced under $59 that allowed dictation directly into popular Windows applications.

In a comparison of the two products, former Dragon Systems employee Gardner judged Dragon NaturallySpeaking superior to IBM's ViaVoice. She said Dragon offers a more integrated product, which she attributes to Joel Gould's strong leadership of the development team, whereas IBM's size and corporate culture produced a less cohesive and more cumbersome product. Although she left Dragon because of conflicts with Gould, Gardner still favors Dragon Systems for the agility and innovation of a smaller company, suggesting that this is where real success lies.

Lernout and Hauspie

Microsoft

Paul Bamberg says that what worries him most is Microsoft. He fears a future in which all Windows users pay an annual license fee and, for just fifteen dollars more, get a fully integrated dictation system in their operating system.

“We can’t compete,” says Bamberg of this sticky but realistic situation. 66

62 Microsoft and Lernout & Hauspie Announce Strategic Alliance In Support of Voice-Enabled Computing, http://www.microsoft.com/presspass/press/1997/sept97/msl & hpr.htm

Microsoft, which Jim Glass of MIT's EECS department calls the "sleeping giant," has been slow to enter the speech recognition market with a competing product of its own. However, the company keeps a strong hold on the industry through its Speech Application Programming Interface (SAPI). As Gould explains, "Microsoft wins in the middleware," because all products designed to run in the Windows environment must adhere to the SAPI standard.

Microsoft is set to launch its own speech recognition product alongside SAPI and plans to integrate a Text-To-Speech engine into Windows 2000. The company is developing 'Whisper', its own speech recognition system, and is working on acoustic and language modeling algorithms to improve accuracy and usability. Its research aims to build advanced conversational systems for telephone and home platforms and to improve dictation across multiple languages.

“Our business plans do not assume we will replace Microsoft,” Gould says realistically.

Microsoft currently does not target specific vertical markets such as law and healthcare. In the future, products compatible with Microsoft software may be developed to serve these industries. Dragon Systems expects its software to be available on most operating systems soon, although Microsoft could eventually overtake it. While Dragon may not compete directly with Microsoft in these niche markets, it will compete with L&H.

Larry Gillick wants Dragon to be a significant contender in the evolving speech recognition market, despite the challenge posed by Microsoft, whose aggressive tactics have been scrutinized in antitrust lawsuits. He acknowledges that aiming for complete dominance is not good for the industry. If Microsoft pushes Dragon out of the PC market, Dragon may turn to other applications of speech recognition, which are discussed in the following section.

Others

Philips

Philips FreeSpeech98 is a limited speech recognition product whose accuracy falls short of competitors such as NaturallySpeaking, Voice Xpress, and ViaVoice. As a PC Magazine review points out, it also lacks essential features such as multi-user support, true modeless operation, and voice macros.

FreeSpeech98, launched in October 1998, targets home users with a 30,000-word dictionary. Philips partnered with UbiQ, a prominent sales and marketing organization, to promote FreeSpeech98 to OEMs worldwide. UbiQ expects one million licenses to be sold to PC manufacturers for bundling with desktop and laptop computers by the end of 1999.

Despite Philips' plans to take a larger share of the speech recognition market, PC Magazine’s Craig Stinson doubts that customers will embrace it: “Although [Philips FreeSpeech98] is

69 MSR Research Areas: Speech Technology, http://research.microsoft.com/srg/srproject.htm

71 Craig Stinson, Philips FreeSpeech98, PC Magazine, October 20, 1998.

72 Philips Speech Processing - Press Office - Press Release, http://www.speech.be.philips.com:100/bin/owa/psp-s-press?xidw3

Philips FreeSpeech98 is the lowest-priced product in the speech recognition category, but it does not come with a microphone, has the lowest accuracy among the programs tested, and lacks many essential features. Spending a little more on a more robust product is advisable.

Future Competitors

The speech recognition market is still in its infancy, and Gould expects significant growth in the coming years. Dragon Systems welcomes this competitive landscape: President Janet Baker argues that multiple competitors are crucial for advancing the technology and expanding the market, and she sees ample room for new players. Pugliese adds that Dragon's key advantage lies in delivering higher accuracy for ordinary users and focusing on the features that matter most to them.

73 Craig Stinson, Philips FreeSpeech98, PC Magazine, October 20, 1998.

7 SPEECH RECOGNITION APPLICATIONS OF THE FUTURE Dragon Systems

7 Speech Recognition Applications of the Future

Research in speech recognition continues to evolve; Larry Gillick points to ongoing work on recognizers that can handle various speech styles, from careful speech to natural speech to broadcast. Continuous speech recognition is increasingly valued for its practicality, but discrete speech recognition remains relevant for certain consumer devices.

Janet Baker aims to make speech recognition universally accessible, reflecting her deep understanding of the technology and its potential applications. Her vision has put her ahead of her time, allowing her to foresee the significance of advances in speech recognition. Perhaps IBM's characterization of the Bakers as 'premature' reflects this foresight.

Unfortunately, speech recognition systems are still hampered by the rate at which hardware improves. Although hardware is advancing at an astronomical pace, it still does not allow the most demanding recognition algorithms to run in real time.

Efficient speech recognition algorithms are essential for real-time operation. Gillick stresses that the technology only gained traction once continuous speech recognition became feasible. Consequently, Dragon often delays shipping improvements to its recognizer until the necessary hardware is widely available in ordinary PCs. As Gillick puts it, hardware limitations frequently force the team to scale down its advances, but there are always further innovations ready to be introduced.

PC Desktop Integration

Both Joel Gould and Janet Baker compare the adoption of the mouse to the adoption of speech.

Fifteen years ago, college students were primarily 'keyboard-centric' users. Today they are 'mouse-centric,' according to Gould, and he anticipates that within the next decade most college students will become 'speech-centric' users. Janet Baker cites a Boston Globe article from December 6, 1998, noting that while the mouse took thirty years to be adopted, speech recognition is already well past the halfway mark in its adoption.

Handheld Devices

Olympus has launched the D1000 Digital Voice Recorder, which is available with IBM's ViaVoice speech recognition software for around $300. According to Matthew Gravern in PC Magazine, users can connect the recorder to the line-in jack of their computer's audio card and have their recordings transcribed automatically. Dragon has also moved into the mobile market with its Dragon NaturallyOrganized software.

“We have started to move off into the mobile space,” says Joel Gould. “Janet often says, ‘Keyboards are getting smaller, but fingers aren’t.’”84 Dragon NaturallyOrganized was introduced as the world’s first natural speech productivity assistant.

78 Simpson L. Garfinkel, Enter the Dragon, Technology Review, September/October 1998.

83 Matthew Gravern, New Speech Technology, PC Magazine, October 20, 1998.


On November 16, 1998, at COMDEX 98, Dragon Systems unveiled NaturallyOrganized, its Natural Speech Productivity Assistant. The software works with the Dragon NaturallyMobile digital recorder, the first digital recorder designed specifically for speech recognition. Users interact with the device much as they would with a PalmPilot, except that they use their voice, recording reminders such as "Remember to buy milk" or "Send an email to Professor Mindell." When the user connects the recorder to a PC, NaturallyOrganized processes these items, distinguishing between notes, emails, and appointments, and carries out the commands once the user approves them. The initial version of Dragon NaturallyOrganized works with the ACT! contact management system, and future updates are planned to support additional applications such as Microsoft Outlook, Goldmine, and Timeslips.

Language Translation

Many companies are interested in the language translation segment of the speech recognition market.

Lernout & Hauspie has entered online translation with the L&H iTranslator, which provides translation services for English, German, Spanish, French, Arabic, Japanese, and Korean through a web server and a client/server architecture.

Dragon is collaborating with DARPA on a long-term project in language translation technology. Paul Bamberg recounts receiving an email from a DARPA supporter who wanted a phrase translation system for use in Bosnia and needed a speech interface for a demonstration. The resulting translation device, known as Babelfish, lets U.S. soldiers communicate essential wartime phrases.

The device uses DragonDictate to recognize a spoken query, such as a question about where dangerous materials are stored. It converts the spoken phrase into keystrokes, confirms it with the user, and then speaks it aloud in Bosnian. Paul highlights how quickly the project moved: a contract was signed just two weeks after the initial demonstration. He also notes that the system was well regarded within DARPA, which counted it among the significant achievements of the previous year.

Paul Bamberg is optimistic about this area of research. He predicts, “Ten years from now [we will have] one spoken language in, another spoken language out.”

Asian Languages

Victor Zue, head of the Spoken Language Systems Group at MIT, identifies the Asian market as a key area for future growth in speech recognition. He notes that speech recognition becomes essential for languages in which keyboard entry is difficult. This has prompted companies such as Intel and Microsoft to establish research centers in China. In response to this emerging market, Dragon is developing its own Mandarin recognition capabilities.

85 Dragon Systems Current News, http://www.dragonsys.com/frameset/currentnew2.html

87 L&H iTranslation, http://www.itranslator.com/


Speech Understanding

Speech understanding is a key objective for many speech recognition organizations, L&H's misleading claims about its natural speech command structure notwithstanding. Most products currently available are merely dictation tools: they can transcribe spoken words but cannot grasp their intended meaning. Numerous companies, along with government and academic institutions, are conducting research to advance speech understanding.

Victor Zue is developing JUPITER, a conversational system that provides real-time weather information over the telephone. Users can ask about conditions in specific places, such as "What's the weather like in Beijing?" or "Is it raining in Los Angeles?", and JUPITER interprets and answers these queries. The system achieves an accuracy of 80% for first-time users, which rises to over 95% for experienced users.

Joel Gould asserts that advances in speech understanding are imminent. Larry Gillick is skeptical: he suggests that machines may never fully match human understanding of speech, or that reaching it could take centuries. Whenever this step occurs, it is likely to change how society interacts with computers and will mark a significant paradigm shift for the industry.

91 JUPITER, (617) 258-0300, http://www.sls.lcs.mit.edu/sls/whatwedo/applications/Jupiter.html

8 CONCLUSION:CAN DRAGON CLAIM SUCCESS? Dragon Systems

8 Conclusion: Can Dragon Claim Success?

Dragon Systems is a leading name in the speech recognition industry, thanks to Jim and Janet Baker's pioneering work on HMM technology, now the dominant approach. This work has earned them significant respect within the field. The company's products consistently win awards and positive reviews at international trade shows, reinforcing Dragon Systems' reputation for success in the market.

Success is a subjective concept that differs for each individual. According to Joel Gould, a company's success is defined by sustainable growth and profitability. Personally, he would consider himself successful if, five years from now, speech has significantly changed the way people interact with their computers.

Janet Baker jokes about Joel's personal definition of success and offers her own: "Success is reaching your goals." She and Jim count building a large-vocabulary continuous speech recognizer among their achievements. They also acknowledge that challenges remain and that their work is far from complete.

A company's success is often judged against the goals in its mission statement. Janet Baker explains that Dragon Systems aims to develop leading speech technology that becomes ubiquitous. However, measuring success solely by a mission statement can be unrealistic, since many lofty goals serve as aspirational targets for employees. Such long-term objectives often play an important role in uniting employees within a company.

Jim Baker and Paul Bamberg established Dragon Systems with the aim of creating an organization that would provide a lifetime of engaging research problems. The result is an environment rich in opportunities for innovation and intellectual exploration: “sort of like an academic environment,” according to Larry Gillick. “We talk about ideas and have lots of books on the shelves.”98

The organization Bamberg described has been established, but widespread adoption of speech recognition technology has not yet arrived. Judging Dragon Systems solely by whether it has completed its goals therefore gives an incomplete picture of its success. Success is not a destination but a continuous journey that requires ongoing effort. Dragon Systems has achieved recognition as a successful company, but sustaining that success presents significant challenges of its own.

The alliance between Microsoft and Lernout & Hauspie gives Microsoft a competitive edge in developing speech recognition products for the Windows platform. Microsoft plans to include its own speech applications in Windows 2000, so Dragon Systems faces a significant challenge: customers may prefer free Microsoft applications over paid alternatives, even superior ones. In response, Dragon is adapting its business model to focus on emerging markets for speech recognition, including mobile devices, handhelds, and language translation tools.

Dragon Systems stands out as a pioneer in the speech recognition industry, beginning with co-founder Jim Baker's introduction of Hidden Markov Models as a key algorithm. In 1997, the company launched Dragon NaturallySpeaking, the first general-purpose continuous speech dictation product, which solidified its position as a leader in the field. With its commitment to innovation, Dragon Systems is well equipped to keep making significant strides in speech recognition.

Dragon Systems has also established itself as a pioneer in new markets with the 1998 launch of Dragon NaturallyOrganized, the world's first natural speech productivity assistant. As Gillick notes, the company's innovative approach and strategic foresight let it consistently stay ahead of competitors. Bamberg points to Dragon's worldwide success, evidenced by the enthusiastic crowds eager to see its products at trade shows. Joel Gould emphasizes the company's aggressive pursuit of advances in speech technology and its commitment to leading the industry with imaginative products.

Dragon's success also owes much to its strong employee relations. Because Dragon prioritizes employee satisfaction, staff members look forward to coming to work and are willing to put in extra hours. Cornelia Sittel, a Quality Assurance representative, describes the atmosphere: “It’s nice to get a lot of work done and not be stressed out.” This commitment to employee happiness contributes to Dragon's overall productivity and success.

Dragon fosters a distinctive workplace culture: pets are welcome, there is no dress code, and working hours are flexible. This culture has remained consistent over the past decade, driven by a shared ambition to create marketable technology. According to Gillick, the research department pursues ideas that are not only intriguing but also commercially viable, and innovation takes precedence over bureaucracy.

In conclusion, Dragon Systems is well positioned for continued success because of its commitment to creating enjoyable products and its positive work environment. As Gillick notes, the team's passion for its work and its emphasis on research contribute significantly to the company's achievements. This shows how much a supportive atmosphere matters in driving innovation and productivity.

“Dragon’s growth has paralleled the growth of the PC,” claims Gillick, regarding Dragon Systems' recent recruiting efforts. “We are tracking it as [it] gets more powerful.”104


References

[1] Baker, James K. “Stochastic Modeling as a Means of Automatic Speech Recognition.” Carnegie-Mellon University, 1975.
[2] Caton, Michael. “IBM Takes Dictation Further.” PC Week, January 21, 1998.
[3] Damerau, Frederick J. “Markov Models and Linguistic Theory.” Mouton & Co., The Netherlands, 1971.
[4] Garfinkel, Simpson L. “Enter the Dragon.” Technology Review, September/October 1998.
[5] Gravern, Matthew. “New Speech Technology.” PC Magazine, October 20, 1998.
[6] MacKenzie, Donald. “Inventing Accuracy: A Historical Sociology of Nuclear Missile Guidance.”
[7] Pallet, D. Group Manager of Spoken Natural Language Processing Group, NIS. (Email)
[8] Reddy, D. R. “Speech Recognition by Machine: A Review.” IEEE Proceedings, 64(4):502-531, April 1976.
[9] Sipser, Michael. “Introduction to the Theory of Computation.” PWS Publishing Company, Boston, 1997.
[10] Stinson, Craig. “Philips FreeSpeech98.” PC Magazine, October 20, 1998.
[11] Baker, James. Interviewed by S. L. Garfinkel, 1997.
[29] CompUSA: The Online Superstore, http://www.compusa.com
[30] Dragon Systems Current News, http://www.dragonsys.com/frameset/currentnew2.html
[31] IBM Software - Speech Recognition, http://www.software.ibm.com/speech/
[32] JUPITER, (617) 258-0300, http://www.sls.lcs.mit.edu/sls/whatwedo/applications/Jupiter.html
[33] L&H iTranslation, http://www.itranslator.com/
[34] L&H Press Release (19980428) Voice Xpress(TM), http://www.lhs.com/news/releases/19980428-VoiceXpress.a
[36] MSR Research Areas: Speech Technology, http://research.microsoft.com/srg/srproject.htm
[37] PC Magazine: Speech Recognition, http://www.zdnet.com/pcmag/features/speech98/rev3.html
[38] Philips Speech Processing - Press Office - Press Release, http://www.speech.be.philips.com:100/bin/owa/PsP-s-press?xid=773
