Kirk: "Computer, how many furlongs in a light year?"
Computer: "Working... There are 46,996,813,387,000 furlongs in a light year."
Kirk: "Well, Spock, I don't think he traveled by horse."
Spock: "Agreed, Captain."
There's a memorable scene in the movie Star Trek IV: The Voyage Home in which Scotty attempts to interact with a computer by talking into a mouse. Also, by 2001, we should have had talking computers like the HAL 9000. Everyone seems to think that computers should accept verbal questions and give good answers; that is, computers should have a natural language user interface (NL). Some applications are now coming close to that ideal. Siri would be one example.
One reason why we're getting close to the Star Trek ideal is the phenomenal computing power that's now available at such a low cost in many consumer devices. Another reason might be the decision by NL scientists to divorce themselves from the artificial intelligence (AI) field. Many computer scientists decided to stop identifying their work as artificial intelligence when there was a backlash against AI's being over-sold to funding agencies in the last decades of the twentieth century.
NL was long in coming, since the problem goes far beyond parsing speech into a text file for further processing. The question, "List all bloggers on web sites with physics degrees," might be a problem for an NL system, since there are probably no web sites with physics degrees.
Computer scientists at MIT's Computer Science and Artificial Intelligence Laboratory have tackled a natural language interface for the specific task of forming regular expressions. Regular expressions are a compact pattern-matching notation built into many programming languages and word processors to aid text search and replacement. This research was presented in June at the annual conference of the North American Chapter of the Association for Computational Linguistics.[1-2]
I used a regular expression in OpenOffice to remove extra linefeeds from the manuscript of one of my novels. Unless you use these expressions regularly (pun intended), it takes a while to craft the exact expression that solves your problem. As can be seen in the example in the following figure, you essentially need to be a computer scientist to craft even a simple regular expression.
Even simple regular expressions are not that simple. (MIT Graphic by Christine Daniloff.)[1]
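The linefeed cleanup mentioned above can be sketched in Python. The post does not show the expression that was actually used in OpenOffice, so the pattern and the function name collapse_blank_lines here are assumptions chosen for illustration: runs of three or more newlines are reduced to a single paragraph break.

```python
import re


def collapse_blank_lines(text):
    """Reduce runs of three or more newlines to one blank line."""
    # Normalize Windows line endings so a single pattern covers both styles.
    text = text.replace("\r\n", "\n")
    # \n{3,} matches three or more consecutive newline characters.
    return re.sub(r"\n{3,}", "\n\n", text)


sample = "Chapter One\n\n\n\nIt was a dark night.\n\nThe end.\n"
cleaned = collapse_blank_lines(sample)
print(repr(cleaned))
```

Even this small task takes a quantifier, an escape sequence, and a decision about line-ending styles, which is the sort of detail that is easy to forget between uses.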
Something to make humans chortle and natural language interfaces choke. The first verse of "Jabberwocky" in English and Swedish. "Jabberwocky" is a nonsense poem written by the mathematician Lewis Carroll. It's contained in his 1871 novel, Through the Looking-Glass, and What Alice Found There. (Swedish translation, "Tjatterskott," by Harry Lundin.)
In other work on natural language processing at MIT, Regina Barzilay, Tao Lei, Fan Long and Martin Rinard have developed a program that generates input parsers, which are programs that separate the data from the other information in a computer file.[3] A text file, for example, might have information about text formatting along with the actual text.[1] Their system interprets the natural language specification of the file format, something that a programmer needs to do when creating a program to read and write such files.
The MIT team had a good development resource, about 180 file format examples used in the Association for Computing Machinery's International Collegiate Programming Contest. The MIT natural language interface succeeded on about 80 percent of the specifications. In the failed cases, changing just a word or two of the specification usually gave a working input parser.[1] The natural language interface was efficient, taking about ten minutes of calculation on an ordinary laptop to produce the parsers for all these specifications.[1]
Luke Zettlemoyer, an assistant professor of computer science and engineering at the University of Washington, who was not a part of the natural language interface team, said that "the techniques they have developed should definitely generalize to other related programming tasks."[1]
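To make the idea of an input parser concrete, here is a hand-written sketch in Python for a contest-style specification. The specification text, the function name parse_input, and the sample data are all hypothetical and are not taken from the MIT work; the code shows only what a programmer would typically write by hand after reading such a description, not the output of the MIT system.

```python
# Hypothetical specification, in the style of a programming contest:
#   "The first line contains an integer N. Each of the next N lines
#    contains two integers A and B separated by a space."


def parse_input(text):
    """Return the N pairs (A, B) described by the specification above."""
    lines = text.strip().splitlines()
    n = int(lines[0])  # the first line holds the record count
    pairs = []
    for line in lines[1:1 + n]:
        a, b = map(int, line.split())
        pairs.append((a, b))
    if len(pairs) != n:
        raise ValueError("expected %d records, found %d" % (n, len(pairs)))
    return pairs


pairs = parse_input("3\n1 2\n3 4\n5 6\n")
print(pairs)
```

Each clause of the English description corresponds to a line or two of code, which is the translation step a natural language interface would need to carry out.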
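[Figure content: the query "three letter word starting with 'X'" and its regular expression, ([A-Za-z]{3})&(\b[A-Za-z]+\b)&(X.*). In the figure, "three letter" maps to [A-Za-z]{3}, "word" maps to \b[A-Za-z]+\b, and "starting with 'X'" maps to X.*]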
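The & in the figure's expression appears to denote intersection, meaning that all three conditions must hold at once. Python's re module has no such operator, so the sketch below is an assumed equivalent rather than a transcription: a lookahead carries the "starting with 'X'" condition, and the rest of the pattern requires exactly three letters between word boundaries. The names THREE_LETTER_X and find_words are illustrative.

```python
import re

# (?=X) checks that the word begins with 'X' without consuming it;
# [A-Za-z]{3} then requires exactly three letters, and the \b anchors
# stop the match from being part of a longer word.
THREE_LETTER_X = re.compile(r"\b(?=X)[A-Za-z]{3}\b")


def find_words(text):
    """Return every three-letter word in text that starts with 'X'."""
    return THREE_LETTER_X.findall(text)


words = find_words("Xis and Xylophone met Xan near the fox, then Xyz left.")
print(words)
```

Here "Xylophone" is rejected because no word boundary follows its third letter, and "fox" is rejected because it does not begin with an uppercase 'X'.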