Overlooked No More: Karen Sparck Jones, Who Established the Basis for Search Engines

Since 1851, obituaries in The New York Times have been dominated by white men. With Overlooked, we’re adding the stories of remarkable people whose deaths went unreported in The Times.

By Nellie Bowles

When most scientists were trying to make people use code to talk to computers, Karen Sparck Jones taught computers to understand human language instead.

In so doing, she established the basis of search engines like Google. A self-taught programmer with a focus on natural language processing, and an advocate for women in the field, she also foreshadowed by decades Silicon Valley’s current reckoning, warning about the risks of technology being led by computer scientists who were not attuned to its social implications.

“A lot of the stuff she was working on until five or 10 years ago seemed like mad nonsense, and now we take it for granted,” said John Tait, a longtime friend who works with the British Computer Society.

Her seminal 1972 paper in the Journal of Documentation laid the groundwork for the modern search engine. In it, she combined statistics with linguistics — an unusual approach at the time — to establish formulas that embodied principles for how computers could interpret the relationships between words.

By 2007, Sparck Jones said, “pretty much every web engine uses those principles.”

“Anything that does index-term weighting using any kind of statistical information will be using a weighting function that I published in 1972,” she said in an interview with the British Computer Society.

Karen Ida Boalth Sparck Jones was born on Aug. 26, 1935, in Huddersfield, England, a textile manufacturing town. Her parents were Alfred Owen Jones, a chemistry lecturer, and Ida Sparck, who worked for the Norwegian government while in exile in London during World War II.

While studying history and then philosophy (the department was then called moral sciences) at Cambridge, she met the head of the Cambridge Language Research Unit, Margaret Masterman, whom Sparck Jones would describe as “a very strange and interesting woman” who used her maiden name professionally — and who was her inspiration for entering the field.

Sparck Jones, too, would keep her name when she married Roger Needham, a fellow computer scientist, in 1958, saying, “It maintains a permanent existence of your own.”

Sparck Jones started working for Ms. Masterman. She wanted to figure out how to program a computer to understand words that could have many meanings (for example, “field”) and set about programming a massive thesaurus.

“All words in a natural language are ambiguous; they have multiple senses,” she said during an oral history conducted by the Institute of Electrical and Electronics Engineers’ History Center. “How do you find out which sense they’ve got in any particular use?”

In 1964, Sparck Jones published “Synonymy and Semantic Classification,” now seen as a foundational paper in the field of natural language processing.

In 1972, she introduced the concept of inverse document frequency, which weights a term by how few documents in a collection contain it: the rarer a term is across the collection, the more useful it is for telling documents apart. It remains a foundation of modern search engines.
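Inverse document frequency is commonly written today as idf(t) = log(N / n_t), where N is the number of documents in a collection and n_t is the number of documents containing term t. The short Python sketch below computes that textbook form over a made-up four-document collection; the toy documents and the function name are illustrative assumptions, and the exact formulation in the 1972 paper is not reproduced here.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Return idf(t) = log(N / n_t) for every term in a tokenized collection.

    N is the number of documents; n_t is the number of documents that
    contain term t at least once (repeats within one document count once).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical toy collection, already split into words.
docs = [
    "the field of computing".split(),
    "the field of linguistics".split(),
    "the weighting of index terms".split(),
    "the search engine".split(),
]

idf = inverse_document_frequency(docs)
for term in ("the", "field", "linguistics"):
    print(term, round(idf[term], 3))
# the 0.0
# field 0.693
# linguistics 1.386
```

A word found in every document, such as “the,” gets a weight of zero, while a word found in only one document gets the highest weight. Search systems typically multiply this figure by how often the term occurs within a given document, a combination known as tf-idf.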

In the 1980s, she began working on early speech recognition systems.

Most mornings and afternoons, Sparck Jones and her husband, a pioneer in software security, debated theory in the department’s tea room. It was one of many passions they shared.

Their home in Coton, just west of Cambridge, was full of books, art and found items like an interesting piece of driftwood or a Victorian-era knife grinder. They had a second house in the same village that stored an overflow collection of books and served as her artist’s workshop. One of her pieces hung at the Microsoft Research Lab.

Sparck Jones and Needham restored an 1872 vintage sailboat called Fanny of Cowes and would race against other old boats along the east coast of England. They chose not to have children.

“They wanted their intellectual life,” said Andrew Herbert, her friend and a fellow computer scientist. “They were clearly deeply in love with each other all the way through their life.”

Sparck Jones had a booming voice and a puckish sense of humor. At work, she usually wore a simple uniform: bluejeans, red sweater, white blouse. She also wore a brooch that she made from part of a horseshoe and some stones. When she had to bike to a formal dinner, as one often did at Cambridge, she was known to use clothing pegs to pin her dress to the handlebars.

In 1982, the British government tapped Sparck Jones to work on the Alvey Program, an initiative to encourage more computer science research across the country. In 1993, she wrote, with Julia R. Galliers, “Evaluating Natural Language Processing Systems,” the seminal textbook on the topic.

In 1994, she became president of the Association for Computational Linguistics, an international group for professionals in the field.

In 1999, she became a full-time professor at Cambridge, and it bothered her that it took so long. For all the years before, she had been on contract with the university, an untenured and lower-status form of academic employment referred to as “living on soft money.”

“Cambridge was, in many ways, not user-friendly, in the sense of women-friendly,” she said of the delay.

In 2004, she won the Association for Computational Linguistics Lifetime Achievement Award and in 2007, the British Computer Society’s Lovelace Medal and the Association for Computing Machinery/AAAI Allen Newell Award.

She died on April 4, 2007, of cancer. She was 71. She did not receive an obituary in The New York Times, although her husband, who died in 2003, did.

Today, researchers are still citing her formulas.

Ideas she wrote about that seemed abstract at the time are now being put into practice as artificial intelligence research becomes more prevalent.

“It points to how far ahead of her time she was, how consequential her work was, how little it was valued for the first twenty years,” said Martha Palmer, a professor in the Linguistics and Computer Science departments at the University of Colorado.

She also mentored a generation of researchers, male and female. She came up with a slogan: “Computing is too important to be left to men.”

Sparck Jones was ahead of her time in another respect. Decades before Silicon Valley began its moral reckoning, Sparck Jones cautioned engineers to think of their work’s impact on society.

“There is an interaction between the context and the programming task itself,” Sparck Jones said. “You don’t need a fundamental philosophical discussion every time you put finger to keyboard, but as computing is spreading so far into people’s lives you need to think about these things.”