Thursday, 23 February 2012

2004 NFAIS Meeting (Report From The Field)

NFAIS held its Annual Conference Feb. 22-24 in Philadelphia. According to the program: "A new information mindset is being created. Information usage behavior ... is being shaped by computer-driven technologies.... Use of traditional information sources ... is on the decline. This trend, combined with the current movement towards open access publishing on the Internet, has the potential to completely transform the information access and retrieval process in the very near future."

Search Engine Study

The conference opened with a keynote by Yahoo!'s chief scientist, Jan Pedersen (see page 33). The remainder of the first day was devoted to search engines and user behavior. Simon Inger, director of Scholarly Information Strategies, Ltd., presented some fascinating data from a comparative study of Google and traditional information services. For the study, a set of known articles was searched using Google. STM articles are easy to find on Google because of their specific language. In addition, their publishers make an effort to have their content indexed by this engine.

In contrast, business, management, or social science articles are harder to find with Google because their language uses more common words, and their publishers' data is less well-crawled by the engine. Thus, it's more difficult to separate useful articles from the noise of large result sets. Google's overall coverage is more general than that of the selective A&I databases, but Inger noted that Google is actively working with primary publishers to increase its coverage of their data.

User Behavior

A panel of four speakers addressed user behavior. David Seaman, executive director of the Digital Library Federation, discussed the needs of humanities researchers. He noted that they're frustrated by the time it takes to find relevant information, the lack of material in their discipline, and the need to analyze the credibility of information. They also feel hampered by a lack of training on how to find information. In the eyes of these users, publishers have a "silo" mentality, and there's little ability for users or library services to work with content across publishers or aggregators.

Seaman said that libraries are failing in their service mission because they are unable to repackage content. Publishers therefore have an opportunity to help libraries become data-aggregation services for their customers. They need metadata that will make it easier to gather and repackage information for local delivery and analysis. It's not sufficient to simply offer the current fragmented-data landscape to users.

Carol Tenopir, a professor at the University of Tennessee-Knoxville, summarized some of her extensive research. She cautioned the audience that even though we know much about user behavior, there's also still a lot we don't know. Students generally turn to Internet search engines first and feel confident about their searching ability. They recognize that not all information retrieved from the Internet is reliable. Usage patterns of subject experts vary depending on the discipline in which they work. Most users draw on both print and electronic resources, and they print out information that they plan to review at length. Users will emphasize electronic resources if they're convenient and relevant and if they save time.

We still don't know much about the differences in information-system use based on users' gender, when they were born, their culture, or their geographic location. In a continuing study of more than 18,000 users since 1977, Tenopir and her colleagues have found that information use does not seem to depend on age.

In a sample of medical faculty members, another study found that because users with higher degrees tend to read and need information more than those with lower degrees, they use electronic information more extensively. Tenopir suggested that we determine whether systems should be designed for "average" users or should cater to users' differences.

Joanne Witiak, an information scientist at Rohm & Haas Co., described an environment typical of many large corporate information services. Between 1997 and 2002, document delivery became an outsourced self-service operation, desktop searching tools were emphasized, and there was increased emphasis on marketing and training by the library organization. More than 85 percent of the company's users now search for information themselves.

Content

Increasingly, items other than conventional documents are being integrated into digital library catalogs. Tom Moritz, director of library services at the American Museum of Natural History, described how museum specimens and images are being indexed and integrated into that organization's digital library. Drawing on the Dublin Core model, the library developed a "Darwin Core" set of metadata elements to handle materials such as field notebooks, specimen catalogs, and frozen tissue slides as well as conventional documents (see http://www.library.amnh.org).
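As a rough illustration of the idea (not the AMNH library's actual schema), a specimen record might combine Dublin Core-style descriptive elements with Darwin Core-style fields; the element names and values below are hypothetical:

```python
# Illustrative sketch: a specimen record described with Dublin Core-style
# fields alongside Darwin Core-style occurrence fields. Field names and
# values are hypothetical examples, not the AMNH library's actual schema.
specimen_record = {
    # Dublin Core-style descriptive elements
    "dc:title": "Field notebook, Gobi Desert expedition",
    "dc:creator": "R. C. Andrews",
    "dc:type": "PhysicalObject",
    "dc:date": "1925",
    # Darwin Core-style elements for natural-history materials
    "dwc:institutionCode": "AMNH",
    "dwc:catalogNumber": "FR-1925-0042",
    "dwc:scientificName": "Protoceratops andrewsi",
    "dwc:basisOfRecord": "PreservedSpecimen",
}

# A harvester could merge records like this with conventional document
# metadata before loading them into the digital library catalog.
for element, value in specimen_record.items():
    print(f"{element:25s} {value}")
```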

Patrick Healey of NASA's Goddard Space Flight Center library described how he's indexing seminars, mini-courses, and historical videos. His users want to access specific content within a video without watching the whole tape. The Goddard system uses speech-recognition technology to compare the spoken words with a stored dictionary of candidate terms. Speaker accents and inflections present major challenges in this project.
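A minimal sketch of the matching step, assuming a time-coded transcript has already come out of a speech recognizer (the transcript, term list, and timings below are hypothetical, not Goddard's actual system):

```python
# Minimal sketch: compare a time-coded speech-recognition transcript
# against a stored dictionary of candidate terms, producing index entries
# that point into the video. All data here is hypothetical.
candidate_terms = {"telemetry", "spectrometer", "hubble", "orbit"}

# (start_seconds, recognized_word) pairs as a recognizer might emit them
transcript = [
    (12.4, "the"), (12.6, "Hubble"), (13.0, "telemetry"),
    (95.2, "orbit"), (96.1, "adjustment"),
]

video_index = {}
for seconds, word in transcript:
    term = word.lower()
    if term in candidate_terms:
        video_index.setdefault(term, []).append(seconds)

# A user searching for "telemetry" could jump straight to 12.4 seconds
# into the video instead of watching the whole tape.
print(video_index)
```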

Bernard Rous, director of publications at ACM, said that in an electronic world, the same work can have several different versions. Often, an electronic document has no pagination or article number and was never in print. It's therefore difficult to determine which version should appear in a catalog, and traditional citation processes become confused.

Steven Bachrach, editor in chief of the Internet Journal of Chemistry (IJC), feels that the three keys to the next generation of publishing are transformability, interoperability, and customization. An advantage of a completely electronic journal like IJC is that large collections of raw data can be made available to researchers. In a print journal, there would be no room to publish this material.

Text mining is becoming a valuable tool for information users, and new technologies are being developed to help researchers. David Lewis, formerly with Bell Laboratories and now a consultant, gave an excellent overview of text mining. He said that words are like barnacles: They get on everything. However, they're valuable as surrogates for information entities.

The former document-centric world has changed. There's now more text, a greater range of text types, more computer power, and new advances in statistics and algorithms, all of which have spurred the growing use of text-mining techniques. Lewis said that text mining is very different from indexing because it uses patterns of words, not the words themselves. It's a new use for the skills of information professionals that can add significant value to textual databases.
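A minimal sketch of that distinction: instead of assigning single index terms, a text-mining pass might count which terms co-occur across documents (the toy corpus and stopword list below are hypothetical):

```python
# Minimal sketch of mining word patterns rather than single index terms:
# count how often pairs of terms co-occur in the same document.
from collections import Counter
from itertools import combinations

corpus = [
    "gene expression profiles in tumor cells",
    "tumor suppressor gene mutations",
    "expression of suppressor proteins in cells",
]
stopwords = {"in", "of", "the"}

pair_counts = Counter()
for doc in corpus:
    terms = sorted({w for w in doc.lower().split() if w not in stopwords})
    pair_counts.update(combinations(terms, 2))

# Pairs that recur across documents hint at relationships that no single
# index term captures.
for pair, count in pair_counts.most_common(5):
    print(pair, count)
```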

One of the databases most widely used in text-mining research is the National Library of Medicine's PubMed. Jane Rosov, coordinator of data distribution at NLM, described the changes made in licensing agreements to accommodate these new uses for NLM's data. The library now has more licenses for research purposes than for redistribution.

Search Engines

The second day of the conference focused on search engines and business models. David Evans, CEO of Clairvoyance Corp., said that although search creates an illusion of value, a better way to present results is to collect facets and then organize and label them. Elements of state-of-the-art information retrieval include good queries, term weighting, linguistics, and term stemming. The most important factor is to obtain interactive feedback from the user and incorporate it into the search. Today's Web search engines don't do this because it would increase search times.
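One classic, textbook way of folding user feedback back into a search is Rocchio-style query reweighting; the sketch below illustrates the general idea under that assumption and is not Clairvoyance's system:

```python
# Minimal sketch of relevance feedback: reweight a term-vector query
# toward documents the user marked relevant and away from documents
# marked non-relevant (Rocchio-style). Weights and vectors are illustrative.
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    terms = set(query) | {t for d in relevant + nonrelevant for t in d}
    new_query = {}
    for t in terms:
        rel = sum(d.get(t, 0.0) for d in relevant) / max(len(relevant), 1)
        non = sum(d.get(t, 0.0) for d in nonrelevant) / max(len(nonrelevant), 1)
        weight = alpha * query.get(t, 0.0) + beta * rel - gamma * non
        if weight > 0:
            new_query[t] = round(weight, 3)
    return new_query

query = {"search": 1.0, "engine": 1.0}
relevant = [{"search": 0.8, "retrieval": 0.6}]
nonrelevant = [{"engine": 0.9, "automotive": 0.7}]
print(rocchio(query, relevant, nonrelevant))
```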

Chaomei Chen described his research at Drexel University, which uses time-series and co-citation analyses to visualize information structures as a function of time and to detect significant turning points and trends. He's studying problems such as how much time it takes to identify a trend and what distinguishes an abrupt change from a gradual one.
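The raw material for such analyses is a co-citation count per time slice; a minimal sketch with hypothetical citing records:

```python
# Minimal sketch of co-citation counting by year: two papers are co-cited
# when a later paper cites both. Counting such pairs per year yields the
# time series that trend and turning-point analyses build on.
from collections import Counter
from itertools import combinations

citing_papers = [
    {"year": 2001, "cites": ["A", "B", "C"]},
    {"year": 2002, "cites": ["A", "B"]},
    {"year": 2002, "cites": ["B", "C"]},
]

cocitation_by_year = {}
for paper in citing_papers:
    pairs = combinations(sorted(set(paper["cites"])), 2)
    cocitation_by_year.setdefault(paper["year"], Counter()).update(pairs)

for year in sorted(cocitation_by_year):
    print(year, dict(cocitation_by_year[year]))
```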

Derek Schueren, sales director at Recommind, described the use of probabilistic latent semantic analysis to understand concepts in text. This technology classifies search results by concept and can distinguish between multiple meanings of words. It shows promise in helping users see their results in context and in a way that makes sense for them.
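A minimal sketch of the underlying idea, using plain latent semantic analysis (a truncated SVD of a term-document matrix) rather than the probabilistic variant Recommind uses; the toy corpus is hypothetical:

```python
# Minimal sketch of mapping documents into a small "concept" space with
# latent semantic analysis: documents about finance and documents about
# rivers separate along latent axes even though they share the ambiguous
# word "bank".
import numpy as np

docs = [
    "bank stock market trading",
    "market prices and trading volume",
    "river bank erosion",
    "erosion of the river delta",
]
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                          # keep two latent concepts
doc_concepts = (np.diag(s[:k]) @ Vt[:k]).T     # one row per document

# Rows cluster by topic; results could be grouped by their nearest concept.
print(np.round(doc_concepts, 2))
```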

Corinne Jorgensen, a professor at Florida State University, described her research on image retrieval. She said that today's image-retrieval systems are not useful because people describe images differently than they describe text. Current image-indexing systems are text-driven, but newer methods will use image characteristics such as color, shape, texture, and composition. A new standard, MPEG-7, has been developed to describe multimedia content. Image retrieval still has a long way to go; it will depend on several technologies and must draw on both human perception and cognitive responses.
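A minimal sketch of content-based comparison on one such characteristic, color, using coarse histograms; the "images" below are small hypothetical arrays standing in for decoded pixel data:

```python
# Minimal sketch of comparing images by color rather than by text:
# build a coarse per-channel color histogram for each image and compare
# histograms by intersection.
import numpy as np

def color_histogram(image, bins=4):
    """Coarse per-channel histogram, normalized to sum to 1."""
    hist = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
            for c in range(3)]
    hist = np.concatenate(hist).astype(float)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """1.0 means identical color distributions, 0.0 means disjoint."""
    return float(np.minimum(h1, h2).sum())

rng = np.random.default_rng(0)
sunset = rng.integers(128, 256, size=(8, 8, 3))   # warm, bright pixels
forest = rng.integers(0, 128, size=(8, 8, 3))     # dark, cool pixels

print(histogram_intersection(color_histogram(sunset), color_histogram(forest)))
print(histogram_intersection(color_histogram(sunset), color_histogram(sunset)))
```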

Business Models

In the business model session, Lynne Brindley, CEO of The British Library, described a 3-year modernization program currently underway at her institution. Major initiatives include understanding key sectors to develop a market-facing approach, modernizing the U.K.'s legal deposit requirements to include electronic materials, providing secure electronic document delivery to the desktop, digital preservation, and building capacity in the organization by taking advantage of employees' new roles and skills.

Corilee Christou, vice president of Reed Business Information, discussed her company's response to the growing penetration of the Internet into U.S. homes. She said that Web and information gluts are now occurring. In addition, new brands are appearing, traditional brands no longer provide the same value to users, and portals are becoming common. Publishers must determine who their current audiences are and what applications their content can support. As advertising revenues shrink because of the Web, some publishers are converting free controlled-circulation publications into subscription-based works.

Jan Velterop, publisher of BioMed Central, discussed the newest trend in publishing: open access. He feels that open access is a logical consequence of having the technological capabilities of the Internet at our disposal and that the subscription model of publishing is obsolete. With open access, publishing becomes a service, and the subscription charge becomes an "article processing" charge paid by the author's organization. Authors receive peer review, immediate publication upon acceptance, XML coding, Web formatting, and deposit in open access archives. Users are able to access the material at no cost, so authors, funders, societies, and science in general receive the benefits of research results.

The conference closed with a stirring presentation by Ben Shneiderman, a professor at the University of Maryland. His book, Leonardo's Laptop, uses Leonardo da Vinci as a model for "new computing" paradigms. It speculates on the intriguing question of how Leonardo would have used a laptop and what applications he would have created. Leonardo, a Renaissance man, combined science and art, integrated engineering and aesthetics, and balanced technological advances and human values. Shneiderman concluded that computing should be usable, universal, and useful, and that we must create applications that meet human needs.

[Author's Note: Speaker presentations are available at http://www.nfais.org.]

Donald T. Hawkins is an information technology and database consultant at Information Today, Inc. and editor in chief of information and computer science databases at EBSCO Publishing. His e-mail address is dthawkins@verizon.net.
