Preliminary Program

Morning Tutorials

  • Introduction to IR
  • Machine Learning for IR
  • User-Interface Design for IR in E-Commerce
  • Content Image Retrieval
  • Multilingual IR
  • High-Performance IR

Afternoon Tutorials

  • Medical Informatics
  • Internet Search Engines
  • Text Summarization
  • IR Evaluation
  • Advanced Machine Learning for IR

  • SIGIR '99 Tutorials


    Morning Tutorials

    Title: Introduction to Information Retrieval
    Instructor(s): W. Bruce Croft, University of Massachusetts
    Time: Morning, 8:30-12:30

    Tutorial Description: The tutorial will present a general overview of the field of information retrieval. This will include a summary of the important research problems, a short historical perspective, and a more detailed description of basic techniques and approaches in the areas of retrieval models, indexing models and text representation, evaluation, information needs and query processing, and system architecture. Important new directions in the field, such as event detection and tracking, information organization, and multimedia retrieval will also be briefly described.

    Who Should Attend: Those interested in a general overview, Beginning-Intermediate.

    About the Instructor(s):

    W. Bruce Croft is a Professor in the Department of Computer Science at the University of Massachusetts, Amherst, which he joined in 1979. In 1992, he became the Director of the NSF Center for Intelligent Information Retrieval, which combines basic research with technology transfer to a variety of government and industry partners. His research interests are in information retrieval models, text representation techniques, the design and implementation of search engine and filtering systems, and user interfaces. He has published more than 100 articles on these subjects. This research is also being used in a number of operational retrieval systems. Dr. Croft was Chair of the ACM Special Interest Group on Information Retrieval from 1987 to 1991. He is currently Editor-in-Chief of the ACM Transactions on Information Systems and an Associate Editor for Information Processing and Management. He has served on numerous program committees and has been involved in the organization of many workshops and conferences. He has received 2 awards from the information industry for his research contributions. He is an ACM Fellow and is currently serving on the Computer Science and Telecommunications Board of the National Research Council.


    Title: Machine Learning for Information Retrieval
    Instructor(s): David Lewis and Yoram Singer, AT&T Labs Research
    Time: Morning, 8:30-12:30

    Tutorial Description: This tutorial will discuss machine learning methods for IR tasks, including retrieval, categorization, and routing/filtering. The emphasis will be on supervised learning (i.e. learning from manually classified examples), with some attention to unsupervised methods (e.g. clustering, LSI) for representation change. The use of machine learning in commercial IR software will be touched upon, but the emphasis will be on research findings. The tutorial will attempt to clarify the links between important but sometimes confusing concepts from IR (e.g. term weighting, query expansion, relevance feedback, classification, etc.) and important but sometimes confusing concepts from machine learning (e.g. feature extraction, overfitting, generalization, classification, etc.). As well as being a standalone introduction, this tutorial can serve as preparation for Singer & Lewis' afternoon tutorial, "Advanced Machine Learning Techniques for IR".

    Who Should Attend: Intermediate.

    About the Instructor(s):

    David D. Lewis is a Principal Research Staff Member at AT&T Labs. Prior to that he was a research faculty member at the University of Chicago, and did his Ph.D. in Computer Science at the University of Massachusetts at Amherst. He has published extensively on the application of machine learning and natural language processing to IR, and has organized several workshops in these areas. He has been heavily involved in the design of the TREC evaluations and the construction of test collections for text categorization.

    Yoram Singer is a Principal Research Staff Member at AT&T Labs. He received a PhD in computer science from the Hebrew University of Jerusalem, Israel, in 1995. His research focuses on theory and applications of machine learning algorithms. He serves on the editorial boards of Machine Learning Journal and Neural Computing Surveys.


    Title: User-Interface Design for Information Retrieval in E-Commerce and Performance Tools
    Instructor(s): Aaron Marcus, President Aaron Marcus and Associates, Inc. (AM+A)
    Time: Morning, 8:30-12:30

    Tutorial Description: Good user interfaces and information visualizations (UI+IVs) enable users to find, retrieve, comprehend, use, and remember information more quickly, with greater ease, and with deeper satisfaction. The tutorial will introduce the basic steps of the task-centered, user-oriented development process for UI+IVs. It will explain the five basic components of all UI+IVs (metaphors, mental models, navigation, interaction, and appearance) and will focus on organization, economy, and effective visual communication techniques that improve usability and appeal. Research and development participants will learn practical principles, become familiar with existing techniques, and discover potential new research topics. They will observe and analyze techniques for making systems more intelligible, functional, aesthetic, and marketable. Examples will cover perceptual, conceptual, and communication issues in typography, symbol systems, color, spatial composition, animation, and sequencing. Case studies will include SABRE, one of the world's largest extranets, Kaiser clinical medical systems, and financial systems.

    Who Should Attend: Introductory-Intermediate. The tutorial is designed to cater for the needs of several audiences:

      • Researchers who wish to understand implementation issues that must be addressed in practical large information-retrieval systems
      • Practitioners who wish to learn current "best practice" in the area of user-interface and information-visualization design, especially for information summaries and queries
      • Students who wish to establish a basis for research in user-interface and information-design aspects of information retrieval system implementation

    About the Instructor(s):

    Mr. Aaron Marcus has given tutorials on user-interface and information-visualization design since 1980. Over the last 15 years, his firm has designed or evaluated more than 120 user interfaces for American Express, Bank of America, Charles Schwab, Hewlett-Packard, IBM, Kaiser, Kodak, McDonnell-Douglas, Motorola, NCR, N2K/Music Boulevard, Oracle, Pacific Bell Smart Yellow Pages, Reuters, SABRE, US Federal Reserve Bank, and Unigraphics. Government clients have included INTELSAT, Lawrence Berkeley Laboratory, National Library of Medicine, and the US Departments of Defense and Labor. Mr. Marcus has written a basic reference, "Graphical User Interfaces," Handbook Of Human-Computer Interaction, Elsevier Science, Amsterdam, Netherlands, 1997, pp. 423-440, co-authored The Computer Image (1982), for which he wrote the essay "Color: A Tool for Computer Graphics Communication," co-authored Human Factors and Typography for More Readable Programs (1990), authored Graphic Design for Electronic Documents and User Interfaces (1992), and co-authored The Cross-GUI Handbook (1994). Mr. Marcus received a BA in physics from Princeton University (1965) and a BFA/MFA in graphic design from Yale University Art School (1968). In 1992, Mr. Marcus received the National Computer Graphics Association Industry Achievement award for his contributions to the field.


    Title: Content Based Image Retrieval
    Instructor(s): R. Manmatha and S. Chandu Ravela, University of Massachusetts
    Time: Morning, 8:30-12:30

    Tutorial Description: Managing information in today's world requires tools which search, retrieve, classify and categorize this information. While there are successful search engines which search and retrieve information in the form of ASCII text (e.g., INQUERY, Infoseek), there are few successful tools for managing images. Users desire semantic similarity from a retrieval engine. This is difficult to achieve for images, since the technology to recognize objects in general does not exist. However, similarity based on attributes like color, texture, shape and appearance may often approximate semantic similarity, and many image retrieval systems are based on these attributes.

    In this tutorial, we discuss why the problems underlying the field of image retrieval are difficult, the current state of the technology and directions in which the technology is likely to be improved. Completely automatic systems are difficult to build with the current state of technology. However, in many cases good results may be achieved by carefully formulating the problem, by careful specification of the query and by close user involvement in the retrieval process. This tutorial will discuss these aspects and the different current technologies that may be applied for retrieving images. We will also discuss how such systems may be evaluated.

    Topics to be discussed include: the notion of visual similarity and the difficulties therein; content based retrieval using color, texture, shape and appearance; applications of image retrieval; query specifications; representations and techniques including filtering, color spaces, histograms and principal component analysis; and examples of systems.
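
    As a rough illustration of attribute-based similarity, the sketch below compares images by quantized color histograms. It is a minimal sketch under stated assumptions: images are plain lists of (r, g, b) tuples, the function names are hypothetical, and histogram intersection is chosen here as one common similarity measure; the tutorial description does not say which measures will be covered.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram.

    pixels: list of (r, g, b) tuples with channel values in 0..255
    (an assumed toy image representation).
    """
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each channel to its bin, then flatten the 3-D bin index.
        idx = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[idx] += 1.0
    total = len(pixels)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima of two normalized histograms (range 0..1)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Toy "images": four pixels each.
red_image = [(250, 10, 10)] * 4
mostly_red_image = [(240, 20, 5)] * 3 + [(10, 10, 250)]
blue_image = [(5, 5, 240)] * 4

h_red = color_histogram(red_image)
h_mostly_red = color_histogram(mostly_red_image)
h_blue = color_histogram(blue_image)

print(histogram_intersection(h_red, h_mostly_red))  # higher: colors overlap
print(histogram_intersection(h_red, h_blue))        # lower: no shared bins
```

    Color histograms ignore where colors occur in the image, which is one reason color similarity only approximates semantic similarity.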

    Who Should Attend: This tutorial is geared towards researchers, developers and engineers with an interest in multimedia information management. It does not assume a specific background but some mathematical maturity will be of significant advantage. The aim of the tutorial is to introduce people to the issues, the state of technology and possible future directions in the field of image retrieval. At the end of the course, people should be able to appreciate some of the common techniques used in the area of image retrieval and also be able to apply some of the simpler techniques.

    About the Instructor(s):

    R. Manmatha is a research assistant professor in the Computer Science Department at the University of Massachusetts. He is the lead researcher at the multimedia indexing and retrieval group at the Center for Intelligent Information Retrieval. Manmatha's current research is in the areas of image retrieval and document recognition and analysis. He has worked on image retrieval, finding text in images, and methods to retrieve handwritten manuscripts, and has more than 20 papers in these areas. He is involved in a project from the US Patent and Trademark Office to index and retrieve trademarks using both content based retrieval and text. He has consulted with several companies on imaging products.

    S. Chandu Ravela is a doctoral candidate at the University of Massachusetts. He is a researcher at the multimedia indexing and retrieval group and the computer vision lab, and works on image retrieval. He has worked on multimedia content management, image matching, and computer vision. Mr. Ravela has written a number of papers in the area of image retrieval, and he is actively working on a project from the US Patent and Trademark Office to index and retrieve trademarks using both content based retrieval and text. Mr. Ravela has consulted with industry on imaging products. He received an MS in computer science from the University of Massachusetts in 1994 and a BE in Computer Engineering and Science from the Regional Engineering College, Trichy, India in 1991.


    Title: Multilingual Information Access
    Instructor(s): Judith Klavans, Columbia University, and Peter Schäuble, Eurospider Information Technology AG
    Time: Morning, 8:30-12:30

    Tutorial Description: With increasing globalization, information access across language boundaries is becoming a critical issue. The tutorial introduces the main approaches to multilingual information access and presents the state of the art in detail. A special focus will be cross-language information retrieval. We will also cover areas of information presentation, including multilingual multidocument summarization, and discuss the acquisition and usefulness of various multilingual resources. The tutorial presenters will elaborate on all issues of multilingual information access from two different points of view, i.e. information retrieval and computational linguistics. Prototype and commercial systems as well as evaluation methodologies will be discussed.

    Who Should Attend: Introductory/Intermediate.

    About the Instructor(s):

    Judith L. Klavans is Director of the Center for Research on Information Access (CRIA) at Columbia University. Prior to that, she was a Research Staff Member at the IBM T. J. Watson Research Center where she focused on natural language processing, the use of machine-readable dictionaries in building lexicons, and the use of large monolingual and bilingual corpora for enhancing these lexicons.

    Peter Schäuble is CEO of Eurospider Information Technology AG. Prior to that he was Assistant Professor of Computer Science at ETH Zurich and headed an Information Retrieval research group. Eurospider is commercializing ETH prototype systems for the automatic audio indexing of spoken text and for cross-language information retrieval.

    The two presenters are the co-leaders of the multilingual information access working group that was funded jointly by the National Science Foundation and the European Commission, DG-13, to explore key issues in Digital Libraries.


    Title: Implementation of High-Performance Information Retrieval Systems
    Instructor(s): Alistair Moffat, The University of Melbourne, and Justin Zobel, RMIT University
    Time: Morning, 8:30-12:30

    Tutorial Description: Basic information retrieval techniques, developed and refined over more than thirty years, are well known. However, only in this decade have these techniques been applied to document collections containing gigabytes of text, resulting in significant innovations in the way large text databases are created and queried. This tutorial examines the practical problems of indexing, querying, storing, and updating gigabyte-sized text databases, including those that are the result of Web-based information harvesting. We describe a variety of recently developed techniques for coping with the problems introduced by the scale of modern text collections, including fast index construction methods, fast query evaluation strategies, and fast text and index compression mechanisms.
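
    To give a flavor of index compression, the sketch below stores a sorted postings list as gaps between document numbers and encodes each gap with a variable-length byte code. This is an illustrative assumption only: the description above does not say which codes the tutorial covers, and the function names and sample postings are hypothetical.

```python
from itertools import accumulate


def to_gaps(postings):
    """Sorted document numbers -> first number followed by differences."""
    if not postings:
        return []
    return [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]


def from_gaps(gaps):
    """Inverse of to_gaps: running sum restores the document numbers."""
    return list(accumulate(gaps))


def vbyte_encode(numbers):
    """Encode non-negative integers 7 bits per byte, low-order bits first.

    The high bit is set only on the last byte of each number.
    """
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(data):
    """Decode a byte string produced by vbyte_encode."""
    numbers = []
    n = 0
    shift = 0
    for b in data:
        if b & 0x80:  # final byte of this number
            numbers.append(n | ((b & 0x7F) << shift))
            n = 0
            shift = 0
        else:
            n |= b << shift
            shift += 7
    return numbers


postings = [3, 7, 8, 150, 100000]
encoded = vbyte_encode(to_gaps(postings))
decoded = from_gaps(vbyte_decode(encoded))
print(len(encoded), decoded)
```

    Small gaps fit in a single byte, so frequent terms with dense postings lists tend to compress best under a scheme like this.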

    Who Should Attend: The tutorial is designed to cater for the needs of several audiences: researchers who wish to understand implementation issues for multi-gigabyte IR systems; practitioners who wish to understand current "best practice" in IR implementation and to learn about novel techniques for efficient resource usage in IR systems; and students with a research interest in IR system implementation.

    About the Instructor(s):

    Dr. Alistair Moffat is an Associate Professor in the Department of Computer Science at the University of Melbourne. He completed a Ph.D. at the University of Canterbury in 1985. Since then Dr. Moffat has published more than 80 refereed papers in the areas of sorting and searching algorithms; text, image, and index compression; and the implementation of information retrieval systems. He is a coauthor of the book "Managing Gigabytes: Compressing and Indexing Documents and Images", a second edition of which will be published in 1999. Dr. Moffat has also served on the Program Committee of the IEEE Data Compression Conference (1995-1999), the ACM SIGIR Conference on Research and Development in Information Retrieval (1996-1999), the RIAO'97 Conference, and the SPIRE Conference on String Processing and Information Retrieval (1998-1999).

    Dr. Justin Zobel completed his Ph.D. at the University of Melbourne in 1990. He then joined the academic staff of the Department of Computer Science at RMIT, where he is now an Associate Professor and a member of the RMIT Multimedia Database Systems group. Dr. Zobel has published more than 80 refereed papers in the areas of information retrieval, database systems, text databases, genomics, and logic programming, and is the author of the 1997 book "Writing for Computer Science: The Art of Effective Communication", and coauthor of the texts "Indexing Techniques for Advanced Database Systems" and "Document Computing: Technologies for Managing Electronic Document Collections". He is active in the information retrieval community, serving on the Program Committee for the ACM SIGIR Conference and the WWW8 Conference, and has been a Program Chair of the 1997 Australasian Computer Science Conference and of the 1995 Australasian Database Conference.

    In collaboration these two researchers developed the MG system, a research prototype text management system that incorporates a range of novel techniques to achieve compact storage of large amounts of data while still providing fast content-based access. The work on the MG system has been supported by several grants from the Australian Research Council. MG has been used by the Multimedia Database Systems Group to support its activities in the international TREC information retrieval project, and also serves as the kernel of the New Zealand Digital Library Project.


    Afternoon Tutorials

    Title: Medical Informatics: Acquisition, Storage, and Use of Information in Health Care
    Instructor(s): William Hersh, Oregon Health Sciences University
    Time: Afternoon, 1:30-5:30

    Tutorial Description: The goal of this tutorial is to provide an overview of the field of medical informatics, which is concerned with the acquisition, storage, and use of information in health care. The specific objectives are to:

    1. Give an overview of the field, the research questions it addresses, and the institutions where academic programs are located.
    2. Provide a survey view of the major areas within the field, including electronic medical records, artificial intelligence and decision support, telemedicine, coding and classification, and security and confidentiality.
    3. Provide a detailed view of information retrieval issues within the field, including content, indexing, retrieval, and evaluation.

    Who Should Attend: The tutorial is aimed at all who have an interest in the acquisition, storage, and use of information in health care, particularly through the use of information technologies.

    About the Instructor(s):

    Dr. William Hersh is Associate Professor and Chief of the Division of Medical Informatics and Outcomes Research in the School of Medicine at Oregon Health Sciences University (OHSU) in Portland, Oregon. Dr. Hersh has been at OHSU since 1990, where he has developed a research program in medical information retrieval. His research focuses on the development and evaluation of information retrieval systems for clinicians. The majority of his research funding comes from the National Library of Medicine, where he has also been involved with the Unified Medical Language System project. Dr. Hersh has published over four dozen scientific papers and is author of the book, Information Retrieval: A Health Care Perspective (Springer-Verlag, 1996). He is also a Fellow of the American College of Medical Informatics and serves on the Editorial Board of three journals: Journal of the American Medical Informatics Association, Information Processing and Management, and Information Retrieval. He also serves on the Board of Directors of the American Medical Informatics Association. Dr. Hersh also teaches at OHSU, where his duties include serving as Program Director for the Master of Science in Medical Informatics program and Course Director for an annual two-day continuing education course, Using Computers to Solve Clinical Problems.


    Title: Inside Internet Search Engines
    Instructor(s): Jan Pedersen, Infoseek, and William Chang
    Time: Afternoon, 1:30-5:30

    Tutorial Description: We will discuss the major scientific and engineering issues that distinguish Internet search engines from traditional IR applications. We will review the new technologies specific to Web content that have been developed over the last few years, for example popularity and quality-based reranking. We will also discuss the engineering methodology used to scale a search system to millions of queries per day and hundreds of millions of documents. If time permits, we will touch on the economics of current search engines and speculate on their possible forward evolution.

    Who Should Attend: The tutorial will be instructive to anyone interested in the design and operation of an Internet search engine. However, we will expect some familiarity with IR concepts, such as the vector space model, and will keep the discussion fairly technical.

    About the Instructor(s):

    Jan Pedersen is a graduate of Princeton University (AB in Statistics) and Stanford University (PhD in Statistics). He was on the staff of the Xerox Palo Alto Research Center for ten years, first working on statistical approaches to natural language processing, and later managing a research group in the area of Information Access. He is associated with a number of inventions, most notably the Scatter-Gather, cluster-based, document browsing paradigm. In 1996 Jan Pedersen joined Verity Inc., a leading vendor of text retrieval software, as manager of the Advanced Technology Group. The prototyping efforts of this group culminated in a product release of the Verity Knowledge Organizer early in 1998. Jan Pedersen is currently Director for Advanced Technology, Search and Directory, at Infoseek Corp.

    William Chang was the inventor of Infoseek's search engine and one of the architects of Infoseek/GO Network's portal strategy to embrace media content, community, and commerce. Previously at the Cold Spring Harbor Laboratory in New York, Dr. Chang developed a method for discovering protein evolutionary relationships and helped construct a genomic map of fission yeast for cancer research. He received his Ph.D. from U.C. Berkeley in 1991 for a sublinear pattern matching algorithm and has a B.A. in mathematics from Harvard University. Dr. Chang also served as an ACM SIGAPL appointed board member in 1994.


    Title: Automated Text Summarization
    Instructor(s): Eduard Hovy, Chin-Yew Lin and Daniel Marcu, University of Southern California
    Time: Afternoon, 1:30-5:30

    Tutorial Description: In this tutorial, we review the state of the art in automatic text summarization, and discuss and critically evaluate current approaches to the problem. The tutorial is structured as follows:

    1. The need for text summarization.
    2. What is a summary, exactly? What types are there? We outline a typology of summaries, including the following distinctions: indicative vs. informative; abstract vs. extract; generic vs. query-oriented; background vs. just-the-news; single-document vs. multi-document; and so on.
    3. An overview of the principal paradigms and approaches. We describe the typical decomposition of summarization into three stages, and explain in detail the major approaches to each stage. We contrast the strengths and weaknesses of the statistical/IR-based and the AI/NLP-based paradigms.
    4. Topic Identification. For this stage, we outline techniques based on stereotypical text structure, cue words, high-frequency indicator phrases, intratext connectivity, and discourse structure centrality. We provide detailed examples together with measures of effectiveness.
    5. Topic fusion. For this stage, we outline some ideas that have been proposed, including concept generalization and semantic association, and describe the inherent problems of large-scale world knowledge.
    6. Summary generation. For this stage, we outline the problems of sentence planning to achieve information compaction and to ensure coherence of the resulting summary.
    7. Evaluation: how good is a summary? Evaluation is a difficult issue. We describe various suggested measures and discuss the adequacy of current evaluation methods. We illustrate the measurement of individual features and show how some features are surprisingly bad and others surprisingly good predictors of importance.
    8. The future. Finally, we present a set of open problems that we perceive as being crucial for immediate progress in automatic summarization. Throughout, we highlight the strengths and weaknesses of statistical and symbolic/linguistic techniques in implementing efficient summarization systems. We discuss ways in which summarization systems can interact with and/or complement information extraction and information retrieval systems.

    Who Should Attend: Introductory-Intermediate.

    About the Instructor(s):

    Eduard Hovy is the director of the Natural Language Group at the Information Sciences Institute of the University of Southern California, and is a member of the Computer Science Departments of USC and of the University of Waterloo. He completed a Ph.D. in Computer Science (Artificial Intelligence) at Yale University in 1987. His research focuses on machine translation, automated text summarization, text planning and generation, and the semi-automated construction of large lexicons and terminology banks; the Natural Language Group at ISI currently has projects in most of these areas. He is the author or editor of four books and over 100 technical articles. With regard to text summarization, Dr. Hovy is one of the architects of the SUMMARIST system being built at ISI. Currently, Dr. Hovy serves as the President of the Association of Machine Translation in the Americas (AMTA). He has served on the Executive Board of the Association for Computational Linguistics (ACL) and on the editorial boards of several journals. He has been program chair for numerous workshops and conferences, including AMTA-98. Dr. Hovy regularly co-teaches the Natural Language Processing course at the University of Southern California, as well as an occasional three-day course on MT at UCLA.

    Chin-Yew Lin is a computer scientist at the Information Sciences Institute of the University of Southern California. He completed a Ph.D. in Computer Engineering at the University of Southern California in 1997. Dr. Lin is the principal designer of MuST, a multilingual information retrieval, summarization, and translation system. MuST integrates state-of-the-art off-the-shelf information retrieval engines, multilingual automated text summarization technologies from the SUMMARIST project, and the rapid-prototype shallow machine translation engine SHALT. MuST enables users to search the Internet or local free-text databases in multiple languages (currently English, Arabic, Bahasa Indonesia, Japanese, and Spanish), summarize documents to different lengths, and translate the full texts or summaries into English. Dr. Lin has demonstrated a prototype system to several government and private agencies, and several sites have requested it for field testing and technology transfer. His research focuses on automated multimedia summarization, information retrieval, machine learning, and machine translation.

    Daniel Marcu is a computer scientist at the Information Sciences Institute and a research assistant professor in the Computer Science Department, University of Southern California. He graduated with a Ph.D. in Computer Science from the University of Toronto in 1998. His publications span a wide range of computational linguistics topics that include text and discourse theories, knowledge representation for natural language processing, and natural language generation. His current focus is on building theoretical and algorithmic foundations for discourse (multi-sentence) parsing and discourse-based summarization of unrestricted domain texts; and on applying discourse theories to real-world natural language applications. His work on discourse parsing and discourse-based summarization constitutes the core of a forthcoming MIT Press book.


    Title: Theory and Practice in Text Retrieval System Evaluation
    Instructor(s): Chris Buckley, Sabir Research Inc., and Ellen Voorhees, NIST
    Time: Afternoon, 1:30-5:30

    Tutorial Description: A variety of different effectiveness measures can be used to evaluate the output of a retrieval system. The "trec_eval" program of TREC, for example, reports 85 numbers using some 20-odd different measures for a single retrieval run, though retrieval performance is routinely reported using only a small subset of these measures. This course will examine a wide variety of measures, giving the advantages and disadvantages of each, as well as examining the situations in which use of a measure is most appropriate. Other subjects to be covered include relevance judgments, significance tests, set-based evaluation, and user-oriented evaluation.

    Examples of questions to be answered:

    1. Why is average precision paid so much attention?
    2. Why are error rate and related measures prevalent in other disciplines but not heavily used for IR?
    3. Why isn't completeness the primary goal when selecting documents for relevance judging?
    4. Why should you never report both Prec@10 and Recall@10?
    5. Why are just straight recall and precision extremely poor measures to use for set-based evaluation?
    6. Why is precision at 10 documents (Prec@10) an OK (but not perfect) measure for comparing systems, but a poor measure for tuning a single system?
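
    For readers unfamiliar with the measures named in these questions, here is a minimal sketch of precision at a cutoff, recall at a cutoff, and average precision for a single topic. It assumes binary relevance judgments and a ranked list of document IDs; the function names and toy data are illustrative and are not taken from trec_eval.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / k


def recall_at_k(ranking, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / len(relevant)


def average_precision(ranking, relevant):
    """Average of the precision values at each rank holding a relevant document.

    Relevant documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Toy run: five retrieved documents, three relevant ones overall.
ranking = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}

print(precision_at_k(ranking, relevant, 5))
print(recall_at_k(ranking, relevant, 5))
print(average_precision(ranking, relevant))
```

    Note that at a fixed cutoff, precision and recall share the same numerator and differ only in the denominator (the cutoff versus the number of relevant documents), which may be one way to approach question 4.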

    Who Should Attend: Intermediate. The course will be taught from first principles, but some familiarity with retrieval experiments and current methodology would be extremely helpful. The course is intended primarily for developers of retrieval systems.

    About the Instructor(s):

    Chris Buckley is President of Sabir Research, Inc. He has been the primary implementor of the SMART experimental retrieval system since 1985. He has been very active in defining the TREC evaluation schemes and is the author of the standard TREC evaluation program (trec_eval). He received his PhD from Cornell University, and has written numerous research papers on many areas of IR.

    Ellen Voorhees is the manager of the TREC project at the National Institute of Standards and Technology (NIST). Prior to joining NIST, she was a Senior Member of Technical Staff at Siemens Corporate Research in Princeton, NJ. She received her PhD from Cornell University where she studied under Gerard Salton. Her research interests include retrieval system evaluation, test collection design, and statistical natural language processing.


    Title: Advanced Machine Learning Techniques for IR
    Instructor(s): Yoram Singer and David Lewis
    Time: Afternoon, 1:30-5:30

    Tutorial Description: This tutorial will discuss the use of state-of-the-art machine learning methods for information retrieval systems. We will focus on a few well-analyzed machine learning algorithms that have also been studied empirically. In particular, we will discuss in depth two domains that use ML techniques for IR: text categorization and information extraction. A major emphasis of the tutorial will be on making the connection between different machine learning techniques (e.g. boosting vs. support vector machines) and the implications for IR. Tentative list of topics:

    I. Categorization methods:

    1. Introduction to the theory of separating hyperplanes.
    2. Margin classifiers:
        a. Support vector machines.
        b. Boosting algorithms.
        c. Naive Bayes as a margin classifier.
    3. Online algorithms:
        a. Winnow.
        b. Weighted Majority and Sleeping-Experts.
        c. Exponentiated Gradient.
        d. Perceptron learning.
        e. A unified approach.
        f. Converting an online algorithm to a batch classifier.

    II. Finite state models for information extraction:

    1. Deterministic automata learning:
        a. Learning with queries (active learning).
        b. Signature techniques.
    2. Probabilistic automata learning:
        a. Signatures revisited.
        b. (Observable) Markov models and probabilistic suffix trees.
        c. Hidden Markov models.

    This tutorial assumes basic knowledge of machine learning. The introductory morning tutorial by Lewis & Singer would be good preparation for the advanced tutorial but is not mandatory.
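
    As an informal preview of the online algorithms in the topic list, the sketch below trains a mistake-driven perceptron on sparse bag-of-words features for a two-class categorization task. It is a minimal sketch, not the instructors' formulation: the function names, the toy documents, and the labels (+1 for sports, -1 for finance) are all hypothetical.

```python
from collections import defaultdict


def train_perceptron(examples, epochs=10):
    """Mistake-driven perceptron.

    examples: list of (features, label) pairs, where features is a
    dict mapping term -> value and label is +1 or -1.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for feats, y in examples:
            score = bias + sum(weights[f] * v for f, v in feats.items())
            if y * score <= 0:  # wrong side of (or on) the hyperplane
                for f, v in feats.items():
                    weights[f] += y * v
                bias += y
                mistakes += 1
        if mistakes == 0:  # a full pass with no updates: stop early
            break
    return weights, bias


def predict(weights, bias, feats):
    """Return +1 or -1 according to the sign of the linear score."""
    score = bias + sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return 1 if score > 0 else -1


# Toy training set (hypothetical): +1 = sports, -1 = finance.
docs = [
    ({"goal": 1, "match": 1}, 1),
    ({"team": 1, "goal": 1}, 1),
    ({"stock": 1, "market": 1}, -1),
    ({"market": 1, "shares": 1}, -1),
]

w, b = train_perceptron(docs)
print(predict(w, b, {"goal": 1, "team": 1}))
print(predict(w, b, {"shares": 1, "market": 1}))
```

    The learner updates only when it makes a mistake, which is the property that online mistake-bound analyses of such algorithms rely on.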

    Who Should Attend: Advanced.

    About the Instructor(s):

    Yoram Singer is a Principal Research Staff Member at AT&T Labs. He received a PhD in computer science from the Hebrew University of Jerusalem, Israel, in 1995. His research focuses on theory and applications of machine learning algorithms. He serves on the editorial boards of Machine Learning Journal and Neural Computing Surveys.

    David D. Lewis is a Principal Research Staff Member at AT&T Labs. Prior to that he was a research faculty member at the University of Chicago, and did his Ph.D. in Computer Science at the University of Massachusetts at Amherst. He has published extensively on the application of machine learning and natural language processing to IR, and has organized several workshops in these areas. He has been heavily involved in the design of the TREC evaluations and the construction of test collections for text categorization.


     

    SIGIR 99 Call for Participation. Last modified 05/12/99.