

A Hybrid Approach to Answering Biographical/Definitional Questions

  • Ana Licuanan, Scott Miller, Ralph Weischedel, Jinxi Xu

  • 10 June 2003


Techniques

  • Sentence selection using

    • Information Retrieval (IR)
    • Linguistic features
      • Appositives
      • Copulas
      • Propositions
    • Semantic processing using Information Extraction
      • Co-reference within document
      • Relations
  • Sentence compression

  • Redundancy removal



Baseline for Comparison Purposes



Sentence Selection by IR

  • Hypothesis:

    • Good sentences tend to contain words that are common in human-created biographies
  • Method:

    • Candidate sentences that contain the question subject are ranked according to their similarity to a set of training biographies
      • Similarity is computed using an IR engine (BBN's IR system)
      • Each sentence is treated as a query
      • The training biographies are treated as a single vector of words
      • The similarity scores are normalized to make them comparable
    • Top N sentences containing the question term are returned as the answer
  • Training Data:

    • 17,000 short biographies from www.s9.com
    • 6,000 online encyclopedia biographies from www.wikipedia.org
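A minimal sketch of the IR-based selection step, assuming a plain bag-of-words cosine similarity in place of BBN's IR engine (the slides do not describe its scoring function). Cosine's division by vector length stands in for the score-normalization step; `tokenize`, `cosine`, and `select_sentences` are hypothetical names chosen for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Crude tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity; dividing by both vector lengths makes scores
    # comparable across sentences of different lengths.
    dot = sum(count * b[t] for t, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_sentences(sentences: list[str], subject: str,
                     training_bios: list[str], n: int = 3) -> list[str]:
    # All training biographies are merged into a single bag-of-words vector.
    bio_vector = Counter()
    for bio in training_bios:
        bio_vector.update(tokenize(bio))

    subject_tokens = tokenize(subject)
    scored = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        # Only sentences that mention the question subject are candidates.
        if all(t in tokens for t in subject_tokens):
            # Each candidate sentence acts as a query against the biography vector.
            scored.append((cosine(Counter(tokens), bio_vector), sentence))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:n]]
```

With the s9.com and Wikipedia biographies as `training_bios`, sentences sharing vocabulary such as "born", "won", or "graduated" with those biographies would rise to the top.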


System Diagram



Sentence Selection using Linguistic Features

  • Hypothesis:

    • Good sentences tend to contain linguistic features that are common in human-generated answers
  • Method:

    • Good features include the target (QTERM) as an argument in
      • Propositions
      • Appositives
      • Copulas extracted from parse trees
    • The headword is used to represent each argument value, reducing data sparseness
    • Sentence with a feature f is ranked by
      • P(R|f) = P(R) · P(f|R) / P(f) (Bayes' rule)
      • P(f|R) estimated from training data (i.e., mined biographies)
      • P(f) estimated from a background corpus (a sample of the AQUAINT corpus with 10M words)
  • Example: “Blobel, a biologist at Rockefeller University, won the Nobel Prize in Medicine.”

    • Proposition: sub=QTERM verb=“won” obj=“prize”
    • Appositive: *APP*=“biologist”
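The Bayes-rule ranking can be sketched as follows. This is an illustration under stated assumptions, not the authors' estimator: add-alpha smoothing, the `prior` value, and scoring a sentence by its single best feature are all choices made here. Features are headword tuples such as the proposition and appositive in the example. Because P(R) is the same constant for every feature, it does not affect the ranking.

```python
from collections import Counter

def make_scorer(bio_features, background_features, prior=0.01, alpha=1.0):
    """Return score(f) estimating P(R|f) = P(R) * P(f|R) / P(f).

    P(f|R) comes from features mined from biographies and P(f) from a
    background corpus. Add-alpha smoothing and `prior` are assumptions.
    """
    bio_counts = Counter(bio_features)
    bg_counts = Counter(background_features)
    bio_total = sum(bio_counts.values())
    bg_total = sum(bg_counts.values())
    vocab = len(set(bio_counts) | set(bg_counts))  # shared smoothing vocabulary

    def score(f):
        p_f_given_r = (bio_counts[f] + alpha) / (bio_total + alpha * vocab)
        p_f = (bg_counts[f] + alpha) / (bg_total + alpha * vocab)
        return prior * p_f_given_r / p_f

    return score

def rank_sentences(sentence_features, score):
    """sentence_features: list of (sentence, [features]).
    A sentence is scored by its best feature (taking the max is an assumption)."""
    ranked = sorted(sentence_features,
                    key=lambda item: max((score(f) for f in item[1]), default=0.0),
                    reverse=True)
    return [sentence for sentence, _ in ranked]
```

A feature like an appositive headed by "biologist", frequent in biographies but rare in news, scores high. A generic headword common in the background corpus scores low.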


Sentence Selection based on Information Extraction

  • Hypothesis:

    • Good sentences tend to contain semantic relations that are common in human-generated answers
  • Method:

    • SERIF, a state-of-the-art Information Extraction engine, is used
    • Co-reference used for name comparison, e.g.,
      • Depending on context, “he” and “Bush” may be the same person
    • Relations used as additional features for sentence selection. Types of relations include:
      • Spouse-of (e.g. “Clinton”, “Hillary”)
      • Founder-of (e.g. “Gates”, “Microsoft”)
      • Management-of (e.g. “Welch”, “GE”)
      • Residence-of (e.g. “John Doe”, “Boston”)
      • Citizenship-of (e.g. “John Doe”, “American”)
      • Staff-of (e.g. “Weischedel”, “BBN”)
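A hypothetical sketch of how coreference output could feed sentence selection. It does not use SERIF's actual interface, which the slides do not describe. The `Mention` and `Relation` types and both functions are invented for illustration. A pronoun such as "he" counts as mentioning the subject when coreference assigns it the subject's entity id. Relation features replace the subject with `*TERM*`, mirroring annotations like `arg1=*TERM*` in the example output.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    text: str
    entity_id: int  # id from a (hypothetical) within-document coreference step

@dataclass(frozen=True)
class Relation:
    rtype: str      # e.g. "Spouse-of", "Staff-of"
    arg1: Mention
    arg2: Mention

def subject_entities(doc_mentions, subject_name):
    # Entities with at least one mention whose text matches the question subject.
    return {m.entity_id for m in doc_mentions if subject_name.lower() in m.text.lower()}

def relation_features(sentence_mentions, sentence_relations, subject_ids):
    """Return (mentions_subject, features) for one sentence."""
    # True even if the subject appears only as a coreferent pronoun.
    mentions_subject = any(m.entity_id in subject_ids for m in sentence_mentions)
    feats = set()
    for r in sentence_relations:
        a1 = "*TERM*" if r.arg1.entity_id in subject_ids else r.arg1.text
        a2 = "*TERM*" if r.arg2.entity_id in subject_ids else r.arg2.text
        if "*TERM*" in (a1, a2):  # keep only relations involving the subject
            feats.add((r.rtype, a1, a2))
    return mentions_subject, feats
```

For "He works at BBN", with "he" resolved to Weischedel, the sentence yields the feature ("Staff-of", "*TERM*", "BBN").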


Sentence Compression

  • Motivation:

    • A good sentence may contain portions irrelevant to the question
    • Goal: extract only the pertinent parts of a sentence
  • Method:

    • Operations are performed on parse trees
    • Find the smallest phrase that contains all the arguments of an important fact (i.e., a proposition, appositive, copula, or relation)
    • Relative clauses not attached to the question term are trimmed from the phrase
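The two compression operations can be sketched on a toy parse tree. Assumptions: a simplified tree without part-of-speech preterminals (leaves are bare words), and a relative clause is modeled as an `SBAR` child of an `NP`. It counts as "attached to the question term" when the rest of that `NP` contains the term. `Node`, `smallest_covering`, `trim_relative_clauses`, and `compress` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Node:
    label: str
    children: list  # each child is a Node or a leaf word (str)

def leaves(t):
    if isinstance(t, str):
        return [t]
    return [w for c in t.children for w in leaves(c)]

def smallest_covering(t, words):
    """Deepest subtree whose leaves include every argument word."""
    for c in t.children:
        if isinstance(c, Node) and set(words) <= set(leaves(c)):
            return smallest_covering(c, words)
    return t

def trim_relative_clauses(t, qterm):
    """Drop SBAR children of NPs whose other children do not contain qterm."""
    if isinstance(t, str):
        return t
    kept = []
    for c in t.children:
        if isinstance(c, Node) and c.label == "SBAR" and t.label == "NP":
            head_words = [w for s in t.children if s is not c for w in leaves(s)]
            if qterm not in head_words:
                continue  # clause modifies something other than the question term
        kept.append(trim_relative_clauses(c, qterm))
    return Node(t.label, kept)

def compress(tree, fact_args, qterm):
    return trim_relative_clauses(smallest_covering(tree, fact_args), qterm)
```

In "Blobel, who fled Dresden, won the prize that Nobel endowed", the clause on "Blobel" survives. The clause on "the prize" is trimmed, leaving "Blobel , who fled Dresden , won the prize .".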


Redundancy Removal

  • Hypothesis:

  • Method:

    • Each response item (a sentence or phrase) is decomposed into a set of features
      • Appositives
      • Copulas
      • Propositions
      • Relations
    • All candidate items are ranked based on features
      • PRf) = P(R) . P (f|R) /P(f)
    • For each response item,
      • Output it if it contains new features
      • Skip it if it does not
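The ranking-then-filtering loop described above amounts to a greedy pass. In this sketch, each item carries its feature set, and `score` is any per-feature function such as an estimate of P(R|f). Ranking an item by its best feature is an assumption, since the slide does not say how feature scores are combined.

```python
def remove_redundant(items, score):
    """items: list of (text, features); score: feature -> float (e.g. P(R|f)).

    Items are ranked by their best feature, then kept only if they add
    at least one feature not already covered by earlier output.
    """
    ranked = sorted(items,
                    key=lambda it: max((score(f) for f in it[1]), default=0.0),
                    reverse=True)
    seen = set()
    output = []
    for text, features in ranked:
        features = set(features)
        if features - seen:       # contains a new feature: output it
            output.append(text)
            seen |= features
        # otherwise every feature was already seen, so the item is skipped
    return output
```

Applied to the Blobel candidates, several sentences share the single feature `*APP*=biologist`, so only the highest-ranked one survives.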




Reference (Human) Answer: Full Sentences

  • Dr. Gunter Blobel is a cellular and molecular biologist at Rockefeller University in New York City.  He won the 1999 Nobel Prize in medicine for discovering that proteins carry signals that act as ZIP codes, helping them find their correct locations within the cell.  The research that Blobel has conducted for 30 years helps explain the molecular mechanisms behind diseases like cystic fibrosis.

  • Blobel said he was donating most of the prize money to the Friends of Dresden, an independent American group that supports the restoration and preservation of Dresden's artistic and architectural legacy.  He witnessed the bombing of Dresden as a child.  Blobel was born in 1936 in Waltersdorf, Silesia, Germany, now part of Poland.  Blobel graduated from high school with high grades, but couldn't continue to study because he didn't want to join the Communist youth parties.

  • After earning a medical degree from the University of Tubingen, he interned at a small hospital where he said he "realized that treatment of disease was irrational and not based on profound knowledge."  His interest then became research.  In the early 1950s, Blobel escaped to the West through Berlin and became a U.S. citizen in the 1980s.  Blobel earned a Ph.D. from the University of Wisconsin in 1967.  That same year, Blobel moved to Rockefeller University as a post-doctoral fellow largely because he wanted to work with Dr. George E. Palade, a pioneering cell biologist.



Reference (Human) Answer: Phrases

  • A cellular and molecular biologist at Rockefeller University

  • Won the 1999 Nobel Prize in medicine for discovering that proteins carry signals that act as ZIP codes, helping them find their correct locations within the cell

  • Has conducted research for 30 years

  • Research helps explain the molecular mechanisms behind diseases like cystic fibrosis  

  • Born in 1936 in Waltersdorf, Silesia, Germany, now part of Poland

  • Earned a medical degree from the University of Tubingen

  • Escaped to the West through Berlin and became a U.S. citizen in the 1980s

  • Earned a Ph.D. from the University of Wisconsin in 1967.



Sentences Selected by IR



Sentences Selected Using Linguistic Features



Sentences Selected Using Relations

  • Dr. Gunter Blobel , a cellular and molecular biologist at Rockefeller University in New York City , won the 1999 Nobel Prize in medicine on Monday for discovering that proteins carry signals that act as ZIP codes , helping them find their correct locations within the cell . (*APP*=biologist)

  • -LRB- Angel Franco/New York Times Photo -RRB- -LRB- NYT10 -RRB- NEW YORK -- Oct. 11 , 1999 -- SCI - NOBEL - MEDICINE , 10-11 -- Dr. Guenter Blobel , a cellular and molecular biologist at Rockefeller University on Monday . (*APP*=biologist)

  • -- Guenter Blobel , a cell biologist at Rockefeller University in New York , was awarded the Nobel Prize in Physiology or Medicine Monday for discovering how proteins get shipped to their proper destinations within the body after being manufactured by tiny molecular factories inside cells . (*APP*=biologist)

  • Reefers : SCI - NOBEL - MEDICINE -LRB- Undated -RRB- -- Dr. Guenter Blobel , a cellular and molecular biologist at Rockefeller University in New York City , is awarded the 1999 Nobel Prize in Physiology of Medicine for discovering that proteins carry signals that act as ZIP codes , helping them find their correct locations within the cell . (*APP*=biologist)

  • LRB- AP -RRB- -- Dr. Guenter Blobel of The Rockefeller University in New York won the Nobel Prize for medicine today for protein research that shed new light on diseases including cystic fibrosis and early development of kidney stones . (rname=ROLE/GENERAL-STAFF arg1=*TERM* arg2=University)

  • Blobel was born in 1936 in Waltersdorf , Silesia , Germany , now part of Poland . (verb=born obj=*TERM* in=Germany)

  • Young Gunter graduated from high school `` with highest grades , although he never studied very much , '' his oldest brother , Dr. Hans Blobel , said in a telephone interview from his home in Geissen , Germany . (verb=graduated, sub=*TERM*)



Sentence Compression




Redundancy Removal




System Output

  • A cellular and molecular biologist at Rockefeller University in New York City

  • Dr. Guenter Blobel of The Rockefeller University in New York

  • Blobel was born in 1936 in Waltersdorf , Silesia , Germany , now part of Poland

  • Young Gunter graduated from high school `` with highest grades , although he never studied very much , ''



Evaluation

  • Goal:

    • A repeatable, automatic scorer to allow frequent experiments
  • Test bed: 26 biographical questions with human-created answers (one human answer per question)

    • ¼ from the pilot corpus
    • ¾ created at BBN
  • For each question, the system produces the top N response items, with N chosen so that their total size does not exceed that of the manual answer

    • The BLEU metric from machine translation evaluations is used for scoring
    • Answer brevity, which should be rewarded for bio/def QA, is penalized by BLEU
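A minimal sketch of this scoring setup. The slides do not state the settings used, so this version assumes whitespace tokenization, a word-count reading of "size", and a single reference. It also uses unsmoothed sentence-level BLEU with uniform weights up to 4-grams, the standard configuration. It shows why brevity is penalized: when the candidate length c is shorter than the reference length r, BLEU multiplies the score by exp(1 - r/c), even if every n-gram matches. `fit_to_budget` is a hypothetical helper.

```python
import math
from collections import Counter

def fit_to_budget(ranked_items, budget_words):
    """Take the top-ranked items while their total word count fits the budget
    (the length of the manual answer)."""
    chosen, used = [], 0
    for item in ranked_items:
        n = len(item.split())
        if used + n > budget_words:
            break  # stop at the first item that would exceed the budget
        chosen.append(item)
        used += n
    return chosen

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        c_ngrams, r_ngrams = ngrams(cand, n), ngrams(ref, n)
        total = sum(c_ngrams.values())
        # Modified precision: clip each n-gram count by its count in the reference.
        clipped = sum(min(cnt, r_ngrams[g]) for g, cnt in c_ngrams.items())
        if total == 0 or clipped == 0:
            return 0.0  # unsmoothed BLEU is zero if any n-gram precision is zero
        log_sum += math.log(clipped / total) / max_n
    # Brevity penalty: 1 if the candidate is longer than the reference, else exp(1 - r/c).
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_sum)
```

For example, a 4-word candidate whose n-grams all appear in an 8-word reference still scores only exp(-1) ≈ 0.37. That is exactly the penalty on concise bio/def answers noted above.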


BLEU Scores with Phrase-based Answer Keys

  • Sentence selection via IR and name finding surprisingly good according to BLEU

  • Relation extraction and redundancy removal promising

  • Sentence compression improves scores on phrasal reference answers



BLEU Scores with Sentence-based Answer Keys

  • BLEU scores consistently higher for longer (human-generated) answer keys

  • Sentence selection via IR and name finding surprisingly good according to BLEU

  • Relation extraction and redundancy removal promising

  • Sentence compression hurts scores on sentential reference answers

    • May be due to omission of entity name in compressed answers


BLEU vs. Human Judgments

  • Too few human judgments and too little data to draw firm conclusions

  • BLEU may not be sufficiently sensitive

    • Does not fully agree with human rankings, though it generally does


Lessons Learned

  • IR model of biographies improves sentence selection

    • Select sentences like those seen in human-generated biographies
  • Relation extraction modestly improves sentence selection

  • BLEU is stable with respect to n-gram length

  • BLEU vs. subjective evaluation

    • BLEU promising for automatic evaluation of progress in system development
    • May not be accurate enough for cross-system evaluation


Summary

  • Approach to biography generation combines

    • Information retrieval
    • Linguistic analysis
    • Information extraction
    • Redundancy detection
    • Sentence compression
  • Automatic evaluation by BLEU metric

  • Much work remains

    • Demonstration tomorrow
    • Improvements in quality
    • Extension to
      • Cross-document entity tracking
      • Organization profiles
      • Definitions of things
    • Participation in TREC QA evaluation

