Resources

This page is no longer being maintained. If you're looking for a specific resource, my publications or my GitHub profile would be good starting points.

TREC

Raw nugget pyramids data

Released: April 13, 2006 (Last update: September 9, 2006)

Jimmy Lin and Dina Demner-Fushman. Will Pyramids Built of Nuggets Topple Over? Proceedings of the 2006 Human Language Technology Conference and the North American Chapter of the Association for Computational Linguistics Annual Meeting (HLT/NAACL 2006), pages 383-390, June 2006, New York City, New York.

Pourpre scoring script for automatically evaluating complex questions

Released: May 29, 2005 (Last update: June 13, 2007)

Jimmy Lin and Dina Demner-Fushman. Methods for Automatically Evaluating Answers to Complex Questions. Information Retrieval, 9(5):565-587, 2006. [DOI:10.1007/s10791-006-9003-7]

Jimmy Lin and Dina Demner-Fushman. Automatically Evaluating Answers to Definition Questions. Proceedings of the 2005 Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP 2005), pages 931-938, October 2005, Vancouver, Canada.

The Aranea question answering system

Released: June 11, 2005

Aranea is a Web-based factoid question answering system that uses a combination of data redundancy and database techniques. Its performance in TREC 2002, TREC 2003, and TREC 2004 was competitive. The predecessor to Aranea is the askMSR system that colleagues at Microsoft Research and I developed in 2001.

Jimmy Lin. An Exploration of the Principles Underlying Redundancy-Based Factoid Question Answering. ACM Transactions on Information Systems, 27(2):1-55, 2007.

QA test collection

Released: June 9, 2005

The question answering test collection as described in: Jimmy Lin and Boris Katz. Building a Reusable Test Collection for Question Answering. Journal of the American Society for Information Science and Technology, 57(7):851-861, 2006.

Java version of Brill's Part-of-Speech Tagger

Released: December 27, 2004

Eric Brill's part-of-speech tagger ported to Java via the Java Native Interface (JNI). More precisely, it's based on Benjamin Han's ePost package, which is a cleaned-up version of Brill's original code. It has been tested on both Linux and Windows (under Cygwin).

LPost: Perl version of Brill's Part-of-Speech Tagger

Released: December 27, 2004

Eric Brill's part-of-speech tagger as a Perl module. Just like the Java version, it's based on Benjamin Han's ePost package. It has been tested on both Linux and Windows (under Cygwin with ActiveState Perl).