awesome-information-retrieval

IR resources

A curated collection of resources and references for developers interested in information retrieval technology

A curated list of awesome information retrieval resources

Awesome Information Retrieval / Books
Introduction to Information Retrieval			C.D. Manning, P. Raghavan, H. Schütze. Cambridge UP, 2008. (First book for getting started with Information Retrieval)
Search Engines: Information Retrieval in Practice			Bruce Croft, Don Metzler, and Trevor Strohman. 2009. (A very detailed book for readers interested in how search engines work)
Modern Information Retrieval			R. Baeza-Yates, B. Ribeiro-Neto. Addison-Wesley, 1999
Information Retrieval in Practice			B. Croft, D. Metzler, T. Strohman. Pearson Education, 2009
Mining the Web: Analysis of Hypertext and Semi Structured Data			S. Chakrabarti. Morgan Kaufmann, 2002
Language Modeling for Information Retrieval			W.B. Croft, J. Lafferty. Springer, 2003. (Covers the language modeling aspect of information retrieval and details the probabilistic perspective on the field at length)
Information Retrieval: A Survey			Ed Greengrass, 2000. (Comprehensive survey of conventional information retrieval, before the deep learning era)
Introduction to Modern Information Retrieval			G.G. Chowdhury. Neal-Schuman, 2003. (Intended for students of library and information studies)
Text Information Retrieval Systems			C.T. Meadow, B.R. Boyce, D.H. Kraft, C.L. Barry. Academic Press, 2007 (library/information science perspective)
Awesome Information Retrieval / Courses
INF384H / CS395T / INF350E: Concepts of Information Retrieval (and Web Search)			Matthew Lease (University of Texas at Austin)
CS 276 / LING 286: Information Retrieval and Web Search			Chris Manning and Pandu Nayak (Stanford University)
CS 371R: Information Retrieval and Web Search			Raymond J. Mooney (University of Texas at Austin)
CS 172: Introduction to Information Retrieval			Vagelis Hristidis (University of California - Riverside)
SIMS 240: Principles of Information Retrieval			Ray R. Larson (UC Berkeley)
11-442 / 11-642: Search Engines			Jamie Callan (CMU)
600.466: Information Retrieval and Web Agents			David Yarowsky (Johns Hopkins University)
CS 435: Information Retrieval, Discovery, and Delivery			Andrea LaPaugh (Princeton University)
Information Retrieval and Data Mining			Dr. Jilles Vreeken, Prof. Dr. Gerhard Weikum (MPI)
Coursera - Text Retrieval and Search Engines			Prof. ChengXiang Zhai (University of Illinois at Urbana-Champaign)
Awesome Information Retrieval / Software
Apache Lucene			Open-source search engine that can be used to test information retrieval algorithms. Twitter uses it as the core of its real-time search
The Lemur Project			The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software
Awesome Information Retrieval / Software / The Lemur Project
Indri Search Engine			Another open-source search engine, a competitor to Apache Lucene
Lemur Toolkit			Open Source Toolkit for research in Language Modeling, filtering and categorization
Awesome Information Retrieval / Datasets
DBPedia			Linked data web
Cranfield Collections			One of the first collections in the IR domain. The dataset is too small for statistical significance analysis, but it is suitable for pilot runs
TREC Collections			TREC is the benchmark dataset used by most IR and Web search algorithms. It has several tracks, each of which consists of a dataset for testing a specific task. The tracks, along with suggested use cases, are:
Awesome Information Retrieval / Datasets / TREC Collections
Blog			Explore information seeking behavior in the blogosphere
Chemical IR			Address challenges in building large chemical testbeds for chemical IR
Clinical Decision Support			Investigate techniques to link medical cases to information relevant for patient care
Confusion			Study problem
Contextual Suggestion			Investigate search techniques for complex information needs (context and user interests based)
Crowdsourcing			Explore crowdsourcing methods for performing and evaluating search
Enterprise			Study search over the organization data
Entity			Perform entity-related search (find entities and their properties) on Web data
Filtering			Make a binary retrieve-or-skip decision for each new incoming document, given a stable information need
Federated Web Search			Study merge performance for results from various search services
Genomics			Study retrieval efficiency of genomics data and corresponding documentation
HARD			Obtain High Accuracy Retrieval from Documents by leveraging searcher's context
Interactive Track			Study user interaction with text retrieval systems
Knowledge Base Acceleration			Study algorithms that improve the efficiency of human knowledge base curators
Legal Track			Study retrieval systems that achieve high recall for the legal documents use case
Medical Track			Explore unstructured search performance over patient record data
Microblog Track			Examine satisfaction of real-time information need for microblogging sites
Million Query Track			Explore ad-hoc retrieval over a large set of queries
Novelty Track			Investigate systems' abilities to locate new (non-redundant) information
Question Answering Track			Test systems that scale beyond document retrieval, to retrieve answers to factoid, list and definition type questions
Relevance Feedback Track			For deep evaluation of relevance feedback processes
Robust Track			Study the effectiveness of individual topics
Session Track			Develop methods for measuring multiple-query sessions where information needs drift
SPAM Track			Benchmark spam filtering approaches
Tasks Track			Test whether systems can infer the possible tasks users might be trying to accomplish with a query
Temporal Summarization Track			Develop systems that allow users to efficiently monitor the information associated with an event over time
Terabyte Track			Test the scalability of IR systems to large-scale collections
Web Track			Explore information seeking behaviors common in general web search
Awesome Information Retrieval / Datasets
GOV2 Test Collection			One of the largest Web document collections, obtained from a crawl of government websites by Charlie Clarke and Ian Soboroff using NIST hardware and network, then formatted by Nick Craswell
NTCIR Test Collection			A collection of a wide variety of datasets, ranging from ad-hoc collections and Chinese IR collections to mobile clickthrough and medical collections. Its focus is mostly on East Asian languages and cross-language information retrieval
Awesome Information Retrieval / Datasets / NTCIR Test Collection
CLIR Test Collections			This dataset can be used for cross-lingual IR between CJKE (Chinese-Japanese-Korean-English) languages. It is suitable for the following tasks:
Cross Language Q&A (CLQA) dataset collection			It supports the following bilingual and monolingual settings:
Advanced Cross Lingual Information Retrieval and Question Answering (ACLIA)			This dataset is used for cross-lingual question answering, but the task is more complex than with the CLQA dataset
Awesome Information Retrieval / Datasets
Conference and Labs of the Evaluation Forum (CLEF) dataset			It contains a multi-lingual document collection. The test suite includes:
Reuters Corpora			The corpora are now available through NIST. They include the following:
20 Newsgroup dataset			This data set consists of 20,000 newsgroup messages taken from 20 newsgroup topics
English Gigaword Fifth Edition			This data set is a comprehensive archive of English newswire text data including headlines, datelines and articles
Document Understanding Conference (DUC) datasets			Past newswire/paper datasets (DUC 2001 - DUC 2007) are available upon request
CMU List
Stanford List
University of Tennessee, Knoxville
Awesome Information Retrieval / Talks
Extreme Classification: A New Paradigm for Ranking & Recommendation			Manik Varma (Microsoft Research)
The next web			Tim Berners-Lee (TED Talk) [Tim Berners-Lee invented the World Wide Web. He leads the World Wide Web Consortium (W3C), overseeing the Web's standards and development]
Is Pivot a turning point for web exploration?			Gary Flake, Technical Fellow at Microsoft (TED Talks)
Challenges in Building Large-Scale Information Retrieval Systems			Jeff Dean (WSDM Conference, 2009)
Knowledge-based Information Retrieval with Wikipedia			David Milne (The University of Waikato, 2008)
Music Information Retrieval Using Locality Sensitive Hashing			Steve Tjoa (RackSpace Developers) [This talk shows that IR is not just text and images]
The Functional Web -- The Future of Apps and the Web			Liron Shapira (Box Tech Talk)
Information Experience - Solution to Information Overload on Web			Doug Imbruce (TechCrunch Disrupt) [Doug Imbruce is the founder of Qwiki, Inc., a technology startup in New York, NY, acquired by Yahoo! in 2013]
Internet Privacy			Dr. Alma Whitten (Google Brussels Tech Talk)
The moral bias behind your search results			Andreas Ekström (Swedish Author & Journalist, TED Talk)
Beware online "filter bubbles"			Eli Pariser (Author of the Filter Bubble, TED Talk)
Think your email's private? Think again			Andy Yen (CERN, TED Talk) [This talk discusses privacy, which search engines intrude on, and how people can protect it]
Do we have the right to be forgotten?			Michael Douglas (TEDx SouthBank)
The case for anonymity online			Christopher "moot" Poole (TED Talk) [Christopher "moot" Poole is the founder of 4chan, an online imageboard whose anonymous denizens have spawned the web's most bewildering and influential subculture]
Awesome Information Retrieval / Conferences
WSDM			Web Search and Data Mining Conference
SIGIR			Special Interest Group on Information Retrieval
TREC			Text REtrieval Conference
ECIR			European Conference on Information Retrieval
WWW			World Wide Web Conference
CIKM			Conference on Information and Knowledge Management
FIRE			Forum for Information Retrieval Evaluation
CLEF			Conference and Labs of the Evaluation Forum
NTCIR			NII Testbeds and Community for Information access Research
Awesome Information Retrieval / Blogs
Information Retrieval and the Web			Google Research
IR Thoughts			Dr. Edel Garcia
Deep Neural Network Learns to Judge Books by Their Covers			Information Extraction
Can Deep Learning help solve Lip Reading			Information Retrieval from Lip Reading
To reduce biases in machine learning start with openly discussing the problem			Bias in Relevance
Whoa, Google’s AI Is Really Good at Pictionary			Sketch-based search
Neural Network Learns to Identify Criminals by Their Faces			Information Extraction