CS 242: Information Retrieval & Web Search
Winter 2024
General Info
Instructor: Vagelis Hristidis
Lecture time: M 5:00-6:20 pm, W 5:30-6:50 pm
Location: Room A125
Office hour: Wednesday 4:30-5:30 pm (WCH 317)
TAs: Shihab Rashid, Meem
Office hours: Shihab: Wednesdays 1:00-2:00 pm, WCH 363; Meem: Thursdays 1:00-2:00 pm, WCH 363
Reader: Pooja Patil
Grading
25% quizzes (the worst 2 quizzes will be discarded; MSOL students will have until the weekend to take each quiz, others will take it during lecture time)
25% midterm
15% assignment
35% project
Course Description
Information Retrieval (IR) principles, including indexing and searching document collections, Web search, and advanced topics like deep learning and search in social networks.
Some of the topics which will be tentatively presented are:
Assignment
Project
Late submissions received before the assignment or project has been graded will receive a 20% score reduction.
Tentative Lectures' Schedule
Date | Topic | Book Chapters | Supplemental material for further reading |
Jan 8 | Class Overview, Overview of Information Retrieval and Search Engines | Ch. 1, 2 | |
Jan 10, 17 | | Ch. 7.1, 7.2, 7.3 (except 7.3.2) | |
Jan 22 | Hands-on Scrapy and Lucene (by TA) | Slides | |
Jan 24, 29 | Crawling, Storing | | (p1) Heydon, A. and Najork, M. 1999. Mercator: A scalable, extensible Web crawler. World Wide Web 2, 4 (Apr. 1999), 219-229. (slides) |
Jan 31, Feb 5 | Indexing, MapReduce, Query Processing | Ch. 5 (except 5.4.2-5.4.7, 5.7.4-5.7.5), slides Ch. 5 | (p2) R. Fagin, Amnon Lotem and Moni Naor. Optimal aggregation algorithms for middleware. J. Computer and System Sciences 66 (2003), pp. 614-656. Extended abstract appeared in Proc. 2001 ACM Symposium on Principles of Database Systems (PODS '01), pp. 102-113. (p6) Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. OSDI 2004 |
Feb 7, 12 | Link Analysis | C slides: link-based search | (p4) L. Page, S. Brin, R. Motwani, T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. 1999. (p5) J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM 46 (1999). |
Feb 14 | Hands-on BERT with PyTorch and Faiss (by TA) | slides | https://github.com/facebookresearch/faiss/wiki/Getting-started |
Feb 21 | Evaluation | Ch. 8, slides Ch. 8-short | (p3) R. Fagin, Ravi Kumar and D. Sivakumar: Comparing top-k lists. SIAM J. Discrete Mathematics 17, 1 (2003) |
Feb 26 | | ||
Feb 28 | MIDTERM | ||
Mar 4, 6 | Deep learning and IR | | Lin, Jimmy, Rodrigo Nogueira, and Andrew Yates. "Pretrained transformers for text ranking: BERT and beyond." Synthesis Lectures on Human Language Technologies 14, no. 4 (2021): 1-325. Ouyang, Long, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang et al. "Training language models to follow instructions with human feedback." arXiv preprint arXiv:2203.02155 (2022). |
Mar 11 | no class | ||
Mar 13 | Project Presentations, 5:00-7:00 pm at SSC 229 | | |
Interesting topics but no time to present in class |
Ad words | | |
Relational DB and XML Search | 1. IR and DB | (p13) Sara Cohen, Jonathan Mamou, Yaron Kanza, Yehoshua Sagiv: XSEarch: A Semantic Search Engine for XML. 45-56, VLDB 2004. (p14) L. Guo, F. Shao, C. Botev, J. Shanmugasundaram: XRANK: Ranked Keyword Search over XML Documents. SIGMOD 2003 | |
Web Search: Spam, topic-specific PageRank | 2. Alexandros Ntoulas, Marc Najork, Mark Manasse, and Dennis Fetterly. 2006. Detecting spam web pages through content analysis. In Proceedings of the 15th international conference on World Wide Web (WWW '06). 3. Taher H. Haveliwala, "Topic-Sensitive PageRank: A Context-Sensitive Ranking Algorithm for Web Search," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 784-796, Jul/Aug, 2003. | ||
Text Processing, Query Refinement, Results Presentation, web search advertising (if time) | Ch. 4.1, 4.2, 4.3, slides Ch. 4 | G. Salton, C. Buckley. Improving retrieval performance by relevance feedback. Journal of the American Society for Information Science, 1990 |
Other Resources
Textbook
Search Engines: Information Retrieval in Practice
Bruce Croft, Donald Metzler, Trevor Strohman
Addison Wesley; 1st edition
ISBN-10: 0136072240
ISBN-13: 978-0136072249
http://www.search-engines-book.com/
Free download at https://ciir.cs.umass.edu/downloads/SEIRiP.pdf
Also recommended for reference:
Policies
Academic Integrity: https://conduct.ucr.edu/