Information Retrieval
Unit 1
Foundations of Information Retrieval
1. Define Information Retrieval (IR) and explain its goals.
2. Discuss the key components of an IR system.
3. What are the major challenges faced in Information Retrieval?
4. Provide examples of applications of Information Retrieval.
Retrieval Models
1. Describe the Boolean model in Information Retrieval. Discuss Boolean operators and
query processing.
2. Explain the Vector Space Model (VSM) in Information Retrieval. Discuss TF-IDF,
cosine similarity, and query-document matching.
3. What is the Probabilistic Model in Information Retrieval? Discuss Bayesian retrieval
and relevance feedback.
4. How does cosine similarity measure the similarity between queries and documents
in the Vector Space Model?
5. What is relevance feedback in the context of retrieval models? How does it enhance
search results?
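The question above leaves the feedback mechanism open; one classical formulation is Rocchio's algorithm (a standard textbook choice, not the only option), which moves the query vector toward judged-relevant documents and away from judged-non-relevant ones:

```latex
\vec{q}_{new} = \alpha \, \vec{q}_0
  + \frac{\beta}{|D_r|} \sum_{\vec{d} \in D_r} \vec{d}
  - \frac{\gamma}{|D_{nr}|} \sum_{\vec{d} \in D_{nr}} \vec{d}
```

Here \(D_r\) and \(D_{nr}\) are the sets of relevant and non-relevant documents identified by the user, and \(\alpha, \beta, \gamma\) are tunable weights controlling how strongly the feedback shifts the original query \(\vec{q}_0\).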
Spelling Correction in IR Systems
1. What are the challenges posed by spelling errors in queries and documents?
2. What is edit distance, and how is it used in measuring string similarity? Provide
examples.
3. Discuss string similarity measures used for spelling correction in IR systems.
4. Describe techniques employed for spelling correction in IR systems. Assess their
effectiveness and limitations.
5. What is the Soundex Algorithm and how does it address spelling errors in IR
systems?
6. Discuss the steps involved in the Soundex Algorithm for phonetic matching.
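The steps of American Soundex can be sketched as follows. This is a minimal illustration assuming the standard digit mapping and the usual rule that h and w do not separate letters with equal codes:

```python
def soundex(word):
    # Standard Soundex digit mapping for consonants
    mapping = {}
    for chars, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")]:
        for c in chars:
            mapping[c] = digit

    word = word.lower()
    code = word[0].upper()           # step 1: retain the first letter
    prev = mapping.get(word[0], "")
    for c in word[1:]:
        if c in "hw":                # h and w are skipped and do not
            continue                 # separate letters with equal codes
        digit = mapping.get(c, "")   # vowels map to "" and reset prev
        if digit and digit != prev:  # collapse adjacent equal codes
            code += digit
        prev = digit
    return (code + "000")[:4]        # pad/truncate to letter + 3 digits

print(soundex("Robert"), soundex("Rupert"))  # → R163 R163
```

Names that sound alike (Robert/Rupert) receive the same code, which is how Soundex tolerates spelling variation in queries.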
Performance Evaluation
1. Define evaluation metrics used in Information Retrieval, including precision, recall,
and F-measure.
2. Explain the concept of average precision in evaluating IR systems.
3. Explain the importance of test collections and relevance judgments in evaluating
Information Retrieval systems.
4. Discuss the process of relevance judgments and their importance in performance
evaluation.
5. Describe experimental design and significance testing in the context of evaluating
IR systems.
6. Discuss significance testing in Information Retrieval and its role in performance
evaluation.
Numericals
1. Given the following document-term matrix:
Document   Terms
Doc1       cat, dog, fish
Doc2       cat, bird, fish
Doc3       dog, bird, elephant
Doc4       cat, dog, elephant
Construct the posting list for each term: cat, dog, fish, bird, elephant.
Calculate the TF-IDF score for each term-document pair using the following TF and IDF
calculations:
● Term Frequency (TF) = (Number of occurrences of the term in the document) /
(Total number of terms in the document)
● Inverse Document Frequency (IDF) = log(Total number of documents / Number of
documents containing the term) + 1
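The construction above can be sketched directly from the given formulas. The log base is not specified in the question; natural log is assumed here:

```python
import math
from collections import defaultdict

docs = {
    "Doc1": ["cat", "dog", "fish"],
    "Doc2": ["cat", "bird", "fish"],
    "Doc3": ["dog", "bird", "elephant"],
    "Doc4": ["cat", "dog", "elephant"],
}

# Posting list: term -> list of documents containing it
postings = defaultdict(list)
for doc, terms in docs.items():
    for term in set(terms):
        postings[term].append(doc)

N = len(docs)
# IDF = log(N / document frequency) + 1, as given above
idf = {t: math.log(N / len(ds)) + 1 for t, ds in postings.items()}

# TF = occurrences in document / document length, as given above
tfidf = {
    doc: {t: (terms.count(t) / len(terms)) * idf[t] for t in terms}
    for doc, terms in docs.items()
}
print(sorted(postings["cat"]))  # → ['Doc1', 'Doc2', 'Doc4']
```

For example, "cat" occurs once in Doc1 (TF = 1/3) and appears in 3 of 4 documents (IDF = ln(4/3) + 1), giving a TF-IDF of about 0.429.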
5. Given the term-document matrix and the TF-IDF scores calculated from Problem 4,
calculate the cosine similarity between each pair of documents (Doc1, Doc2), (Doc1,
Doc3), (Doc1, Doc4), (Doc2, Doc3), (Doc2, Doc4), and (Doc3, Doc4).
Calculate the cosine similarity between each query and each document from the
term-document matrix in Problem 4.
8. Suppose you have a test collection with 50 relevant documents for a given query.
Your retrieval system returns 30 documents, out of which 20 are relevant. Calculate
the Recall, Precision, and F-score for this retrieval.
● Recall = (Number of relevant documents retrieved) / (Total number of relevant
documents)
● Precision = (Number of relevant documents retrieved) / (Total number of
documents retrieved)
● F-score = 2 * (Precision * Recall) / (Precision + Recall)
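The three formulas above applied to Problem 8's numbers give Recall = 20/50 = 0.4, Precision = 20/30 ≈ 0.667, and F-score = 0.5; a small sketch:

```python
def evaluate(total_relevant, total_retrieved, relevant_retrieved):
    # Formulas exactly as given above
    recall = relevant_retrieved / total_relevant
    precision = relevant_retrieved / total_retrieved
    f_score = 2 * precision * recall / (precision + recall)
    return recall, precision, f_score

# Problem 8: 50 relevant in the collection, 30 retrieved, 20 relevant
recall, precision, f_score = evaluate(50, 30, 20)
print(recall, round(precision, 4), round(f_score, 4))
```

The same function answers Problems 9 through 12 by substituting their counts.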
9. You have a test collection containing 100 relevant documents for a query. Your
retrieval system retrieves 80 documents, out of which 60 are relevant. Calculate the
Recall, Precision, and F-score for this retrieval.
10. In a test collection, there are a total of 50 relevant documents for a query. Your
retrieval system retrieves 60 documents, out of which 40 are relevant. Calculate the
Recall, Precision, and F-score for this retrieval.
11. You have a test collection with 200 relevant documents for a query. Your retrieval
system retrieves 150 documents, out of which 120 are relevant. Calculate the Recall,
Precision, and F-score for this retrieval.
12. In a test collection, there are 80 relevant documents for a query. Your retrieval
system retrieves 90 documents, out of which 70 are relevant. Calculate the Recall,
Precision, and F-score for this retrieval.
13. Construct 2-gram, 3-gram and 4-gram index for the following terms:
a. banana
b. pineapple
c. computer
d. programming
e. elephant
f. database
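Character n-gram indexing for these terms can be sketched as below; padding with a '$' boundary marker is a common textbook convention, assumed here:

```python
from collections import defaultdict

def ngrams(term, n):
    # Character n-grams; '$' marks the word boundaries
    padded = "$" + term.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def build_index(terms, n):
    index = defaultdict(set)  # n-gram -> set of terms containing it
    for term in terms:
        for g in ngrams(term, n):
            index[g].add(term)
    return index

terms = ["banana", "pineapple", "computer",
         "programming", "elephant", "database"]
bigram_index = build_index(terms, 2)
print(sorted(bigram_index["an"]))  # → ['banana', 'elephant']
```

Running `build_index` with n = 3 and n = 4 produces the 3-gram and 4-gram indexes the question asks for.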
14. Calculate the Levenshtein distance between the following pair of words:
a. kitten and sitting
b. intention and execution
c. robot and orbit
d. power and flower
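The word pairs above can be checked with the classic dynamic-programming formulation of Levenshtein distance (unit cost for insert, delete, and substitute):

```python
def levenshtein(a, b):
    # Row-by-row DP over the edit-distance matrix
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

pairs = [("kitten", "sitting"), ("intention", "execution"),
         ("robot", "orbit"), ("power", "flower")]
for a, b in pairs:
    print(a, b, levenshtein(a, b))  # distances: 3, 5, 3, 2
```

For instance, kitten → sitting needs two substitutions (k→s, e→i) and one insertion (g), hence distance 3.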
Unit 2
Text Categorization and Filtering:
1. Define text categorization and explain its importance in information retrieval
systems. Discuss the challenges associated with text categorization.
2. Discuss the Naive Bayes algorithm for text classification. How does it work, and
what are its assumptions?
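A minimal multinomial Naive Bayes classifier with add-one (Laplace) smoothing is sketched below; the tiny training set is invented purely for illustration:

```python
import math
from collections import Counter, defaultdict

# Invented toy training data: (label, text) pairs
train = [("spam", "win money now"), ("spam", "win prize money"),
         ("ham", "meeting at noon"), ("ham", "lunch meeting today")]

class_docs = defaultdict(list)
for label, text in train:
    class_docs[label].extend(text.split())

vocab = {w for words in class_docs.values() for w in words}
priors = {c: sum(1 for l, _ in train if l == c) / len(train)
          for c in class_docs}
counts = {c: Counter(words) for c, words in class_docs.items()}

def predict(text):
    scores = {}
    for c in class_docs:
        total = sum(counts[c].values())
        score = math.log(priors[c])  # log P(c)
        for w in text.split():
            # P(w|c) with add-one smoothing over the vocabulary,
            # assuming conditional independence of words given the class
            score += math.log((counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

print(predict("win money"))  # → spam
```

The log-space sum makes the independence assumption explicit: each word contributes its class-conditional log-probability independently of the others.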
3. Explain Support Vector Machines (SVM) and their application in text categorization.
How does SVM handle text classification tasks?
4. Compare and contrast the Naive Bayes and Support Vector Machines (SVM)
algorithms for text classification. Highlight their strengths and weaknesses.
5. Describe feature selection and dimensionality reduction techniques used in text
categorization. Why are these techniques important?
6. Discuss the applications of text categorization and filtering in real-world scenarios
such as spam detection, sentiment analysis, and news categorization.
Learning to Rank
1. Explain the concept of learning to rank and its importance in search engine result
ranking.
2. Discuss algorithms and techniques used in learning to rank for Information Retrieval.
Explain the principles behind RankSVM, RankBoost, and their application in ranking
search results.
3. Compare and contrast pairwise and listwise learning to rank approaches. Discuss
their advantages and limitations.
4. Explain evaluation metrics used to assess the performance of learning to rank
algorithms. Discuss metrics such as Mean Average Precision (MAP), Normalized
Discounted Cumulative Gain (NDCG), and Precision at K (P@K).
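P@K and NDCG from the question above can be sketched as follows; note that the linear-gain DCG used here is one of two common variants (the other uses 2^rel − 1 as the gain):

```python
import math

def precision_at_k(rels, k):
    # rels: binary relevance labels of the ranked results, top first
    return sum(rels[:k]) / k

def dcg(gains, k):
    # DCG@k with linear gain and a log2(rank + 1) discount
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg(gains, k):
    # Normalize by the DCG of the ideal (sorted) ranking
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal else 0.0

print(round(precision_at_k([1, 1, 0, 1], 3), 4))  # 2 relevant in top 3
print(round(ndcg([3, 2, 0, 1], 4), 4))            # < 1.0: imperfect order
```

A perfectly ordered ranking always scores NDCG = 1, which is what makes the metric comparable across queries with different numbers of relevant documents.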
5. Discuss the role of supervised learning techniques in learning to rank and their
impact on search engine result quality.
6. How does supervised learning for ranking differ from traditional relevance feedback
methods in Information Retrieval? Discuss their respective advantages and
limitations.
7. Describe the process of feature selection and extraction in learning to rank. What are
the key features used to train ranking models, and how are they selected or
engineered?
Numerical Questions
1. Consider a simplified web graph with the following link structure:
• Page A has links to pages B, C, and D.
• Page B has links to pages C and E.
• Page C has links to pages A and D.
• Page D has a link to page E.
• Page E has a link to page A.
Using initial authority and hub scores of 1 for all pages, calculate the authority and hub scores for each page after one and after two iterations of the HITS algorithm.
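One HITS iteration over this graph can be sketched as below. Scores are left unnormalized, and authorities are updated first with hubs then using the new authorities (a common convention; normalizing after each step changes the numbers but not the ranking):

```python
# Link structure from the question: page -> pages it links to
links = {"A": ["B", "C", "D"], "B": ["C", "E"],
         "C": ["A", "D"], "D": ["E"], "E": ["A"]}

auth = dict.fromkeys(links, 1.0)  # initial scores of 1 for all pages
hub = dict.fromkeys(links, 1.0)

def hits_iteration(links, auth, hub):
    # Authority of p: sum of hub scores of pages linking to p
    new_auth = {p: sum(hub[q] for q in links if p in links[q])
                for p in auth}
    # Hub of p: sum of (updated) authority scores of pages p links to
    new_hub = {p: sum(new_auth[q] for q in links[p]) for p in hub}
    return new_auth, new_hub

auth, hub = hits_iteration(links, auth, hub)
print(auth)  # e.g. A's authority = hub(C) + hub(E) = 2.0
```

After the first iteration, A, C, D, and E each have authority 2 (B only 1, since only A links to it), and A is the strongest hub with score 5; a second call to `hits_iteration` gives the two-iteration scores.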
Text Summarization:
1. Explain the difference between extractive and abstractive text summarization
methods. Compare their advantages and disadvantages.
2. Describe common techniques used in extractive text summarization, such as
graph-based methods and sentence scoring approaches.
3. Discuss challenges in abstractive text summarization and recent advancements in
neural network-based approaches.
4. Discuss common evaluation metrics used to assess the quality of text summaries,
such as ROUGE and BLEU. Explain how these metrics measure the similarity
between generated summaries and reference summaries.
Question Answering:
1. Discuss different approaches for question answering in information retrieval,
including keyword-based, document retrieval, and passage retrieval methods.
2. Explain how natural language processing techniques such as Named Entity
Recognition (NER) and semantic parsing contribute to question answering systems.
3. Provide examples of question answering systems and evaluate their effectiveness in
providing precise answers.
4. Discuss the challenges associated with question answering, including ambiguity
resolution, answer validation, and handling of incomplete or noisy queries.
Recommender Systems:
1. Define collaborative filtering and content-based filtering in recommender systems.
Compare their strengths and weaknesses.
2. Explain how collaborative filtering algorithms such as user-based and item-based
methods work. Discuss techniques to address the cold start problem in collaborative
filtering.
3. Describe content-based filtering approaches, including feature extraction and
similarity measures used in content-based recommendation systems.