Information Retrieval

The document outlines a midterm exam for the Advanced Software Engineering course, including questions on term-document incidence matrices, positional indexes, and F1 measures for information retrieval systems. It provides specific tasks for students to complete, such as drawing matrices and calculating performance metrics. Additionally, it includes true/false questions related to information retrieval concepts.

Uploaded by

kikofifo5

Faculty of Engineering & Technology - IUST

Department: Computer and Informatics Engineering    Exam: Mid, Solutions
Date: 13/04/2024    Semester: 1st    Year: 2023/2024
Course No.: 306512    Course Name: Advanced Software Engineering    Instructor: Dr.-Ing. Basel Hasan
Student No.:    Student Name:    Number of Pages: 1

Question #1 (10 Marks)

Consider the following document collection:
- D1: A D C M C A
- D2: B A M
- D3: C A R
- D4: A M C
1. Draw the term-document incidence matrix for this collection. Use the resulting matrix to process the query (A OR C) AND NOT R.
2. Draw the positional index representation for this collection. Explain a way to use this index to process the query A NEAR/2 M.
Term-document incidence matrix (2 marks):

     D1  D2  D3  D4
A     1   1   1   1
D     1   0   0   0
C     1   0   1   1
M     1   1   0   1
B     0   1   0   0
R     0   0   1   0

Query (A OR C) AND NOT R (2 marks):
(1111 OR 1011) AND NOT (0010) = 1111 AND 1101 = 1101 → D1, D2, D4
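
The Boolean query above can be checked with a short sketch. It is not part of the exam; the names `docs`, `incidence_row` and `matches` are illustrative assumptions, and each row is held as a plain list of 0/1 values in document order D1..D4.

```python
# Collection from Question #1 (document order D1..D4).
docs = {
    "D1": "A D C M C A".split(),
    "D2": "B A M".split(),
    "D3": "C A R".split(),
    "D4": "A M C".split(),
}
doc_ids = list(docs)

def incidence_row(term):
    """Return the 0/1 incidence row of `term` across D1..D4."""
    return [1 if term in docs[d] else 0 for d in doc_ids]

a, c, r = incidence_row("A"), incidence_row("C"), incidence_row("R")

# (A OR C) AND NOT R, evaluated column by column.
result = [(x | y) & (1 - z) for x, y, z in zip(a, c, r)]
matches = [d for d, bit in zip(doc_ids, result) if bit]

print(result)   # [1, 1, 0, 1]
print(matches)  # ['D1', 'D2', 'D4']
```

For a large collection the rows would normally be stored as bit vectors or replaced by postings lists, since the full matrix is very sparse.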

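Positional index (3 marks):
A → 1: (1, 6); 2: (2); 3: (2); 4: (1)
B → 2: (1)
C → 1: (3, 5); 3: (1); 4: (3)
D → 1: (2)
M → 1: (4); 2: (3); 4: (2)
R → 3: (3)

A NEAR/2 M (3 marks):
- Take the postings list for A → 1: (1, 6); 2: (2); 3: (2); 4: (1)
- Take the postings list for M → 1: (4); 2: (3); 4: (2)
- Merge on equal docIDs → documents 1, 2 and 4 contain both terms.
- Compare the positions within each of these documents → D2 and D4 are retrieved, since A is immediately followed by M in both. In D1, A (position 6) and M (position 4) are exactly two positions apart with M first, so excluding D1 assumes that NEAR/2 is ordered (A before M) or requires fewer than two positions between the terms.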
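
The merge-and-compare procedure can be sketched as follows. This is an illustration under stated assumptions, not the exam's reference solution: `build_positional_index` and `near` are hypothetical helpers, positions are 1-based, and the `ordered` flag is an assumption added here because proximity operators are defined differently across systems (either side versus first term before second).

```python
from collections import defaultdict

# Collection from Question #1; positions are 1-based, as in the answer key.
docs = {1: "A D C M C A", 2: "B A M", 3: "C A R", 4: "A M C"}

def build_positional_index(docs):
    """Map term -> {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.split(), start=1):
            index[term].setdefault(doc_id, []).append(pos)
    return dict(index)

def near(index, t1, t2, k, ordered=False):
    """Return doc IDs where t1 and t2 occur within k positions of each other.

    ordered=False: t2 may appear on either side of t1.
    ordered=True:  t2 must appear after t1 (assumed variant).
    """
    p1_lists = index.get(t1, {})
    p2_lists = index.get(t2, {})
    hits = []
    # Step 1: merge on equal doc IDs (documents containing both terms).
    for doc_id in sorted(p1_lists.keys() & p2_lists.keys()):
        # Step 2: compare the positions inside the shared document.
        for p1 in p1_lists[doc_id]:
            if any(
                (0 < p2 - p1 <= k) if ordered else (0 < abs(p2 - p1) <= k)
                for p2 in p2_lists[doc_id]
            ):
                hits.append(doc_id)
                break
    return hits

index = build_positional_index(docs)
print(near(index, "A", "M", 2))                # [1, 2, 4]
print(near(index, "A", "M", 2, ordered=True))  # [2, 4]
```

With `ordered=True` the sketch reproduces the answer key (D2 and D4). With the either-side reading, D1 also qualifies, because A at position 6 is two positions after M at position 4.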
Question #2 (10 Marks)
Consider an information need for which there are 7 relevant documents in the collection. Two IR systems (A and B) run on this collection. Their top 10 results are judged for relevance as shown below (R = relevant, N = non-relevant; System B returned only 6 results).

Rank   A   B
1      N   R
2      N   R
3      R   N
4      R   N
5      R   R
6      N   N
7      N
8      N
9      R
10     N

1. Compute the F1 measure for each system.
2. Based on the resulting F1, which system performs better, system A or system B?
3. What is the user model behind the F1 measure?
4. If we want precision to be 2 times more important than recall, which system performs better, system A or system B?

1.
For System A:
P = 4/10 = 0.4    R = 4/7 ≈ 0.57
F1 = 2*P*R / (P+R) = 2 * 0.4 * 0.57 / (0.4 + 0.57) ≈ 0.47 (1 mark)
For System B:
P = 3/6 = 0.5    R = 3/7 ≈ 0.43
F1 = 2*P*R / (P+R) = 2 * 0.5 * 0.43 / (0.5 + 0.43) ≈ 0.46 (1 mark)

2. System A performs better, as it has the higher F1 score. (1 mark)

3. The user wants to retrieve as many relevant documents as possible with as few irrelevant documents as possible; F1 weights precision and recall equally. (2 marks)

4. β = 1/2
For System A:
Fβ = (β² + 1)*P*R / (β²*P + R) = (0.25 + 1) * 0.4 * 0.57 / (0.25*0.4 + 0.57) ≈ 0.43 (2 marks)
For System B:
Fβ = (β² + 1)*P*R / (β²*P + R) = (0.25 + 1) * 0.5 * 0.43 / (0.25*0.5 + 0.43) ≈ 0.48 (2 marks)
→ System B performs better, as it has the higher Fβ score. (1 mark)
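
The F1 and Fβ figures can be reproduced with a small sketch. The function names and the string encoding of the relevance judgments are assumptions made for illustration; the judgments themselves come from the results table in Question #2 (System A returned 10 results, System B returned 6).

```python
def precision_recall(judgments, total_relevant):
    """Precision and recall from a ranked list of 'R'/'N' judgments."""
    hits = judgments.count("R")
    return hits / len(judgments), hits / total_relevant

def f_beta(p, r, beta=1.0):
    """F-measure; beta < 1 favours precision, beta > 1 favours recall."""
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    return (b2 + 1) * p * r / (b2 * p + r)

TOTAL_RELEVANT = 7
system_a = list("NNRRRNNNRN")  # ranks 1..10
system_b = list("RRNNRN")      # ranks 1..6

pr_a = precision_recall(system_a, TOTAL_RELEVANT)
pr_b = precision_recall(system_b, TOTAL_RELEVANT)

print(round(f_beta(*pr_a), 2), round(f_beta(*pr_b), 2))            # 0.47 0.46
print(round(f_beta(*pr_a, 0.5), 2), round(f_beta(*pr_b, 0.5), 2))  # 0.43 0.48
```

The ranking flips between the two settings because System B has the higher precision (0.5 against 0.4) while System A has the higher recall, and β = 1/2 shifts the weight toward precision.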
Question #3 (5 Marks)
Answer with true or false and correct the false statements.
1. The three main IR system components are: documents, query and relevance judgments. False: ... documents, query and relevant documents.
2. Terms are the output of the tokenization process. False: ... of the normalization process.
3. Consider N = 500 documents, each with a maximum of 10 words, and M = 300 distinct terms among these documents. The term-document incidence matrix for this collection includes at most 500 1's. False: at most 5000 1's (500 documents × 10 words).
4. In the BOW (bag-of-words) concept, re-ordering the words in a document doesn't destroy its topic. True.
5. Relevance has a value with respect to the information need. True.

Good Luck
Dr.-Ing. Basel Hasan
