Information Retrieval
     D1  D2  D3  D4
A     1   1   1   1
D     1   0   0   0
C     1   0   1   1
M     1   1   0   1
B     0   1   0   0
R     0   0   1   0

(1111 or 1011) and not(0010) = 1111 and 1101 = 1101 → d1, d2, d4        2m
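
The Boolean evaluation above can be reproduced with bitwise operations. The following is a minimal sketch, not part of the original solution. The bit vectors 1111, 1011 and 0010 match the rows for A, C and R, so the sketch assumes the query reads (A OR C) AND NOT R; the helper names `to_bits` and `to_docs` are illustrative only.

```python
# Term-document incidence matrix from the table above (columns D1..D4).
incidence = {
    "A": [1, 1, 1, 1],
    "D": [1, 0, 0, 0],
    "C": [1, 0, 1, 1],
    "M": [1, 1, 0, 1],
    "B": [0, 1, 0, 0],
    "R": [0, 0, 1, 0],
}
NUM_DOCS = 4
MASK = (1 << NUM_DOCS) - 1  # 0b1111, keeps NOT within four bits


def to_bits(row):
    """Pack a 0/1 row into an int, with D1 as the most significant bit."""
    value = 0
    for bit in row:
        value = (value << 1) | bit
    return value


def to_docs(bits):
    """List the document names whose bit is set."""
    return [f"D{i + 1}" for i in range(NUM_DOCS)
            if bits & (1 << (NUM_DOCS - 1 - i))]


vec = {term: to_bits(row) for term, row in incidence.items()}

# Assumed reading of the query: (A OR C) AND NOT R
result = (vec["A"] | vec["C"]) & (~vec["R"] & MASK)
print(format(result, "04b"), to_docs(result))
```

The mask is needed because Python integers are unbounded, so a plain `~` would set bits beyond the four document columns.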
A  1: (1, 6); 2: (2); 3: (2); 4: (1)        3m
B  2: (1)
C  1: (3, 5); 3: (1); 4: (3)
D  1: (2)
M  1: (4); 2: (3); 4: (2)
R  3: (3)
A NEAR/2 M        3m
Take the postings list for A → 1: (1, 6); 2: (2); 3: (2); 4: (1)
Take the postings list for M → 1: (4); 2: (3); 4: (2)
Merge on equal docIDs → we get the candidate documents 1, 2, 4
Compare the positions → D2 and D4 will be retrieved.
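
The two steps (merge on docIDs, then compare positions) can be sketched as follows. This is an illustrative sketch, not the original solution: the function `near` and its `max_distance` parameter are hypothetical names, and how `max_distance` maps to NEAR/2 is an assumption that depends on the convention used. With `max_distance = 1` the sketch returns D2 and D4 as in the answer above; with `max_distance = 2` it also returns D1, where A occurs at position 6 and M at position 4.

```python
# Positional postings copied from the index above: term -> {docID: [positions]}.
postings = {
    "A": {1: [1, 6], 2: [2], 3: [2], 4: [1]},
    "M": {1: [4], 2: [3], 4: [2]},
}


def near(term1, term2, max_distance):
    """Return docIDs where the two terms occur at most max_distance positions apart."""
    p1, p2 = postings[term1], postings[term2]
    hits = []
    # Merge on equal docIDs: only documents containing both terms are candidates.
    for doc_id in sorted(p1.keys() & p2.keys()):
        # Compare every pair of positions inside the candidate document.
        if any(abs(a - b) <= max_distance
               for a in p1[doc_id] for b in p2[doc_id]):
            hits.append(doc_id)
    return hits


candidates = sorted(postings["A"].keys() & postings["M"].keys())
print(candidates)
print(near("A", "M", 1))
print(near("A", "M", 2))
```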
Question #2 (10 Marks)
Consider an information need for which there are 7 relevant documents in the collection. Two IR systems (A and B) run on this collection. Their top 10 results are judged for relevance as shown in the table below (R = relevant, N = non-relevant).

Rank  A  B
1     N  R
2     N  R
3     R  N
4     R  N
5     R  R
6     N  N
7     N
8     N
9     R
10    N

1. Compute the F1 measure for each system.
2. Based on the resulting F1, which system performs better, system A or system B?
3. What is the user model behind the F1 measure?
4. If we want precision to be 2 times more important than recall, which system performs better, system A or system B?
1.
For System A:
P = 4/10 = 0.4,  R = 4/7 ≈ 0.57
F1 = 2*P*R / (P + R) = 2 * 0.4 * 0.57 / (0.4 + 0.57) ≈ 0.47        1m
For System B:
P = 3/6 = 0.5,  R = 3/7 ≈ 0.43
F1 = 2*P*R / (P + R) = 2 * 0.5 * 0.43 / (0.5 + 0.43) ≈ 0.46        1m
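
As a cross-check, the precision, recall and F1 values above can be recomputed directly from the relevance judgments. This is a minimal sketch; the function name `precision_recall_f1` and the string encoding of the judgments are illustrative choices, not part of the original solution.

```python
# Relevance judgments per rank, copied from the table in the question.
system_a = list("NNRRRNNNRN")  # 10 results returned
system_b = list("RRNNRN")      # 6 results returned
TOTAL_RELEVANT = 7             # relevant documents in the collection


def precision_recall_f1(judgments, total_relevant):
    """Compute P, R and F1 for one ranked result list."""
    hits = judgments.count("R")
    precision = hits / len(judgments)
    recall = hits / total_relevant
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


for name, judgments in (("A", system_a), ("B", system_b)):
    p, r, f1 = precision_recall_f1(judgments, TOTAL_RELEVANT)
    print(f"System {name}: P={p:.2f} R={r:.2f} F1={f1:.2f}")
```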
4. Precision 2 times more important than recall: β = 1/2
For System A:
Fᵦ = (β² + 1)*P*R / (β²*P + R) = (0.25 + 1) * 0.4 * 0.57 / (0.25 * 0.4 + 0.57) ≈ 0.43        2m
For System B:
Fᵦ = (β² + 1)*P*R / (β²*P + R) = (0.25 + 1) * 0.5 * 0.43 / (0.25 * 0.5 + 0.43) ≈ 0.48        2m
➔ System B performs better, as it has the higher Fᵦ score.        1m
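
The effect of β on the ranking of the two systems can be checked with a small general function. This is a sketch under the same P and R values as above; `f_beta` is an illustrative name. With β = 1 it reduces to F1.

```python
def f_beta(precision, recall, beta):
    """Weighted harmonic mean of precision and recall; beta < 1 favours precision."""
    b2 = beta ** 2
    return (b2 + 1) * precision * recall / (b2 * precision + recall)


# Precision and recall for each system, from the relevance judgments.
scores = {"A": (4 / 10, 4 / 7), "B": (3 / 6, 3 / 7)}

for beta in (1.0, 0.5):
    for name, (p, r) in scores.items():
        print(f"beta={beta}: System {name} F={f_beta(p, r, beta):.2f}")
```

System A has the higher recall and System B the higher precision, so lowering β below 1 shifts the comparison in favour of System B.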
Question #3 (5 Marks)
Answer with true or false, and correct the false statements.
1. The three main components of an IR system are: documents, query and relevance judgments. False. … and relevant documents.
2. Terms are the output of the tokenization process. False. … of normalization.
3. Consider N = 500 documents, each with a maximum of 10 words, and M = 300 distinct terms among these documents. The term-document incidence matrix for this collection includes at most 500 1’s. False. At most 5000 1’s.
4. In the BOW concept, re-ordering the words in a document doesn’t destroy its topic. True.
5. Relevance has a value with respect to the information need. True.
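
The bound in statement 3 follows from simple counting, sketched below. The variable names are illustrative. The sketch assumes each document contributes at most one 1 per distinct term it contains, so at most 10 ones per document.

```python
N_DOCS = 500     # documents in the collection
MAX_WORDS = 10   # maximum number of words per document
M_TERMS = 300    # distinct terms in the collection

# Each document column holds one 1 per distinct term in that document,
# so it has at most min(MAX_WORDS, M_TERMS) ones.
max_ones = N_DOCS * min(MAX_WORDS, M_TERMS)
matrix_cells = N_DOCS * M_TERMS  # total number of cells in the matrix

print(max_ones, matrix_cells)
```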
Good Luck
Dr.-Ing. Basel Hasan