c3-paper(1)
45 pm to 8.15 pm    MM: 25
Write the answer to each question in one or at most two sentences in the Google Form.
1. Calculate the Retrieval Status Value (RSV) for term_2 (based on the given
term-document matrix) using the Binary Independence Model for a query Q. (2.5 marks)
For each term, the value (1 or 0) in a document column shows whether the term is
present in that document or not. The Relevant row shows whether each document is
relevant to query Q or not.
Relevant 0 1 0 1 0 1 0 1
term_1   0 1 1 1 0 1 0 0
term_2   1 0 1 0 0 1 0 1
term_3   1 1 1 1 0 0 0 1
term_4   1 1 1 0 1 0 1 0
term_5   0 1 0 0 0 0 0 1
term_6   0 0 1 0 0 1 0 0
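
A minimal sketch of how the term weight in question 1 could be checked. It assumes the common Binary Independence Model term weight c_t = log(p(1-u) / (u(1-p))) with add-0.5 smoothing and the natural logarithm; the paper specifies neither, and the name bim_term_weight is illustrative only.

```python
import math

# Rows copied from the question 1 table; each position is one document.
relevant = [0, 1, 0, 1, 0, 1, 0, 1]
term_2 = [1, 0, 1, 0, 0, 1, 0, 1]

def bim_term_weight(term_row, relevant_row, smooth=0.5):
    """BIM term weight with add-`smooth` smoothing (an assumed variant)."""
    N = len(term_row)                 # total number of documents
    S = sum(relevant_row)             # number of relevant documents
    n = sum(term_row)                 # documents containing the term
    s = sum(t * r for t, r in zip(term_row, relevant_row))  # relevant and containing
    p = (s + smooth) / (S + 2 * smooth)          # P(term present | relevant)
    u = (n - s + smooth) / (N - S + 2 * smooth)  # P(term present | non-relevant)
    return math.log(p * (1 - u) / (u * (1 - p)))

weight = bim_term_weight(term_2, relevant)
print(weight)
```

Under these assumptions the RSV of a document is the sum of such weights over the query terms it contains, so for a single term the weight itself is the contribution.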
3. For the given matrix A = [[2, 0, 1], [0, 1, 0], [0, 0, 0]], identify the values of U, Σ and V'.
(1+0.5+1 marks)
4. For a query, the total number of relevant documents is 17, the number of
retrieved relevant documents is 10, and the total number of retrieved documents is 14.
Calculate the Matthews Correlation Coefficient (MCC). (1.5 marks)
5. Calculate P(A|B) if P(B|A) = 0.2, P(A) = 0.3 and P(B) = 0.3, using Bayes' theorem. (1 mark)
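
The Bayes' theorem computation in question 5 can be checked with a short sketch. The function name bayes_posterior is illustrative; it only applies P(A|B) = P(B|A) P(A) / P(B) to the values given in the question.

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """Return P(A|B) from P(B|A), P(A) and P(B) using Bayes' theorem."""
    return p_b_given_a * p_a / p_b

# Values taken from question 5.
p_a_given_b = bayes_posterior(0.2, 0.3, 0.3)
print(round(p_a_given_b, 4))
```

Because P(A) and P(B) are equal here, they cancel and P(A|B) equals P(B|A).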
term_1 0 0 1
term_2 2 0 1
term_3 0 1 1
7. Compute the edit distance (Levenshtein distance) between the two given words. Write
down the final state of the edit distance matrix. (2 marks)
a. BEEGEGD b. FEABDFD
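
A minimal sketch of the dynamic-programming table asked for in question 7. It assumes unit cost for insertion, deletion and substitution, which the paper does not state explicitly; the name levenshtein_matrix is illustrative.

```python
def levenshtein_matrix(a, b):
    """Return the full edit distance matrix between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i          # delete all i characters of a
    for j in range(cols):
        d[0][j] = j          # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d

matrix = levenshtein_matrix("BEEGEGD", "FEABDFD")
for row in matrix:
    print(" ".join(f"{v:2d}" for v in row))
print("edit distance:", matrix[-1][-1])
```

The bottom-right cell of the printed matrix is the edit distance between the two words.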
9. Compute the tf-idf vector for term_1 using the following steps. Step 1: find the
term frequency for each document. Step 2: normalize the term frequencies of each
document to unit length. Step 3: find the idf using the formula log(n/n_i). Step 4:
multiply tf and idf. The term-document matrix is given below. (2.5 marks)
term_1 3 14 5 1 11
term_2 6 7 2 16 17
term_3 10 17 14 15 3
term_4 1 11 0 9 9
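
The four steps of question 9 can be sketched as follows. The sketch assumes that the columns of the table are documents, that "unit length" means the Euclidean (L2) norm of each document column, that n_i is the number of documents containing the term, and that the logarithm is natural; none of these is stated in the paper. The names tf, norms and tfidf_vector are illustrative.

```python
import math

# Step 1: raw term frequencies, copied from the question 9 table.
tf = {
    "term_1": [3, 14, 5, 1, 11],
    "term_2": [6, 7, 2, 16, 17],
    "term_3": [10, 17, 14, 15, 3],
    "term_4": [1, 11, 0, 9, 9],
}
n_docs = 5

# Step 2: L2 norm of each document column (assumed meaning of "unit length").
norms = [
    math.sqrt(sum(row[d] ** 2 for row in tf.values()))
    for d in range(n_docs)
]

def tfidf_vector(term):
    row = tf[term]
    n_i = sum(1 for count in row if count > 0)   # documents containing the term
    idf = math.log(n_docs / n_i)                 # Step 3: idf = log(n / n_i)
    # Step 4: normalized tf multiplied by idf, one value per document.
    return [(count / norm) * idf for count, norm in zip(row, norms)]

term_1_vector = tfidf_vector("term_1")
print(term_1_vector)
```

Since term_1 occurs in all five documents, log(n/n_i) = log(5/5) = 0 under this formula, so every component of its tf-idf vector is zero regardless of the normalization.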
11. A fair coin is tossed. What is the a priori probability of it landing heads? (0.5 marks)
12. What is the difference between contextual and global word embeddings? (1 mark)
14. Calculate the modified query vector using the Rocchio algorithm, based on the given
vectors. (2 marks)
0 0 1 0 0
2 1 0 2 2
15. Write down all trigram index entries for the word LXLELI. (0.5 marks)
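
A small sketch of trigram extraction for question 15. The paper does not say whether word-boundary markers are used, so the sketch shows both variants; the `$` boundary symbol and the name trigrams are assumptions.

```python
def trigrams(word, pad=False):
    """Return the character trigrams of word, optionally with '$' boundary markers."""
    s = f"${word}$" if pad else word
    return [s[i:i + 3] for i in range(len(s) - 2)]

print(trigrams("LXLELI"))            # ['LXL', 'XLE', 'LEL', 'ELI']
print(trigrams("LXLELI", pad=True))  # ['$LX', 'LXL', 'XLE', 'LEL', 'ELI', 'LI$']
```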
17. Let there be 6 sets A, B, C, D, E, F with lengths len(A) = 100, len(B) = 110, len(C) = 120,
len(D) = 130, len(E) = 140, len(F) = 150. What will be the order of execution for the query
Q = (A or F) and (B or E) and (C or B)? (1 mark)
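
One common heuristic for question 17 is to estimate the size of each OR clause conservatively as the sum of its set lengths and then process the AND in increasing order of those estimates. The sketch below assumes that heuristic; the paper does not name the method, and the names sizes, clauses and estimated_size are illustrative.

```python
# Set lengths copied from question 17.
sizes = {"A": 100, "B": 110, "C": 120, "D": 130, "E": 140, "F": 150}

# The three OR clauses of Q = (A or F) and (B or E) and (C or B).
clauses = [("A", "F"), ("B", "E"), ("C", "B")]

def estimated_size(clause):
    """Upper bound on the size of an OR clause: the sum of its set lengths."""
    return sum(sizes[name] for name in clause)

# Python's sort is stable, so clauses with equal estimates keep their input order.
order = sorted(clauses, key=estimated_size)
for clause in order:
    print(" or ".join(clause), estimated_size(clause))
```

Under this estimate (C or B) has size 230 and is processed first, while (A or F) and (B or E) tie at 250, so their relative order is arbitrary.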
18. Suppose the IIIT Allahabad library has a set of 5 books, of which 3 books are on IR
and 2 books are on Computer Vision. Each book has a book ID (BookID) and a set of
words belonging to either IR or Computer Vision; see the table below for more detail.
From this information, compute the priors P(B) and P(B'). (1.5 marks)
19. What is thesaurus-based query expansion? Write down one example. (1 mark)