Components
Select Connection: INPUT[inlineListSuggester(optionQuery(#area)):connections]
Date Created: INPUT[dateTime(defaultValue(null)):Date_Created]
Due Date: INPUT[dateTime(defaultValue(null)):Due_Date]
Priority Level: INPUT[inlineSelect(option(1 Critical), option(2 High), option(3 Medium), option(4 Low)):Priority_Level]
Status: INPUT[inlineSelect(option(1 To Do), option(2 In Progress), option(3 Testing), option(4 Completed), option(5 Blocked)):Status]
Metrics for the whole pipeline
AUC
Computing the similarity will result in two distributions of similarity scores: one for the positive samples (similar documents) and one for the negative samples (non-relevant documents). We want them to be well separated by a threshold.
To measure how well they are separated, the ROC curve can be used, or more specifically the AUC, the area under that curve.
Definitions:
- FP (False positives) → unrelated patents considered relevant
- FN (False negatives) → related patents considered irrelevant
- TN (True negatives) → random patents correctly considered irrelevant
- TP (True positives) → correctly detected relevant patents
Compute:
- TPR (True Positive Rate or recall): $\mathrm{TPR} = \frac{TP}{TP + FN}$
- FPR (False Positive Rate): $\mathrm{FPR} = \frac{FP}{FP + TN}$
- FNR (False Negative Rate): $\mathrm{FNR} = \frac{FN}{FN + TP}$
By plotting the TPR against the FPR, we obtain the ROC curve. Finally, the area under the ROC curve (AUC) ranges from 0.5 (no separation between the distributions) to 1 (clear distinction).
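A minimal sketch of the AUC computation, assuming the similarity scores for the positive and negative pairs are already available and scikit-learn can be used (all numbers are made up):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical similarity scores for positive pairs (related patents)
# and negative pairs (random patents)
pos_scores = np.array([0.82, 0.91, 0.75, 0.88])
neg_scores = np.array([0.31, 0.45, 0.22, 0.57])

# Label 1 = relevant pair, label 0 = irrelevant pair
y_true = np.concatenate([np.ones_like(pos_scores), np.zeros_like(neg_scores)])
y_score = np.concatenate([pos_scores, neg_scores])

# AUC close to 1.0 means the two score distributions are well separated;
# 0.5 means they are indistinguishable
print(roc_auc_score(y_true, y_score))
```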
Metrics before the threshold
Precision@k
$$P@k = \frac{|R \cap D_k|}{k}$$
i.e. the number of relevant documents in the top-$k$ search results divided by $k$, where $R$ is the set of relevant items and $D_k$ the top-$k$ retrieved results.
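A small sketch of Precision@k with a hypothetical ranking and ground-truth set (the document IDs are invented for illustration):

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k

# Hypothetical ranking and set of relevant documents
ranking = ["p7", "p3", "p9", "p1", "p5"]
relevant = {"p3", "p1", "p8"}
print(precision_at_k(ranking, relevant, k=5))  # 2 relevant in the top 5 -> 0.4
```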
Mean reciprocal rank (MRR)
MRR measures how quickly a ranking system surfaces the first relevant item:
$$\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i}$$
where $Q$ is the set of queries and $\mathrm{rank}_i$ is the position of the first relevant item for query $i$.
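A sketch of MRR over a few hypothetical queries (the rankings and relevant sets are invented for illustration):

```python
def mean_reciprocal_rank(rankings, relevant_sets):
    """Average of 1/rank of the first relevant item over all queries."""
    rr_sum = 0.0
    for ranked_ids, relevant_ids in zip(rankings, relevant_sets):
        for rank, doc_id in enumerate(ranked_ids, start=1):
            if doc_id in relevant_ids:
                rr_sum += 1.0 / rank
                break  # only the first relevant item counts
    return rr_sum / len(rankings)

# First query: first hit at rank 2; second query: first hit at rank 1
# -> MRR = (0.5 + 1.0) / 2 = 0.75
print(mean_reciprocal_rank([["a", "b", "c"], ["x", "y"]], [{"b"}, {"x"}]))
```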
AP
Average Precision (AP) for the final result. Precision ($P = \frac{TP}{TP + FP}$) and recall ($R = \frac{TP}{TP + FN}$) can be plotted against each other for $n$ different thresholds; the area under this precision-recall curve is the AP:
$$\mathrm{AP} = \sum_n (R_n - R_{n-1}) \, P_n$$
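A sketch using scikit-learn's average_precision_score, assuming binary relevance labels and the similarity scores of a single query (the numbers are made up):

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Hypothetical relevance labels and similarity scores for one query
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3])

# Internally this sums (R_n - R_{n-1}) * P_n over the PR-curve thresholds
print(average_precision_score(y_true, y_score))
```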
nDCG (Normalized Discounted Cumulative Gain)
nDCG uses a graded relevance scale for documents to evaluate the gain of each document based on its position in the result list.
DCG sums the gains of the results, discounted by their position in the result list:
$$\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i + 1)}$$
where $rel_i$ is the graded relevance of the result at position $i$.
→ DCG should be normalized across queries, since the result list length may vary depending on the query:
$$\mathrm{nDCG}_p = \frac{\mathrm{DCG}_p}{\mathrm{IDCG}_p}$$
where IDCG is the ideal discounted cumulative gain:
$$\mathrm{IDCG}_p = \sum_{i=1}^{|REL_p|} \frac{rel_i}{\log_2(i + 1)}$$
($REL_p$ is the list of relevant documents, ordered by relevance, up to position $p$.)
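A sketch of DCG/nDCG following the formulas above, with a made-up list of graded relevances in ranking order:

```python
import math

def dcg(relevances):
    """Sum of gains discounted by log2 of the (1-based) position."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """Normalize by the ideal DCG (same gains sorted in decreasing order)."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical graded relevances (0-3) of the returned documents, in rank order
print(ndcg([3, 2, 3, 0, 1, 2]))
```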
Comments
For this to work, we need a ground-truth dataset, which to my knowledge means human-labeled data.