Components

Select Connection: INPUT[inlineListSuggester(optionQuery(#area)):connections]
Date Created: INPUT[dateTime(defaultValue(null)):Date_Created]
Due Date: INPUT[dateTime(defaultValue(null)):Due_Date]
Priority Level: INPUT[inlineSelect(option(1 Critical), option(2 High), option(3 Medium), option(4 Low)):Priority_Level]
Status: INPUT[inlineSelect(option(1 To Do), option(2 In Progress), option(3 Testing), option(4 Completed), option(5 Blocked)):Status]

Metrics for the whole pipeline

AUC

Computing the similarity produces two distributions of similarity scores: one for the positive samples (similar documents) and one for the negative samples (non-relevant documents). We want the two distributions to be well separated by a threshold.

To measure how well they are separated, the ROC curve can be used, or more specifically the AUC, the area under that curve.

Definitions:

  • FP (False positives): unrelated patents considered relevant
  • FN (False negatives): related patents considered irrelevant
  • TN (True negatives): random patents correctly considered irrelevant
  • TP (True positives): correctly detected relevant patents

Compute:

  • TPR (True Positive Rate, or recall): $TPR = \frac{TP}{TP + FN}$
  • FPR (False Positive Rate): $FPR = \frac{FP}{FP + TN}$
  • FNR (False Negative Rate): $FNR = \frac{FN}{FN + TP}$

By plotting the TPR against the FPR at varying thresholds, we obtain the ROC curve. The area under the ROC curve (AUC) then ranges from 0.5 (no separation between the two distributions) to 1 (clear separation).
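As a minimal sketch (assuming the pairwise similarity scores and their relevance labels are already available as arrays; the variable names and values are illustrative), the ROC curve and the AUC can be computed with scikit-learn:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical labeled pairs: 1 = relevant (positive), 0 = irrelevant (negative),
# with the similarity score the pipeline assigned to each pair.
labels = np.array([1, 1, 1, 0, 0, 0, 0])
scores = np.array([0.91, 0.78, 0.55, 0.62, 0.30, 0.12, 0.08])

# TPR plotted against FPR at every threshold gives the ROC curve.
fpr, tpr, thresholds = roc_curve(labels, scores)

# AUC: 0.5 means the two score distributions overlap completely,
# 1.0 means they are perfectly separated.
print(f"AUC = {roc_auc_score(labels, scores):.3f}")
```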

Metrics before the threshold

Precision@k

Precision@k is the fraction of relevant documents in the top-k search results:

$$P@k = \frac{|rel_k|}{k}$$

where $rel_k$ are the relevant items in the top-$k$ results.
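A minimal sketch of Precision@k (the function and the example document IDs below are hypothetical):

```python
def precision_at_k(relevant_ids, ranked_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k

# Hypothetical ranking returned by the pipeline and its ground-truth relevant set.
ranked = ["p7", "p2", "p9", "p4", "p1"]
relevant = {"p2", "p4", "p8"}
print(precision_at_k(relevant, ranked, k=5))  # 2 relevant docs in the top 5 -> 0.4
```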

Mean reciprocal rank (MRR)

MRR measures how quickly a ranking system surfaces the first relevant item:

$$MRR = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{rank_i}$$

where $rank_i$ is the position of the first relevant item for query $i$, averaged over the set of queries $Q$.
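A minimal sketch of MRR over several queries (the function and example data are illustrative):

```python
def mean_reciprocal_rank(rankings, relevant_sets):
    """Average of 1/rank of the first relevant item per query (0 contribution if none is found)."""
    total = 0.0
    for ranked_ids, relevant_ids in zip(rankings, relevant_sets):
        for rank, doc_id in enumerate(ranked_ids, start=1):
            if doc_id in relevant_ids:
                total += 1.0 / rank
                break
    return total / len(rankings)

# Two hypothetical queries: the first relevant hit appears at rank 2 and rank 1.
rankings = [["p3", "p1", "p5"], ["p4", "p2"]]
relevant = [{"p1"}, {"p4"}]
print(mean_reciprocal_rank(rankings, relevant))  # (1/2 + 1/1) / 2 = 0.75
```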

AP

Average Precision (AP) for the final result. Precision ($P = \frac{TP}{TP + FP}$) and recall ($R = \frac{TP}{TP + FN}$) can be plotted against each other for $n$ different thresholds; the area under this precision-recall curve is the AP:

$$AP = \sum_n (R_n - R_{n-1}) \, P_n$$
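Assuming per-document relevance labels and similarity scores are available for a query (the arrays below are illustrative), scikit-learn's average_precision_score implements this sum directly:

```python
from sklearn.metrics import average_precision_score

# Hypothetical relevance labels (1 = relevant) and similarity scores for one query.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.95, 0.80, 0.72, 0.41, 0.38, 0.10]

# AP summarizes the precision-recall curve as sum_n (R_n - R_{n-1}) * P_n.
print(f"AP = {average_precision_score(labels, scores):.3f}")
```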

nDCG (Normalized Discounted Cumulative Gain)

nDCG uses a graded relevance scale for documents to evaluate the gain of each document based on its position in the result list.

DCG sums the gain of the results, discounted by their position in the result list:

$$DCG_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i + 1)}$$

where $rel_i$ is the graded relevance of the result at position $i$.

DCG should be normalized across queries, since the result list length may vary depending on the query:

$$nDCG_p = \frac{DCG_p}{IDCG_p}$$

where $IDCG_p$ is the ideal discounted cumulative gain:

$$IDCG_p = \sum_{i=1}^{|REL_p|} \frac{rel_i}{\log_2(i + 1)}$$

($REL_p$ is the list of relevant documents, ordered by relevance, up to position $p$)
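A minimal sketch of DCG and nDCG following the formulas above (the graded relevances in the example are made up):

```python
import math

def dcg(relevances):
    """Sum of graded relevances, discounted by log2 of the (1-based) position."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG normalized by the ideal DCG (results reordered by true relevance)."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical graded relevance of the returned results, in ranked order
# (3 = highly relevant, 0 = irrelevant).
print(f"nDCG = {ndcg([3, 2, 0, 3, 1]):.3f}")
```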

Comments

For this to work, we need a ground-truth dataset, which to my knowledge means human-labeled data.