Components

Select Connection: INPUT[inlineListSuggester(optionQuery(#area)):connections]
Date Created: INPUT[dateTime(defaultValue(null)):Date_Created]
Due Date: INPUT[dateTime(defaultValue(null)):Due_Date]
Priority Level: INPUT[inlineSelect(option(1 Critical), option(2 High), option(3 Medium), option(4 Low)):Priority_Level]
Status: INPUT[inlineSelect(option(1 To Do), option(2 In Progress), option(3 Testing), option(4 Completed), option(5 Blocked)):Status]

Metrics for the whole pipeline

AUC

Computing the similarity produces two distributions of similarity scores: one for the positive samples (similar documents) and one for the negative samples (non-relevant documents). We want the two distributions to be well separated by a threshold.

To measure how well they are separated, the ROC curve can be used, or more specifically the AUC, the area under that curve.

Definitions:

  • FP (False positives): unrelated patents considered relevant
  • FN (False negatives): related patents considered irrelevant
  • TN (True negatives): random patents correctly considered irrelevant
  • TP (True positives): correctly detected relevant patents

Compute:

  • TPR (True Positive Rate, or recall): $TPR = \frac{TP}{TP + FN}$
  • FPR (False Positive Rate): $FPR = \frac{FP}{FP + TN}$
  • FNR (False Negative Rate): $FNR = \frac{FN}{FN + TP}$

By plotting the TPR against the FPR at varying thresholds, we obtain the ROC curve. The area under the ROC curve (AUC) then ranges from 0.5 (no separation between the two distributions) to 1 (clear separation).
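As a minimal sketch (assuming the pairwise similarity scores and their relevance labels are already available as arrays; the variable names and values are illustrative), the ROC curve and the AUC can be computed with scikit-learn:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical labeled pairs: 1 = relevant (positive), 0 = irrelevant (negative),
# with the similarity score the pipeline assigned to each pair.
labels = np.array([1, 1, 1, 0, 0, 0, 0])
scores = np.array([0.91, 0.78, 0.55, 0.62, 0.30, 0.12, 0.08])

# TPR plotted against FPR at every threshold gives the ROC curve.
fpr, tpr, thresholds = roc_curve(labels, scores)

# AUC: 0.5 means the two score distributions overlap completely,
# 1.0 means they are perfectly separated.
print(f"AUC = {roc_auc_score(labels, scores):.3f}")
```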

Metrics before the threshold

Precision@k

Precision@k is the fraction of relevant documents in the top-k search results:

$$P@k = \frac{|rel_k|}{k}$$

where $rel_k$ are the relevant items in the top-$k$ results.
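A minimal sketch of Precision@k (the function and the example document IDs below are hypothetical):

```python
def precision_at_k(relevant_ids, ranked_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k

# Hypothetical ranking returned by the pipeline and its ground-truth relevant set.
ranked = ["p7", "p2", "p9", "p4", "p1"]
relevant = {"p2", "p4", "p8"}
print(precision_at_k(relevant, ranked, k=5))  # 2 relevant docs in the top 5 -> 0.4
```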

Mean reciprocal rank (MRR)

MRR measures how quickly a ranking system surfaces the first relevant item:

$$MRR = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{rank_i}$$

where $rank_i$ is the position of the first relevant item for query $i$, averaged over the set of queries $Q$.
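A minimal sketch of MRR over several queries (the function and example data are illustrative):

```python
def mean_reciprocal_rank(rankings, relevant_sets):
    """Average of 1/rank of the first relevant item per query (0 contribution if none is found)."""
    total = 0.0
    for ranked_ids, relevant_ids in zip(rankings, relevant_sets):
        for rank, doc_id in enumerate(ranked_ids, start=1):
            if doc_id in relevant_ids:
                total += 1.0 / rank
                break
    return total / len(rankings)

# Two hypothetical queries: the first relevant hit appears at rank 2 and rank 1.
rankings = [["p3", "p1", "p5"], ["p4", "p2"]]
relevant = [{"p1"}, {"p4"}]
print(mean_reciprocal_rank(rankings, relevant))  # (1/2 + 1/1) / 2 = 0.75
```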

AP

Average Precision (AP) for the final result. Precision ($P = \frac{TP}{TP + FP}$) and recall ($R = \frac{TP}{TP + FN}$) can be plotted against each other for $n$ different thresholds; the area under this precision-recall curve is the AP:

$$AP = \sum_n (R_n - R_{n-1}) \, P_n$$
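Assuming per-document relevance labels and similarity scores are available for a query (the arrays below are illustrative), scikit-learn's average_precision_score implements this sum directly:

```python
from sklearn.metrics import average_precision_score

# Hypothetical relevance labels (1 = relevant) and similarity scores for one query.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.95, 0.80, 0.72, 0.41, 0.38, 0.10]

# AP summarizes the precision-recall curve as sum_n (R_n - R_{n-1}) * P_n.
print(f"AP = {average_precision_score(labels, scores):.3f}")
```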

nDCG (Normalized Discounted Cumulative Gain)

nDCG uses a graded relevance scale for documents to evaluate the gain of each document based on its position in the result list.

DCG sums the gain of the results, discounted by their position in the result list:

$$DCG_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i + 1)}$$

where $rel_i$ is the graded relevance of the result at position $i$.

DCG should be normalized across queries, since the result list length may vary depending on the query:

$$nDCG_p = \frac{DCG_p}{IDCG_p}$$

where $IDCG_p$ is the ideal discounted cumulative gain:

$$IDCG_p = \sum_{i=1}^{|REL_p|} \frac{rel_i}{\log_2(i + 1)}$$

($REL_p$ is the list of relevant documents, ordered by relevance, up to position $p$)
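A minimal sketch of DCG and nDCG following the formulas above (the graded relevances in the example are made up):

```python
import math

def dcg(relevances):
    """Sum of graded relevances, discounted by log2 of the (1-based) position."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG normalized by the ideal DCG (results reordered by true relevance)."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical graded relevance of the returned results, in ranked order
# (3 = highly relevant, 0 = irrelevant).
print(f"nDCG = {ndcg([3, 2, 0, 3, 1]):.3f}")
```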

Comments

For this to work, we need a ground-truth dataset, which to my knowledge means human-labeled data.