evaluation

ci(gold_truths, predictions)

Computes concordance index (CI) between the expected values and predictions. See Gönen and Heller (2005) for the details of the metric.
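
In the usual notation, with gold labels \(y\) and predictions \(\hat{y}\), the statistic computed below is

\[
\mathrm{CI} = \frac{1}{Z} \sum_{y_i > y_j} \Big( \mathbf{1}[\hat{y}_i > \hat{y}_j] + \tfrac{1}{2}\, \mathbf{1}[\hat{y}_i = \hat{y}_j] \Big),
\]

where \(\mathbf{1}[\cdot]\) is the indicator function and \(Z\) is the number of pairs with \(y_i > y_j\). A CI of 1.0 corresponds to a perfect ranking of the labels; 0.5 is the expected score of a random ordering.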

Parameters:

    gold_truths : List[float], required
        The gold labels in the dataset.
    predictions : List[float], required
        Predictions of a model.

Returns:

    float
        Concordance index.

Source code in pydebiaseddta/evaluation.py
def ci(gold_truths: List[float], predictions: List[float]) -> float:
    """Computes concordance index (CI) between the expected values and predictions. 
    See [Gönen and Heller (2005)](https://www.jstor.org/stable/20441249) for the details of the metric.

    Parameters
    ----------
    gold_truths : List[float]
        The gold labels in the dataset.  
    predictions : List[float]
        Predictions of a model.

    Returns
    -------
    float
        Concordance index.
    """
    # `combinations` here is `itertools.combinations`, imported at module level
    gold_combs, pred_combs = combinations(gold_truths, 2), combinations(predictions, 2)
    numerator, denominator = 0, 0
    for (g1, g2), (p1, p2) in zip(gold_combs, pred_combs):
        if g1 == g2:
            continue  # pairs with tied gold labels are not comparable
        if g1 > g2:
            # orient each pair so that the second gold label is the larger one
            g1, g2 = g2, g1
            p1, p2 = p2, p1
        # concordant pairs score 1; ties in the predictions score 0.5
        numerator = numerator + 1 * (p2 > p1) + 0.5 * (p2 == p1)
        denominator = denominator + 1

    return float(numerator / denominator)
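
A minimal usage sketch of ci (the import path follows the source file above; the toy values are illustrative only):

from pydebiaseddta.evaluation import ci

gold = [5.0, 6.2, 7.1, 4.3]   # toy binding-affinity labels
preds = [5.1, 6.0, 7.4, 4.0]  # toy predictions that preserve the gold ranking

print(ci(gold, preds))  # 1.0: every comparable pair is ordered correctly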

evaluate_predictions(gold_truths, predictions, metrics=None)

Computes multiple metrics with a single call for convenience.

Parameters:

    gold_truths : List[float], required
        The gold labels in the dataset.
    predictions : List[float], required
        Predictions of a model.
    metrics : List[str], default None
        Names of the evaluation metrics to compute. Possible values are: {"ci", "r2", "rmse", "mse"}.
        All metrics are computed if no value is provided.

Returns:

    Dict[str, float]
        A dictionary that maps each metric name to the computed value.

Source code in pydebiaseddta/evaluation.py
def evaluate_predictions(
    gold_truths: List[float], predictions: List[float], metrics: List[str] = None
) -> Dict[str, float]:
    """Computes multiple metrics with a single call for convenience. 

    Parameters
    ----------
    gold_truths : List[float]
        The gold labels in the dataset.
    predictions : List[float]
        Predictions of a model.
    metrics : List[str], optional
        Names of the evaluation metrics to compute. Possible values are: `{"ci", "r2", "rmse", "mse"}`.
        All metrics are computed if no value is provided.

    Returns
    -------
    Dict[str,float]
        A dictionary that maps each metric name to the computed value.
    """
    if metrics is None:
        metrics = ["ci", "r2", "rmse", "mse"]

    metrics = [metric.lower() for metric in metrics]
    name_to_fn = {"ci": ci, "r2": r2, "rmse": rmse, "mse": mse}
    return {metric: name_to_fn[metric](gold_truths, predictions) for metric in metrics}
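
A minimal usage sketch of evaluate_predictions (import path and toy values assumed for illustration):

from pydebiaseddta.evaluation import evaluate_predictions

gold = [5.0, 6.2, 7.1, 4.3]
preds = [5.1, 6.0, 7.4, 4.0]

# Metric names are lower-cased internally, so "RMSE" and "rmse" are equivalent.
scores = evaluate_predictions(gold, preds, metrics=["RMSE", "ci"])
print(scores)  # e.g. {"rmse": ..., "ci": ...}

# Omitting `metrics` computes all four metrics.
all_scores = evaluate_predictions(gold, preds)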

mse(gold_truths, predictions)

Computes mean squared error between expected and predicted values.

Parameters:

    gold_truths : List[float], required
        The gold labels in the dataset.
    predictions : List[float], required
        Predictions of a model.

Returns:

    float
        Mean squared error.

Source code in pydebiaseddta/evaluation.py
def mse(gold_truths: List[float], predictions: List[float]) -> float:
    """Computes mean squared error between expected and predicted values.

    Parameters
    ----------
    gold_truths : List[float]
        The gold labels in the dataset.
    predictions : List[float]
        Predictions of a model.

    Returns
    -------
    float
        Mean squared error.
    """
    return float(mean_squared_error(gold_truths, predictions, squared=True))
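
A quick sanity check of mse on toy values (import path assumed from the source file above):

from pydebiaseddta.evaluation import mse

# residuals are (0, 0, 2), so MSE = (0 + 0 + 4) / 3 ≈ 1.333
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))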

r2(gold_truths, predictions)

Computes \(R^2\) (coefficient of determination) between expected and predicted values.

Parameters:

    gold_truths : List[float], required
        The gold labels in the dataset.
    predictions : List[float], required
        Predictions of a model.

Returns:

    float
        \(R^2\) (coefficient of determination) score.

Source code in pydebiaseddta/evaluation.py
def r2(gold_truths: List[float], predictions: List[float]) -> float:
    """Compute $R^2$ (coefficient of determinant) between expected and predicted values.

    Parameters
    ----------
    gold_truths : List[float]
        The gold labels in the dataset.
    predictions : List[float]
        Predictions of a model.

    Returns
    -------
    float
        $R^2$ (coefficient of determination) score.
    """
    return float(r2_score(gold_truths, predictions))
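
A short illustration of r2 on toy values (import path assumed from the source file above). The score follows scikit-learn's definition, \(R^2 = 1 - SS_{res} / SS_{tot}\), so 1.0 is a perfect fit and poor fits can score below zero:

from pydebiaseddta.evaluation import r2

gold = [1.0, 2.0, 3.0, 4.0]
preds = [1.1, 1.9, 3.2, 3.8]

# SS_res = 0.10, SS_tot = 5.0, so R^2 = 1 - 0.10 / 5.0 = 0.98
print(r2(gold, preds))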

rmse(gold_truths, predictions)

Computes root mean squared error between expected and predicted values.

Parameters:

    gold_truths : List[float], required
        The gold labels in the dataset.
    predictions : List[float], required
        Predictions of a model.

Returns:

    float
        Root mean squared error.

Source code in pydebiaseddta/evaluation.py
def rmse(gold_truths: List[float], predictions: List[float]) -> float:
    """Computes root mean squared error between expected and predicted values.

    Parameters
    ----------
    gold_truths : List[float]
        The gold labels in the dataset.
    predictions : List[float]
        Predictions of a model.

    Returns
    -------
    float
        Root mean squared error.
    """
    return float(mean_squared_error(gold_truths, predictions, squared=False))
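
A quick check that rmse is the square root of mse on toy values (import path assumed from the source file above):

from pydebiaseddta.evaluation import mse, rmse

gold = [1.0, 2.0, 3.0]
preds = [1.0, 2.0, 5.0]

# RMSE = sqrt(MSE) = sqrt(4/3) ≈ 1.155
assert abs(rmse(gold, preds) - mse(gold, preds) ** 0.5) < 1e-9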