Metrics
accuracy
Class for Metric Accuracy
- class mindnlp.engine.metrics.accuracy.Accuracy(name='Accuracy')
Bases: Metric
Calculates accuracy. The function is shown as follows:
\[\text{ACC} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}}\]
where ACC is accuracy, TP is the number of true positive cases, TN is the number of true negative cases, FP is the number of false positive cases, and FN is the number of false negative cases.
- Parameters
name (str) – Name of the metric.
Example
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import Accuracy
>>> preds = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]), mindspore.float32)
>>> labels = Tensor(np.array([1, 0, 1]), mindspore.int32)
>>> metric = Accuracy()
>>> metric.update(preds, labels)
>>> acc = metric.eval()
>>> print(acc)
0.6666666666666666
- eval()
Computes and returns the accuracy.
- Returns
acc (float) - The computed result.
- Raises
RuntimeError – If the number of samples is 0.
- update(*inputs)
Updates local variables.
- Parameters
inputs –
Input preds and labels.
preds (Union[Tensor, list, numpy.ndarray]): Predicted value. preds is a list of floating-point numbers in the range \([0, 1]\); its shape is usually, but not necessarily, \((N, C)\), where \(N\) is the number of samples and \(C\) is the number of classes.
labels (Union[Tensor, list, numpy.ndarray]): Ground truth value. labels must be either one-hot encoded with shape \((N, C)\), or class indices with shape \((N,)\) that can be converted to one-hot format.
- Raises
ValueError – If the number of inputs is not 2.
ValueError – If the number of classes in the current preds does not match that of the previously input preds.
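The computation behind update and eval can be illustrated with a short pure-Python sketch. It assumes that each row of preds holds class scores, that the predicted class is the index of the largest score, and that labels are class indices of shape \((N,)\); the one-hot label case and batch accumulation are not covered. The name accuracy_sketch is hypothetical and not part of the mindnlp API.

```python
def accuracy_sketch(preds, labels):
    """Hypothetical helper: fraction of samples whose argmax prediction equals the label."""
    if len(preds) == 0:
        # Mirrors the documented RuntimeError for an empty sample set.
        raise RuntimeError("The number of samples is 0.")
    # max(range(C), key=...) returns the index of the first largest score (argmax).
    correct = sum(
        1 for row, y in zip(preds, labels)
        if max(range(len(row)), key=lambda i: row[i]) == y
    )
    return correct / len(preds)


preds = [[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]  # argmax -> [1, 0, 0]
labels = [1, 0, 1]
print(accuracy_sketch(preds, labels))  # 0.6666666666666666
```

Two of the three argmax predictions match their labels, which reproduces the value in the example above.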
bleu
Class for Metric BleuScore
- class mindnlp.engine.metrics.bleu.BleuScore(n_size=4, weights=None, name='BleuScore')
Bases: Metric
Calculates the BLEU score. BLEU (bilingual evaluation understudy) is a metric for evaluating the quality of text translated by machine. It uses a modified form of precision to compare a candidate translation against multiple reference translations. The function is shown as follows:
\[BP = \begin{cases} 1, & \text{if } c > r \\ e^{1 - r/c}, & \text{if } c \leq r \end{cases}\]
\[BLEU = BP \cdot \exp\left(\sum_{n=1}^{N} w_{n} \log p_{n}\right)\]
where \(c\) is the length of the candidate sentence, \(r\) is the length of the reference sentence, \(p_{n}\) is the modified n-gram precision, \(w_{n}\) is its weight, and \(N\) is n_size.
- Parameters
n_size (int) – Maximum n-gram order, ranging from 1 to 4. Default: 4.
weights (Union[list, None]) – Weights of the precision of each n-gram order. Default: None.
name (str) – Name of the metric.
- Raises
ValueError – If the value range of n_size is not from 1 to 4.
ValueError – If the length of weights is not equal to n_size.
Example
>>> from mindnlp.engine.metrics import BleuScore
>>> cand = [["The", "cat", "The", "cat", "on", "the", "mat"]]
>>> ref_list = [[["The", "cat", "is", "on", "the", "mat"], ["There", "is", "a", "cat", "on", "the", "mat"]]]
>>> metric = BleuScore()
>>> metric.update(cand, ref_list)
>>> bleu_score = metric.eval()
>>> print(bleu_score)
0.46713797772820015
- eval()
Computes and returns the BLEU score.
- Returns
bleu_score (float) - The computed result.
- update(*inputs)
Updates local variables.
- Parameters
inputs –
Input cand and ref_list.
cand (list): A list of tokenized candidate sentences.
ref_list (list): A list of lists of tokenized ground truth sentences.
- Raises
ValueError – If the number of inputs is not 2.
ValueError – If the lengths of cand and ref_list are not equal.
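The BLEU formula can be sketched in pure Python for a single candidate and its references. This is an illustration under stated assumptions, not mindnlp's implementation: uniform weights \(1/N\) when weights is None, the effective reference length \(r\) taken as the reference length closest to \(c\), and no smoothing (the score is 0 if any clipped n-gram count is 0). The names ngrams and bleu_sketch are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_sketch(cand, refs, n_size=4, weights=None):
    """Hypothetical sentence-level BLEU for one tokenized candidate."""
    if weights is None:
        weights = [1.0 / n_size] * n_size  # assumption: uniform weights
    log_sum = 0.0
    for n, w in zip(range(1, n_size + 1), weights):
        cand_counts = ngrams(cand, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, cnt in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], cnt)
        clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # log(0) is undefined; unsmoothed BLEU is 0
        log_sum += w * math.log(clipped / total)  # w_n * log p_n
    c = len(cand)
    # Assumption: r is the reference length closest to c (shorter wins ties).
    r = min((len(ref) for ref in refs), key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_sum)


cand = ["The", "cat", "The", "cat", "on", "the", "mat"]
refs = [["The", "cat", "is", "on", "the", "mat"],
        ["There", "is", "a", "cat", "on", "the", "mat"]]
print(bleu_sketch(cand, refs))  # approximately 0.4671
```

For the example sentence the clipped precisions are 5/7, 4/6, 2/5 and 1/4 and the brevity penalty is 1, which reproduces the documented score.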
confusion_matrix
Class for Metric ConfusionMatrix
- class mindnlp.engine.metrics.confusion_matrix.ConfusionMatrix(class_num=2, name='ConfusionMatrix')
Bases: Metric
Calculates the confusion matrix. The confusion matrix is commonly used to evaluate the performance of classification models, including binary and multi-class classification.
- Parameters
class_num (int) – Number of classes in the dataset. Default: 2.
name (str) – Name of the metric.
Example
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import ConfusionMatrix
>>> preds = Tensor(np.array([1, 0, 1, 0]))
>>> labels = Tensor(np.array([1, 0, 0, 1]))
>>> metric = ConfusionMatrix()
>>> metric.update(preds, labels)
>>> conf_mat = metric.eval()
>>> print(conf_mat)
[[1. 1.]
 [1. 1.]]
- eval()
Computes and returns the Confusion Matrix.
- Returns
conf_mat (np.ndarray) - The computed result.
- update(*inputs)
Updates local variables.
- Parameters
inputs –
Input preds and labels.
preds (Union[Tensor, list, np.ndarray]): Predicted value. preds is a list of floating-point numbers with shape \((N, C)\) or \((N,)\).
labels (Union[Tensor, list, np.ndarray]): Ground truth. The shape of labels is \((N,)\).
- Raises
ValueError – If the number of inputs is not 2.
ValueError – If preds and labels do not have valid dimensions.
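A minimal pure-Python sketch of how a confusion matrix can be accumulated from preds and labels. It assumes that rows index the true class and columns the predicted class (the symmetric example output cannot confirm mindnlp's orientation), that \((N, C)\) scores are reduced with argmax, and that \((N,)\) preds are already class indices. The name confusion_matrix_sketch is hypothetical.

```python
def confusion_matrix_sketch(preds, labels, class_num=2):
    """Hypothetical helper: rows = true class, columns = predicted class (assumed)."""
    mat = [[0.0] * class_num for _ in range(class_num)]
    for p, y in zip(preds, labels):
        if isinstance(p, (list, tuple)):
            # (N, C) scores: take the index of the largest score.
            pred_cls = max(range(len(p)), key=lambda i: p[i])
        else:
            # (N,) preds: already a class index.
            pred_cls = int(p)
        mat[int(y)][pred_cls] += 1
    return mat


print(confusion_matrix_sketch([1, 0, 1, 0], [1, 0, 0, 1]))  # [[1.0, 1.0], [1.0, 1.0]]
```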
distinct
Class for Metric Distinct
- class mindnlp.engine.metrics.distinct.Distinct(n_size=2, name='Distinct')
Bases: Metric
Calculates Distinct-N. Distinct-N is a metric that measures the diversity of a sentence. It focuses on the number of distinct n-grams in a sentence: the larger the number of distinct n-grams, the higher the diversity of the text. The function is shown as follows:
\[\text{Distinct-N} = \frac{\text{number of distinct n-grams}}{\text{total number of n-grams}}\]
- Parameters
n_size (int) – N-gram order. Default: 2.
name (str) – Name of the metric.
Example
>>> from mindnlp.engine.metrics import Distinct
>>> cand_list = ["The", "cat", "The", "cat", "on", "the", "mat"]
>>> metric = Distinct()
>>> metric.update(cand_list)
>>> distinct_score = metric.eval()
>>> print(distinct_score)
0.8333333333333334
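Distinct-N is the ratio of unique n-grams to all n-grams in the sentence, which a few lines of Python can illustrate. The name distinct_n is hypothetical, and returning 0.0 for a sentence shorter than n_size is an assumption, not documented mindnlp behavior.

```python
def distinct_n(tokens, n_size=2):
    """Hypothetical helper: unique n-grams divided by total n-grams."""
    grams = [tuple(tokens[i:i + n_size]) for i in range(len(tokens) - n_size + 1)]
    if not grams:
        return 0.0  # assumption: too-short sentences score 0
    return len(set(grams)) / len(grams)


cand_list = ["The", "cat", "The", "cat", "on", "the", "mat"]
print(distinct_n(cand_list))  # 0.8333333333333334
```

The sentence has six bigrams, of which ("The", "cat") appears twice, so five are distinct and the score is 5/6.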
em_score
Class for Metric EmScore
- class mindnlp.engine.metrics.em_score.EmScore(name='EmScore')
Bases: Metric
Calculates the exact match (EM) score. This metric measures the percentage of predictions that match any one of the ground truth answers exactly.
- Parameters
name (str) – Name of the metric.
Example
>>> from mindnlp.engine.metrics import EmScore
>>> preds = "this is the best span"
>>> examples = ["this is a good span", "something irrelevant"]
>>> metric = EmScore()
>>> metric.update(preds, examples)
>>> em_score = metric.eval()
>>> print(em_score)
0.0
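A minimal sketch of the exact-match idea: a prediction scores 1 if it equals any ground-truth answer and 0 otherwise, and the metric averages these scores. This sketch compares raw strings; whether mindnlp applies any text normalization (lowercasing, punctuation stripping) before comparing is not stated here. The names em_score_sketch and mean_em are hypothetical.

```python
def em_score_sketch(pred, ground_truths):
    """Hypothetical helper: 1.0 if pred equals any ground truth exactly, else 0.0."""
    return float(any(pred == gt for gt in ground_truths))


def mean_em(pairs):
    """Average EM over (prediction, ground_truths) pairs, as a fraction in [0, 1]."""
    scores = [em_score_sketch(p, gts) for p, gts in pairs]
    return sum(scores) / len(scores)


print(em_score_sketch("this is the best span",
                      ["this is a good span", "something irrelevant"]))  # 0.0
```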
f1
Class for Metric F1Score
- class mindnlp.engine.metrics.f1.F1Score(name='F1Score')
Bases: Metric
Calculates the F1 score. The F-beta score is a weighted harmonic mean of precision and recall, and the F1 score is the special case of F-beta with beta = 1. The function is shown as follows:
\[F_1 = \frac{2 \cdot TP}{2 \cdot TP + FN + FP}\]
where TP is the number of true positive cases, FN is the number of false negative cases, and FP is the number of false positive cases.
- Parameters
name (str) – Name of the metric.
Example
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import F1Score
>>> preds = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]))
>>> labels = Tensor(np.array([1, 0, 1]))
>>> metric = F1Score()
>>> metric.update(preds, labels)
>>> f1_s = metric.eval()
>>> print(f1_s)
[0.6666666666666666 0.6666666666666666]
- eval()
Computes and returns the F1 score.
- Returns
f1_s (numpy.ndarray) - The computed result.
- Raises
RuntimeError – If the number of samples is 0.
- update(*inputs)
Updates local variables.
- Parameters
inputs –
Input preds and labels.
preds (Union[Tensor, list, np.ndarray]): Predicted value. preds is a list of floating-point numbers in the range \([0, 1]\); its shape is usually, but not necessarily, \((N, C)\), where \(N\) is the number of samples and \(C\) is the number of classes.
labels (Union[Tensor, list, np.ndarray]): Ground truth. labels must be either one-hot encoded with shape \((N, C)\), or class indices with shape \((N,)\) that can be converted to one-hot format.
- Raises
ValueError – If the number of inputs is not 2.
ValueError – If the number of classes in the current preds does not match that of the previously input preds.
ValueError – If preds does not have the same number of classes as labels.
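The per-class F1 values in the example can be reproduced with a short pure-Python sketch of the formula \(2TP / (2TP + FN + FP)\), computed once per class with that class treated as positive. It assumes argmax predictions and integer labels, and returning 0.0 when a class never occurs is an assumption. The name f1_sketch is hypothetical.

```python
def f1_sketch(preds, labels):
    """Hypothetical helper: per-class F1 = 2TP / (2TP + FN + FP) with argmax predictions."""
    num_classes = len(preds[0])
    pred_cls = [max(range(num_classes), key=lambda i: row[i]) for row in preds]
    scores = []
    for c in range(num_classes):
        # Treat class c as "positive" and every other class as "negative".
        tp = sum(1 for p, y in zip(pred_cls, labels) if p == c and y == c)
        fp = sum(1 for p, y in zip(pred_cls, labels) if p == c and y != c)
        fn = sum(1 for p, y in zip(pred_cls, labels) if p != c and y == c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # assumption for empty classes
    return scores


preds = [[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]
labels = [1, 0, 1]
print(f1_sketch(preds, labels))  # [0.6666666666666666, 0.6666666666666666]
```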
matthews
Class for Metric MatthewsCorrelation
- class mindnlp.engine.metrics.matthews.MatthewsCorrelation(name='MatthewsCorrelation')
Bases: Metric
Calculates the Matthews correlation coefficient (MCC). MCC is in essence a correlation coefficient between the observed and predicted binary classifications; it returns a value between −1 and +1. A coefficient of +1 represents a perfect prediction, 0 indicates a prediction no better than random, and −1 indicates total disagreement between prediction and observation. The function is shown as follows:
\[MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}\]
where TP is the number of true positive cases, TN is the number of true negative cases, FN is the number of false negative cases, and FP is the number of false positive cases.
- Parameters
name (str) – Name of the metric.
Example
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import MatthewsCorrelation
>>> preds = [[0.8, 0.2], [-0.5, 0.5], [0.1, 0.4], [0.6, 0.3], [0.6, 0.3]]
>>> labels = [0, 1, 0, 1, 0]
>>> metric = MatthewsCorrelation()
>>> metric.update(preds, labels)
>>> m_c_c = metric.eval()
>>> print(m_c_c)
0.16666666666666666
- update(*inputs)
Updates local variables.
- Parameters
inputs –
Input preds and labels.
preds (Union[Tensor, list, numpy.ndarray]): Predicted value. preds is a list of floating-point numbers in the range \([0, 1]\); its shape is usually, but not necessarily, \((N, C)\), where \(N\) is the number of samples and \(C\) is the number of classes.
labels (Union[Tensor, list, numpy.ndarray]): Ground truth value. labels must be either one-hot encoded with shape \((N, C)\), or class indices with shape \((N,)\) that can be converted to one-hot format.
- Raises
ValueError – If the number of inputs is not 2.
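A pure-Python sketch of the MCC formula for the binary case, assuming class 1 is the positive class, predictions are the argmax of each row, and labels are 0/1 indices. Returning 0.0 when the denominator is zero is an assumption, not documented mindnlp behavior. The name mcc_sketch is hypothetical.

```python
import math


def mcc_sketch(preds, labels):
    """Hypothetical helper: binary MCC with class 1 as the positive class."""
    pred_cls = [max(range(len(row)), key=lambda i: row[i]) for row in preds]
    pairs = list(zip(pred_cls, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0  # assumption for denom == 0


preds = [[0.8, 0.2], [-0.5, 0.5], [0.1, 0.4], [0.6, 0.3], [0.6, 0.3]]
labels = [0, 1, 0, 1, 0]
print(mcc_sketch(preds, labels))  # 0.16666666666666666
```

Here TP = 1, TN = 2, FP = 1 and FN = 1, so MCC = (2 − 1) / √36 = 1/6, matching the example.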
pearson
Class for Metric PearsonCorrelation
- class mindnlp.engine.metrics.pearson.PearsonCorrelation(name='PearsonCorrelation')
Bases: Metric
Calculates the Pearson correlation coefficient (PCC). PCC is a measure of linear correlation between two sets of data. It is the ratio between the covariance of two variables and the product of their standard deviations; thus, it is essentially a normalized measurement of the covariance, such that the result always has a value between −1 and 1.
- Parameters
name (str) – Name of the metric.
Example
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import PearsonCorrelation
>>> preds = Tensor(np.array([[0.1], [1.0], [2.4], [0.9]]), mindspore.float32)
>>> labels = Tensor(np.array([[0.0], [1.0], [2.9], [1.0]]), mindspore.float32)
>>> metric = PearsonCorrelation()
>>> metric.update(preds, labels)
>>> p_c_c = metric.eval()
>>> print(p_c_c)
0.9985229081857804
- update(*inputs)
Updates local variables.
- Parameters
inputs –
Input preds and labels.
preds (Union[Tensor, list, np.ndarray]): Predicted value. preds is a list of floating-point numbers with shape \((N, 1)\).
labels (Union[Tensor, list, np.ndarray]): Ground truth. labels is a list of floating-point numbers with shape \((N, 1)\).
- Raises
ValueError – If the number of inputs is not 2.
RuntimeError – If preds and labels have different lengths.
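The definition above (covariance divided by the product of the standard deviations) can be sketched in pure Python. The sketch flattens \((N, 1)\) inputs and raises RuntimeError on a length mismatch, mirroring the documented error; the names to_flat_floats and pearson_sketch are hypothetical, and this is not mindnlp's implementation.

```python
import math


def to_flat_floats(values):
    """Turn an (N, 1) nested list or a flat list into a flat list of floats."""
    return [float(v[0]) if isinstance(v, (list, tuple)) else float(v) for v in values]


def pearson_sketch(preds, labels):
    """Hypothetical helper: sample Pearson correlation of preds and labels."""
    x, y = to_flat_floats(preds), to_flat_floats(labels)
    if len(x) != len(y):
        raise RuntimeError("preds and labels have different lengths.")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # The 1/n factors cancel between the covariance and the standard deviations.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


print(pearson_sketch([[0.1], [1.0], [2.4], [0.9]], [[0.0], [1.0], [2.9], [1.0]]))  # approximately 0.99852
```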
perplexity
Class for Metric Perplexity
- class mindnlp.engine.metrics.perplexity.Perplexity(ignore_label=None, name='Perplexity')
Bases: Metric
Calculates the perplexity. Perplexity measures how well a probability model predicts a sample; a low perplexity indicates that the model predicts the sample well. The function is shown as follows:
\[PP(W) = P(w_{1}w_{2} \ldots w_{N})^{-\frac{1}{N}} = \sqrt[N]{\frac{1}{P(w_{1}w_{2} \ldots w_{N})}}\]
where \(W = w_{1}w_{2} \ldots w_{N}\) is a sequence of \(N\) words from the corpus.
- Parameters
ignore_label (Union[int, None]) – Index of an invalid label to be ignored when counting. If set to None, it means there’s no invalid label. Default: None.
name (str) – Name of the metric.
Example
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import Perplexity
>>> preds = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]))
>>> labels = Tensor(np.array([1, 0, 1]))
>>> metric = Perplexity()
>>> metric.update(preds, labels)
>>> ppl = metric.eval()
>>> print(ppl)
2.231443166940565
- eval()
Computes and returns the perplexity.
- Returns
ppl (float) - The computed result.
- Raises
RuntimeError – If the sample size is 0.
- update(*inputs)
Updates local variables.
- Parameters
inputs –
Input preds and labels.
preds (Union[Tensor, list, np.ndarray]): Predicted value. preds is a list of floating-point numbers in the range \([0, 1]\); its shape is usually, but not necessarily, \((N, C)\), where \(N\) is the number of samples and \(C\) is the number of classes.
labels (Union[Tensor, list, np.ndarray]): Ground truth. labels must be either one-hot encoded with shape \((N, C)\), or class indices with shape \((N,)\) that can be converted to one-hot format.
- Raises
ValueError – If the number of inputs is not 2.
RuntimeError – If preds and labels have different lengths.
RuntimeError – If preds and labels have different shapes.
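The formula can be evaluated in log space as \(PP = \exp\left(-\frac{1}{N}\sum_i \log P(w_i)\right)\). The sketch below assumes that preds already hold probabilities (no softmax is applied), which is consistent with the example output in this section, that labels are class indices, and that ignore_label simply skips matching samples. The name perplexity_sketch is hypothetical.

```python
import math


def perplexity_sketch(preds, labels, ignore_label=None):
    """Hypothetical helper: exp of the mean negative log-probability of the true labels."""
    log_sum, count = 0.0, 0
    for row, y in zip(preds, labels):
        if ignore_label is not None and y == ignore_label:
            continue  # assumption: ignored labels are excluded from the average
        log_sum += math.log(row[y])  # probability assigned to the true label
        count += 1
    if count == 0:
        raise RuntimeError("The sample size is 0.")
    return math.exp(-log_sum / count)


preds = [[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]
labels = [1, 0, 1]
print(perplexity_sketch(preds, labels))  # approximately 2.2314
```

The true-label probabilities are 0.5, 0.3 and 0.6, so the result equals \((0.5 \cdot 0.3 \cdot 0.6)^{-1/3} = 0.09^{-1/3}\).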
precision
Class for Metric Precision
- class mindnlp.engine.metrics.precision.Precision(name='Precision')
Bases: Metric
Calculates precision. Precision (also known as positive predictive value) is the proportion of predicted positive samples that are actually positive. It can only be used to evaluate the precision score of binary tasks. The function is shown as follows:
\[\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}\]
where TP is the number of true positive cases and FP is the number of false positive cases.
- Parameters
name (str) – Name of the metric.
Example
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import Precision
>>> preds = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]), mindspore.float32)
>>> labels = Tensor(np.array([1, 0, 1]), mindspore.int32)
>>> metric = Precision()
>>> metric.update(preds, labels)
>>> prec = metric.eval()
>>> print(prec)
[0.5 1. ]
- eval()
Computes and returns the precision.
- Returns
prec (numpy.ndarray) - The computed result.
- update(*inputs)
Updates local variables. If the index of the maximum of the predicted value matches the label, the predicted result is correct.
- Parameters
inputs –
Input preds and labels.
preds (Union[Tensor, list, numpy.ndarray]): Predicted value. preds is a list of floating-point numbers in the range \([0, 1]\); its shape is usually, but not necessarily, \((N, C)\), where \(N\) is the number of samples and \(C\) is the number of classes.
labels (Union[Tensor, list, numpy.ndarray]): Ground truth value. labels must be either one-hot encoded with shape \((N, C)\), or class indices with shape \((N,)\) that can be converted to one-hot format.
- Raises
ValueError – If the number of inputs is not 2.
ValueError – If preds does not have the same number of classes as labels.
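The per-class output in the example (one precision value per class) can be reproduced with a pure-Python sketch that takes the argmax of each row as the prediction, as update describes, and applies TP / (TP + FP) with each class in turn treated as positive. Returning 0.0 for a class that is never predicted is an assumption. The name precision_sketch is hypothetical.

```python
def precision_sketch(preds, labels):
    """Hypothetical helper: per-class precision TP / (TP + FP) with argmax predictions."""
    num_classes = len(preds[0])
    pred_cls = [max(range(num_classes), key=lambda i: row[i]) for row in preds]
    result = []
    for c in range(num_classes):
        tp = sum(1 for p, y in zip(pred_cls, labels) if p == c and y == c)
        fp = sum(1 for p, y in zip(pred_cls, labels) if p == c and y != c)
        # Assumption: a class that is never predicted gets precision 0.0.
        result.append(tp / (tp + fp) if tp + fp else 0.0)
    return result


preds = [[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]
labels = [1, 0, 1]
print(precision_sketch(preds, labels))  # [0.5, 1.0]
```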
recall
Class for Metric Recall
- class mindnlp.engine.metrics.recall.Recall(name='Recall')
Bases: Metric
Calculates the recall. Recall is also referred to as the true positive rate or sensitivity. The function is shown as follows:
\[\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}\]
where TP is the number of true positive cases and FN is the number of false negative cases.
- Parameters
name (str) – Name of the metric.
Example
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import Recall
>>> preds = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]), mindspore.float32)
>>> labels = Tensor(np.array([1, 0, 1]), mindspore.int32)
>>> metric = Recall()
>>> metric.update(preds, labels)
>>> rec = metric.eval()
>>> print(rec)
[1.  0.5]
- update(*inputs)
Updates local variables.
- Parameters
inputs –
Input preds and labels.
preds (Union[Tensor, list, np.ndarray]): Predicted value. preds is a list of floating-point numbers in the range \([0, 1]\); its shape is usually, but not necessarily, \((N, C)\), where \(N\) is the number of samples and \(C\) is the number of classes.
labels (Union[Tensor, list, np.ndarray]): Ground truth. labels must be either one-hot encoded with shape \((N, C)\), or class indices with shape \((N,)\) that can be converted to one-hot format.
- Raises
ValueError – If the number of inputs is not 2.
ValueError – If preds does not have the same number of classes as labels.
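Recall differs from precision only in the denominator: false negatives (true class c, predicted otherwise) replace false positives. This pure-Python sketch assumes argmax predictions and integer labels, and returning 0.0 for a class with no true samples is an assumption. The name recall_sketch is hypothetical.

```python
def recall_sketch(preds, labels):
    """Hypothetical helper: per-class recall TP / (TP + FN) with argmax predictions."""
    num_classes = len(preds[0])
    pred_cls = [max(range(num_classes), key=lambda i: row[i]) for row in preds]
    result = []
    for c in range(num_classes):
        tp = sum(1 for p, y in zip(pred_cls, labels) if p == c and y == c)
        fn = sum(1 for p, y in zip(pred_cls, labels) if p != c and y == c)
        # Assumption: a class with no true samples gets recall 0.0.
        result.append(tp / (tp + fn) if tp + fn else 0.0)
    return result


preds = [[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]
labels = [1, 0, 1]
print(recall_sketch(preds, labels))  # [1.0, 0.5]
```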
rouge
Classes for Metrics RougeN and RougeL
- class mindnlp.engine.metrics.rouge.RougeL(beta=1.2, name='RougeL')
Bases: Metric
Calculates the ROUGE-L score. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics used for evaluating automatic summarization and machine translation models. ROUGE-L is calculated based on the longest common subsequence (LCS). The function is shown as follows:
\[R_{lcs} = \frac{LCS(X, Y)}{m}\]
\[P_{lcs} = \frac{LCS(X, Y)}{n}\]
\[F_{lcs} = \frac{(1 + \beta^{2}) R_{lcs} P_{lcs}}{R_{lcs} + \beta^{2} P_{lcs}}\]
where X is the reference sentence, Y is the candidate sentence, m and n are the lengths of X and Y respectively, and \(LCS(X, Y)\) is the length of their longest common subsequence.
- Parameters
beta (float) – Hyperparameter that sets the weight of recall relative to precision. Default: 1.2.
name (str) – Name of the metric.
Example
>>> from mindnlp.engine.metrics import RougeL
>>> cand_list = ["The", "cat", "The", "cat", "on", "the", "mat"]
>>> ref_list = [["The", "cat", "is", "on", "the", "mat"], ["There", "is", "a", "cat", "on", "the", "mat"]]
>>> metric = RougeL()
>>> metric.update(cand_list, ref_list)
>>> rougel_score = metric.eval()
>>> print(rougel_score)
0.7800511508951408
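ROUGE-L needs the LCS length, which a standard dynamic-programming routine computes. In this sketch, recall divides by the reference length and precision by the candidate length, as in the formulas above. How scores from several references are combined is an assumption: the sketch takes the maximum recall and the maximum precision over the references. The names lcs_length and rouge_l_sketch are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            # Extend a match diagonally, otherwise carry over the best neighbour.
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_sketch(cand, refs, beta=1.2):
    """Hypothetical helper: ROUGE-L F-score of one candidate against several references."""
    lcs = [lcs_length(ref, cand) for ref in refs]
    r = max(l / len(ref) for l, ref in zip(lcs, refs))  # R_lcs = LCS / m (reference length)
    p = max(l / len(cand) for l in lcs)                  # P_lcs = LCS / n (candidate length)
    if r == 0 or p == 0:
        return 0.0
    return (1 + beta ** 2) * r * p / (r + beta ** 2 * p)


cand_list = ["The", "cat", "The", "cat", "on", "the", "mat"]
ref_list = [["The", "cat", "is", "on", "the", "mat"],
            ["There", "is", "a", "cat", "on", "the", "mat"]]
print(rouge_l_sketch(cand_list, ref_list))  # approximately 0.7801
```

With the first reference, LCS = 5 ("The cat on the mat"), so \(R_{lcs} = 5/6\) and \(P_{lcs} = 5/7\), which gives the documented score.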
- class mindnlp.engine.metrics.rouge.RougeN(n_size=1, name='RougeN')
Bases: Metric
Calculates ROUGE-N. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics used for evaluating automatic summarization and machine translation models. ROUGE-N measures the overlap of n-grams between the candidate and the reference summaries.
- Parameters
n_size (int) – N-gram order. Default: 1.
name (str) – Name of the metric.
Example
>>> from mindnlp.engine.metrics import RougeN
>>> cand_list = ["the", "cat", "was", "found", "under", "the", "bed"]
>>> ref_list = [["the", "cat", "was", "under", "the", "bed"]]
>>> metric = RougeN(2)
>>> metric.update(cand_list, ref_list)
>>> rougen_score = metric.eval()
>>> print(rougen_score)
0.8
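ROUGE-N is commonly computed as a recall: the number of overlapping (clipped) n-grams divided by the number of n-grams in the references. This pure-Python sketch follows that definition, which reproduces the example above. Summing overlaps and reference counts across multiple references is an assumption about how mindnlp aggregates them. The name rouge_n_sketch is hypothetical.

```python
from collections import Counter


def rouge_n_sketch(cand, refs, n_size=1):
    """Hypothetical helper: n-gram recall of a candidate against one or more references."""
    def grams(tokens):
        return Counter(tuple(tokens[i:i + n_size]) for i in range(len(tokens) - n_size + 1))

    cand_counts = grams(cand)
    overlap = total = 0
    for ref in refs:
        ref_counts = grams(ref)
        # Counter & Counter keeps the minimum count of each shared n-gram (clipping).
        overlap += sum((cand_counts & ref_counts).values())
        total += sum(ref_counts.values())
    return overlap / total if total else 0.0


cand_list = ["the", "cat", "was", "found", "under", "the", "bed"]
ref_list = [["the", "cat", "was", "under", "the", "bed"]]
print(rouge_n_sketch(cand_list, ref_list, 2))  # 0.8
```

The reference has five bigrams, four of which ("the cat", "cat was", "under the", "the bed") also occur in the candidate.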
spearman
Class for Metric Spearman
- class mindnlp.engine.metrics.spearman.SpearmanCorrelation(name='SpearmanCorrelation')
Bases: Metric
Calculates Spearman’s rank correlation coefficient (SRCC). It is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables). It assesses how well the relationship between two variables can be described using a monotonic function. If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other.
- Parameters
name (str) – Name of the metric.
Example
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindnlp.engine.metrics import SpearmanCorrelation
>>> preds = Tensor(np.array([[0.1], [1.0], [2.4], [0.9]]), mindspore.float32)
>>> labels = Tensor(np.array([[0.0], [1.0], [2.9], [1.0]]), mindspore.float32)
>>> metric = SpearmanCorrelation()
>>> metric.update(preds, labels)
>>> s_r_c_c = metric.eval()
>>> print(s_r_c_c)
1.0
- update(*inputs)
Updates local variables.
- Parameters
inputs –
Input preds and labels.
preds (Union[Tensor, list, np.ndarray]): Predicted value. preds is a list of floating-point numbers with shape \((N, 1)\).
labels (Union[Tensor, list, np.ndarray]): Ground truth. labels is a list of floating-point numbers with shape \((N, 1)\).
- Raises
ValueError – If the number of inputs is not 2.
RuntimeError – If preds and labels have different lengths.
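A common way to compute Spearman's coefficient is to replace each value by its rank and then take the Pearson correlation of the ranks. This pure-Python sketch follows that approach and gives tied values the average of their ranks. mindnlp's tie handling is not described in this section, so results on data with ties may differ from the library's. The names to_flat_floats, average_ranks and spearman_sketch are hypothetical.

```python
import math


def to_flat_floats(values):
    """Turn an (N, 1) nested list or a flat list into a flat list of floats."""
    return [float(v[0]) if isinstance(v, (list, tuple)) else float(v) for v in values]


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie group order[i..j]
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_sketch(preds, labels):
    """Hypothetical helper: Pearson correlation of the average ranks."""
    x, y = to_flat_floats(preds), to_flat_floats(labels)
    if len(x) != len(y):
        raise RuntimeError("preds and labels have different lengths.")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


print(spearman_sketch([0.1, 0.4, 0.35, 0.8], [1, 2, 3, 4]))  # approximately 0.8
```

Only the ordering of the values matters: any strictly increasing transformation of preds leaves the result unchanged.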