# The F1 formula
F1 = 2*(Precision*Recall) / (Precision+Recall)
# Interpreting the F1 score
A high F1 score means that both precision and recall are good.
A low F1 score, by itself, does not tell you whether the problem is false positives
or false negatives.
In that case you need to debug further.
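As a quick illustration with made-up precision/recall values, two very different failure modes can produce the same mediocre F1, which is why a low F1 alone does not point to the culprit:

```python
# Two hypothetical models with the same F1 but opposite failure modes.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

# Model A: precise but misses many positives (false negatives dominate).
print(f1(0.9, 0.3))  # 0.45

# Model B: catches most positives but raises many false alarms (false positives dominate).
print(f1(0.3, 0.9))  # 0.45 -- identical F1, very different problem
```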
# TP, FP, FN, TN
- TP: I predicted 1 and the ground truth is also 1, i.e., a positive prediction that matches the answer. It is a true positive: a positive call that was genuinely (true) correct.
- FP: I predicted 1 but the ground truth is 0, i.e., a positive prediction that disagrees with the answer. It is a false positive: a positive call that was wrong (false).
- FN: I predicted 0 but the ground truth is 1, i.e., a negative prediction that disagrees with the answer. It is a false negative: a negative call that was wrong (false).
- TN: I predicted 0 and the ground truth is also 0, i.e., a negative prediction that matches the answer. It is a true negative: a negative call that was genuinely (true) correct.
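A minimal sketch, with made-up labels and predictions, of counting these four outcomes by hand:

```python
# Made-up ground-truth labels and model predictions for a binary task.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 1, 0, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # predicted 1, truth 1
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # predicted 1, truth 0
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # predicted 0, truth 1
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # predicted 0, truth 0

print(tp, fp, fn, tn)  # 3 2 1 2
```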
# Precision
precision=TP/(TP+FP)
# Recall
recall=TP/(TP+FN)
Here, precision means the probability that a case we called positive is truly positive, i.e., how accurate our positive calls are.
Recall is the probability that the cases that are actually positive were successfully predicted as positive, i.e., of all the cases we should have caught, how many we caught without missing them.
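Continuing the hand-counted sketch above (tp=3, fp=2, fn=1), the two quantities come out differently, matching the two intuitions:

```python
tp, fp, fn = 3, 2, 1  # counts from the hand-counted sketch above

precision = tp / (tp + fp)  # of everything we called positive, how much was truly positive
recall = tp / (tp + fn)     # of everything actually positive, how much we caught

print(precision, recall)  # 0.6 0.75
```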
# Accuracy
It’s the ratio of the correctly labeled subjects to the whole pool of subjects.
Accuracy is the most intuitive one.
Accuracy answers the following question: How many students did we correctly label out of all the students?
Accuracy = (TP+TN)/(TP+FP+FN+TN)
numerator: all correctly labeled subjects (all trues)
denominator: all subjects
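A tiny worked example with hypothetical screening counts (these numbers are made up: 40 TP, 10 FP, 5 FN, 45 TN out of 100 people):

```python
# Hypothetical diabetes-screening counts; 100 subjects in total.
tp, fp, fn, tn = 40, 10, 5, 45

accuracy = (tp + tn) / (tp + fp + fn + tn)  # all correct labels over all subjects
print(accuracy)  # 0.85
```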
# Precision
Precision is the ratio of the correctly +ve labeled by our program to all +ve labeled.
Precision answers the following: How many of those who we labeled as diabetic are actually diabetic?
Precision = TP/(TP+FP)
numerator: +ve labeled diabetic people.
denominator: all +ve labeled by our program (whether they’re diabetic or not in reality).
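With the same hypothetical counts used in the accuracy sketch above:

```python
tp, fp = 40, 10  # hypothetical counts from the accuracy sketch

precision = tp / (tp + fp)  # of the 50 people we labeled diabetic, 40 really are
print(precision)  # 0.8
```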
# Recall (aka Sensitivity)
Recall is the ratio of the correctly +ve labeled by our program to all who are diabetic in reality.
Recall answers the following question: Of all the people who are diabetic, how many did we correctly predict?
Recall = TP/(TP+FN)
numerator: +ve labeled diabetic people.
denominator: all people who are diabetic (whether detected by our program or not)
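Again with the same hypothetical counts:

```python
tp, fn = 40, 5  # hypothetical counts from the accuracy sketch

recall = tp / (tp + fn)  # of the 45 people who are really diabetic, we caught 40
print(recall)  # ~0.889
```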
# F1-score (aka F-Score / F-Measure)
F1 Score considers both precision and recall.
It is the harmonic mean of precision and recall.
The F1 score is high when there is a balance between precision (p) and recall (r); conversely, it is not high if one measure is improved at the expense of the other.
For example, if P is 1 & R is 0, F1 score is 0.
F1 Score = 2*(Recall * Precision) / (Recall + Precision)
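A short sketch, using the made-up values above plus the P=1, R=0 edge case from the text, to show how the harmonic mean behaves compared with a plain average:

```python
def f1(p, r):
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

print(f1(0.8, 40 / 45))  # ~0.842 for the hypothetical screening counts above
print(f1(1.0, 0.0))      # 0.0 -- the harmonic mean punishes extreme imbalance
print((1.0 + 0.0) / 2)   # 0.5 -- an arithmetic mean would hide the problem
```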
# Specificity
Specificity is the ratio of the correctly -ve labeled by our program to all who are healthy in reality.
Specificity answers the following question: Of all the people who are healthy, how many of those did we correctly predict?
Specificity = TN/(TN+FP)
numerator: -ve labeled healthy people.
denominator: all people who are healthy in reality (whether +ve or -ve labeled)
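And once more with the same hypothetical counts:

```python
tn, fp = 45, 10  # hypothetical counts from the accuracy sketch

specificity = tn / (tn + fp)  # of the 55 people who are really healthy, 45 were labeled healthy
print(specificity)  # ~0.818
```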
# General Notes
Yes, accuracy is a great measure, but only when you have symmetric datasets (the counts of false negatives and false positives are close) and when false negatives and false positives have similar costs.
If the costs of false positives and false negatives are different, then F1 is your savior. F1 is best if you have an uneven class distribution.
Precision is how sure you are of your true positives whilst recall is how sure you are that you are not missing any positives.
Choose recall if false positives are far more acceptable than false negatives, in other words, if false negatives are unacceptable/intolerable, so that you would rather get some extra false positives (false alarms) than miss some real positives, like in our diabetes example.
You'd rather get some healthy people labeled diabetic than leave a diabetic person labeled healthy.
Choose precision if you want to be more confident of your true positives. For example, spam emails: you'd rather have some spam emails in your inbox than have regular emails end up in your spam box. So the email company wants to be extra sure that email Y is spam before putting it in the spam box, where you never get to see it.
Choose specificity if you want to cover all true negatives, meaning you don't want any false alarms, i.e., no false positives. For example, if you're running a drug test in which everyone who tests positive goes straight to jail, you don't want anyone drug-free going to jail. False positives here are intolerable.
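To make the accuracy-versus-F1 note above concrete, here is a sketch with made-up, heavily imbalanced counts: a model that labels everyone healthy still scores 95% accuracy while its recall and F1 collapse to zero:

```python
# Hypothetical imbalanced set: 100 people, only 5 actually diabetic.
# A lazy model labels everyone as healthy (negative).
tp, fp, fn, tn = 0, 0, 5, 95

accuracy = (tp + tn) / (tp + fp + fn + tn)            # 0.95 -- looks great
recall = tp / (tp + fn)                               # 0.0  -- every diabetic person is missed
f1 = 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)  # 0.0  (equivalent form of the F1 formula)
print(accuracy, recall, f1)
```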
# Bottom Line
- An accuracy of 90% means that 1 of every 10 labels is incorrect, and 9 are correct.
- A precision of 80% means that, on average, 2 of every 10 students labeled diabetic by our program are healthy, and 8 are diabetic.
- A recall of 70% means that 3 of every 10 diabetic people in reality are missed by our program, and 7 are labeled as diabetic.
- A specificity of 60% means that 4 of every 10 healthy people in reality are mislabeled as diabetic, and 6 are correctly labeled as healthy.
# Confusion Matrix
Wikipedia explains it better than I can:
In the field of machine learning and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix). Each row of the matrix represents the instances in a predicted class while each column represents the instances in an actual class (or vice versa). The name stems from the fact that it makes it easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another).
A nice & easy how-to of calculating a confusion matrix is here.
>>> from sklearn.metrics import confusion_matrix
>>> tn, fp, fn, tp = confusion_matrix([0, 1, 0, 1],
...                                   [1, 1, 1, 0]).ravel()
>>> # true negatives, false positives, false negatives, true positives
>>> (tn, fp, fn, tp)
(0, 2, 1, 1)
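As a small follow-up (not from the original how-to), the same toy labels can be scored with sklearn's metric helpers; the results agree with the formulas above:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 1, 0, 1]
y_pred = [1, 1, 1, 0]

print(accuracy_score(y_true, y_pred))   # 0.25  -> (tp + tn) / total = 1/4
print(precision_score(y_true, y_pred))  # ~0.33 -> tp / (tp + fp) = 1/3
print(recall_score(y_true, y_pred))     # 0.5   -> tp / (tp + fn) = 1/2
print(f1_score(y_true, y_pred))         # 0.4   -> harmonic mean of 1/3 and 1/2
```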