Question:

Confusion matrix not showing correct counts of actual values. Multinomial regression, factors

柴俊捷

2023-03-14

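I have two vectors, actual values and predicted values. Both are factors with 8 levels. Level 8 has only 55 actual observations and 0 predicted ones. However, when I build a confusion matrix, the level-8 observations disappear or get shifted somehow. Shouldn't the column totals for the actual values equal their actual counts?

I used two different methods to double-check. I also tried explicitly making the factor levels the same in both vectors. No luck so far.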

library(nnet); library(caret)

sc <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00272/SkillCraft1_Dataset.csv")

# Treat the response as a factor
sc$LeagueIndex <- as.factor(sc$LeagueIndex)
# First column is ID, so drop it
sc <- sc[, -1]

# Set missing values to NA
which_qm <- sc[, c(2,3,4)] == '?'
sc[, c(2,3,4)][which_qm] <- NA
sc[, c(2,3,4)] <- apply(sc[, c(2,3,4)], 2, as.numeric)

# Set impossible values to NA
sc$TotalHours[sc$Age < sc$TotalHours/8760] <- NA
sc$HoursPerWeek[sc$HoursPerWeek >= 168] <- NA

# Fit model and store predictions
sc_mod1 <- multinom(LeagueIndex ~ ., sc)
sc_fitted1 <- predict(sc_mod1, sc)

# sc_fitted1 is missing factor level 8
confusionMatrix(data = sc_fitted1, reference = sc$LeagueIndex)
table(predicted = sc_fitted1, actual = sc$LeagueIndex)

# sc_fitted1 has factor level 8
levels(sc_fitted1) <- levels(sc$LeagueIndex)
confusionMatrix(data = sc_fitted1, reference = sc$LeagueIndex)
table(predicted = sc_fitted1, actual = sc$LeagueIndex)

# What's the problem?
table(sc$LeagueIndex)
length(sc$LeagueIndex)

table(sc_fitted1)
length(sc_fitted1)

1 answer

翟承志

2023-03-14

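It has to do with the NA values you produced: they all belong to level 8 of the target variable. With R's default na.action (na.omit), rows containing NA are dropped before the model is fitted, so no level-8 observation ever reaches the model. If you want level 8 to be taken into account, you may have to find another way to encode these NAs.

Try the following counterexample: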
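One way to check that the missing values really are concentrated in a single level is to cross-tabulate the response against a "row has an NA" flag. The sketch below uses a small made-up data frame (`df`, `league`, `age`, `apm` are hypothetical names, not the SkillCraft columns) and base R only:

```r
# Toy data: every row of league 3 has a missing predictor
df <- data.frame(
  league = factor(c(1, 1, 2, 2, 3, 3)),
  age    = c(20, 22, NA, 25, NA, NA),
  apm    = c(100, 120, 150, 160, 300, 320)
)

# TRUE for rows that na.omit would drop (at least one NA in the row)
dropped <- !complete.cases(df)

# Rows per level that are kept (FALSE) versus dropped (TRUE)
na_by_level <- table(league = df$league, dropped = dropped)
print(na_by_level)
```

A level whose FALSE count is zero contributes nothing to the fit. On the real data the same two lines could be applied to `sc` and `sc$LeagueIndex`.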

library(nnet); library(caret)

sc <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00272/SkillCraft1_Dataset.csv")

sc$LeagueIndex <- as.factor(sc$LeagueIndex)
sc <- sc[, -1]

which_qm <- sc[, c(2,3,4)] == '?'
sc[, c(2,3,4)][which_qm] <- 20   # this is just a random numeric value (not the best one to use!)
sc[, c(2,3,4)] <- apply(sc[, c(2,3,4)], 2, as.numeric)

sc_mod1 <- multinom(LeagueIndex ~ ., sc)
sc_fitted1 <- predict(sc_mod1, sc)

confusionMatrix(data = sc_fitted1, reference = sc$LeagueIndex)
table(predicted = sc_fitted1, actual = sc$LeagueIndex)

It will give you something like this:

         actual
predicted   1   2   3   4   5   6   7   8
        1  52  30   9   2   0   0   0   0
        2  61 123  78  58   4   1   0   0
        3  30  77 142  79  23   4   0   0
        4  21 104 248 410 252  45   0   0
        5   2  11  60 217 343 230   1   0
        6   1   2  16  45 184 333  32   2
        7   0   0   0   0   0   5   2   0
        8   0   0   0   0   0   3   0  53
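If you would rather keep the NAs than impute an arbitrary value, the missing predictions can at least be made visible in the cross-tabulation. By default `table()` drops any pair that contains an NA, which is why a column total can fall below the actual count. A minimal sketch on toy vectors (the names `actual` and `predicted` are illustrative), assuming the predictions for rows with missing predictors come back as NA:

```r
# Toy data: every "c" observation has a missing prediction
actual    <- factor(c("a", "a", "b", "b", "c", "c"), levels = c("a", "b", "c"))
predicted <- factor(c("a", "b", "b", "b", NA, NA),   levels = c("a", "b", "c"))

# Default behaviour: pairs with an NA are dropped, so column "c" sums to 0
tab_default <- table(predicted = predicted, actual = actual)

# useNA = "ifany" adds an NA row, so column totals match table(actual) again
tab_with_na <- table(predicted = predicted, actual = actual, useNA = "ifany")

print(colSums(tab_default))
print(colSums(tab_with_na))
```

The second table does not make the model predict the missing level; it only shows where the unpredicted observations went.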
