Question:

Decoding an 8-bit email message: Content-Transfer-Encoding: 8bit

濮阳研
2023-03-14

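I'm developing an email viewer that reads .eml files and displays the message in a browser control. I found a code snippet that can display 7bit messages as well as quoted-printable and base64 ones (Content-Transfer-Encoding: quoted-printable / Content-Transfer-Encoding: base64). What I still need is to decode 8bit messages.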

    private static AlternateView ImportText(StringReader r, string encoding, System.Net.Mime.ContentType contentType)
    {
        string line = string.Empty;
        StringBuilder b = new StringBuilder();
        while ((line = r.ReadLine())!= null)
        {
            switch (encoding)
            {
                case "quoted-printable":
                    if (line.EndsWith("="))
                    {
                        b.Append(DecodeQuotedPrintables(line.TrimEnd('='), contentType.CharSet));
                    }
                    else
                    {
                        b.Append(DecodeQuotedPrintables(line, contentType.CharSet) + "\n");
                    }
                    break;
                case "base64":
                    b.Append(DecodeBase64(line, contentType.CharSet));
                    break;

                case "8bit": // I need an 8bit decoder here!!!
                    b.Append(IneedAn8bitDecoderHere(line, contentType.CharSet));
                    break;
                default:
                    b.Append(line);
                    break;
            }
        }

        AlternateView returnValue = AlternateView.CreateAlternateViewFromString(b.ToString(), null, contentType.MediaType);
        returnValue.TransferEncoding = TransferEncoding.QuotedPrintable;
        return returnValue;
    }

I googled for an 8bit decoder but couldn't find one. Do I really need an 8bit decoder? If so, do you know a good one?

Update:

Relevant headers:

 MIME-Version: 1.0
 Content-Type: text/plain; charset="koi8-r";
 Content-Transfer-Encoding: 8bit

The message body as seen in my code (string line):

 ����������� �� ����, � �����  ��� � ������        ��������� �������  �   ��������  �������� ��   ������� 

What Outlook actually displays:

 Фантастично но факт, я снова  как и раньше сделалась статной  и   красивой  примерно за  месяцок 

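I guess I don't need the case "8bit": part at all. As SLaks mentioned, I need to load the mail source into a byte array instead of a string at the very beginning of the process. Reading the charset= value from the mail headers in that byte array then gives the appropriate code page.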
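
A minimal sketch of that idea, assuming a simple single-part message whose headers are plain ASCII. `EmlBodyReader`, `DecodeBody` and `FindBodyStart` are hypothetical names made up for illustration, and the `RegisterProvider` call assumes a runtime where `CodePagesEncodingProvider` is available (.NET Framework 4.6+ or .NET Core / .NET 5+):

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class EmlBodyReader
{
    // Hypothetical helper: decodes the body of a single-part message whose
    // Content-Transfer-Encoding is 7bit or 8bit (no transformation applied).
    public static string DecodeBody(byte[] raw)
    {
        // Makes code pages such as koi8-r available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // The header block ends at the first empty line.
        int bodyStart = FindBodyStart(raw);
        string headers = Encoding.ASCII.GetString(raw, 0, bodyStart);

        // Pick up charset=... from the Content-Type header, quoted or not.
        Match m = Regex.Match(headers, @"charset\s*=\s*""?([\w\-]+)", RegexOptions.IgnoreCase);
        Encoding enc = m.Success ? Encoding.GetEncoding(m.Groups[1].Value) : Encoding.ASCII;

        // Only now are the body bytes turned into a string.
        return enc.GetString(raw, bodyStart, raw.Length - bodyStart);
    }

    // Returns the index of the first byte after the blank line (LF LF or CRLF CRLF).
    static int FindBodyStart(byte[] raw)
    {
        for (int i = 0; i + 1 < raw.Length; i++)
        {
            if (raw[i] != '\n') continue;
            if (raw[i + 1] == '\n') return i + 2;
            if (i + 2 < raw.Length && raw[i + 1] == '\r' && raw[i + 2] == '\n') return i + 3;
        }
        return raw.Length; // no blank line found: the input is headers only
    }
}
```

It would be called as `EmlBodyReader.DecodeBody(File.ReadAllBytes("koi8-r.eml"))`. `Encoding.GetEncoding` throws an `ArgumentException` for charset names it does not know, so real code would probably want a fallback there, and multipart messages need per-part handling that this sketch leaves out.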

2 answers

史和泰
2023-03-14

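Your implementation is probably running into trouble because of StringReader(). Somewhere along the line, something has to convert the raw bytes into a string. Unless you did something special before that point, .NET does this for you, and it usually uses the machine's default encoding.

The problem with the 8-bit era is that there are dozens (if not more) of interpretations of the 8th bit, and there is no real way to tell from the bytes alone which one to use. If you decode as ASCII, anything with the 8th bit set is converted to ASCII 63, `?`. If you decode as UTF-8, any byte with the 8th bit set is treated as part of a multi-byte sequence, so the decoder tries to read the following one to three continuation bytes (see Wikipedia for details); if that fails, the byte is converted to U+FFFD (65533), the replacement character �, which is what you're seeing. If you specify the encoding explicitly, such as the koi8-r given in the headers, the 8th bit is interpreted correctly. Below is sample code that shows this. I use a message box instead of dumping to the console, but you can switch as long as you remember to change the console's encoding.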

var bytes = new byte[] { 226 };
var s1 = System.Text.Encoding.ASCII.GetString(bytes); // "?" (226 is not a valid ASCII byte)
var s2 = System.Text.Encoding.UTF8.GetString(bytes);  // U+FFFD (incomplete UTF-8 sequence)
var s3 = System.Text.Encoding.GetEncoding("koi8-r").GetString(bytes); //Б

MessageBox.Show(String.Format("{0} {1} {2}", s1, s2, s3));

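In summary, if you are getting the UTF-8 replacement character (and you are), you have already lost the original values of those bytes, and you need to fix the problem earlier. Whatever converts the bytes to a string has to take Content-Type: text/plain; charset="koi8-r" into account; you can't do it after the fact.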
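
To make the "can't do it after the fact" point concrete, here is a small sketch that decodes a byte array and encodes the result again, then checks whether the original bytes come back. `LossyDecodeDemo` and `RoundTrips` are names invented for this illustration:

```csharp
using System;
using System.Text;

static class LossyDecodeDemo
{
    // Returns true if decoding 'raw' with 'enc' and encoding the resulting
    // string again reproduces the original bytes, i.e. nothing was lost.
    public static bool RoundTrips(byte[] raw, Encoding enc)
    {
        byte[] back = enc.GetBytes(enc.GetString(raw));
        if (back.Length != raw.Length) return false;
        for (int i = 0; i < raw.Length; i++)
        {
            if (back[i] != raw[i]) return false;
        }
        return true;
    }
}
```

For the single byte 226 used above, `RoundTrips` returns false with `Encoding.UTF8` and `Encoding.ASCII`, because the replacement character or `?` no longer says which byte it replaced. It returns true with a single-byte code page such as `Encoding.GetEncoding("iso-8859-1")`, where every byte value maps to its own character.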

吴同
2023-03-14

This is how I solved the problem:

// My previous method:
string file = File.ReadAllText("koi8-r.eml");

// Correct method:    
Encoding efile = detectTextEncoding("koi8-r.eml", out file);

txtRaw.Text = file;

Link: detectTextEncoding()

// Function to detect the encoding for UTF-7, UTF-8/16/32 (bom, no bom, little
// & big endian), and local default codepage, and potentially other codepages.
// 'taster' = number of bytes to check of the file (to save processing). Higher
// value is slower, but more reliable (especially UTF-8 with special characters
// later on may appear to be ASCII initially). If taster = 0, then taster
// becomes the length of the file (for maximum reliability). 'text' is simply
// the string with the discovered encoding applied to the file.
public Encoding detectTextEncoding(string filename, out String text, int taster = 1000)
{
byte[] b = File.ReadAllBytes(filename);

//////////////// First check the low hanging fruit by checking if a
//////////////// BOM/signature exists (sourced from http://www.unicode.org/faq/utf_bom.html#bom4)
if (b.Length >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF) { text = Encoding.GetEncoding("utf-32BE").GetString(b, 4, b.Length - 4); return Encoding.GetEncoding("utf-32BE"); }  // UTF-32, big-endian 
else if (b.Length >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00) { text = Encoding.UTF32.GetString(b, 4, b.Length - 4); return Encoding.UTF32; }    // UTF-32, little-endian
else if (b.Length >= 2 && b[0] == 0xFE && b[1] == 0xFF) { text = Encoding.BigEndianUnicode.GetString(b, 2, b.Length - 2); return Encoding.BigEndianUnicode; }     // UTF-16, big-endian
else if (b.Length >= 2 && b[0] == 0xFF && b[1] == 0xFE) { text = Encoding.Unicode.GetString(b, 2, b.Length - 2); return Encoding.Unicode; }              // UTF-16, little-endian
else if (b.Length >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) { text = Encoding.UTF8.GetString(b, 3, b.Length - 3); return Encoding.UTF8; } // UTF-8
else if (b.Length >= 3 && b[0] == 0x2b && b[1] == 0x2f && b[2] == 0x76) { text = Encoding.UTF7.GetString(b,3,b.Length-3); return Encoding.UTF7; } // UTF-7


//////////// If the code reaches here, no BOM/signature was found, so now
//////////// we need to 'taste' the file to see if can manually discover
//////////// the encoding. A high taster value is desired for UTF-8
if (taster == 0 || taster > b.Length) taster = b.Length;    // Taster size can't be bigger than the filesize obviously.


// Some text files are encoded in UTF8, but have no BOM/signature. Hence
// the below manually checks for a UTF8 pattern. This code is based off
// the top answer at: https://stackoverflow.com/questions/6555015/check-for-invalid-utf8
// For our purposes, an unnecessarily strict (and terser/slower)
// implementation is shown at: https://stackoverflow.com/questions/1031645/how-to-detect-utf-8-in-plain-c
// For the below, false positives should be exceedingly rare (and would
// be either slightly malformed UTF-8 (which would suit our purposes
// anyway) or 8-bit extended ASCII/UTF-16/32 at a vanishingly long shot).
int i = 0;
bool utf8 = false;
while (i < taster - 4)
{
    if (b[i] <= 0x7F) { i += 1; continue; }     // If all characters are below 0x80, then it is valid UTF8, but UTF8 is not 'required' (and therefore the text is more desirable to be treated as the default codepage of the computer). Hence, there's no "utf8 = true;" code unlike the next three checks.
    if (b[i] >= 0xC2 && b[i] <= 0xDF && b[i + 1] >= 0x80 && b[i + 1] < 0xC0) { i += 2; utf8 = true; continue; }
    if (b[i] >= 0xE0 && b[i] <= 0xEF && b[i + 1] >= 0x80 && b[i + 1] < 0xC0 && b[i + 2] >= 0x80 && b[i + 2] < 0xC0) { i += 3; utf8 = true; continue; }
    if (b[i] >= 0xF0 && b[i] <= 0xF4 && b[i + 1] >= 0x80 && b[i + 1] < 0xC0 && b[i + 2] >= 0x80 && b[i + 2] < 0xC0 && b[i + 3] >= 0x80 && b[i + 3] < 0xC0) { i += 4; utf8 = true; continue; }
    utf8 = false; break;
}
if (utf8 == true) {
    text = Encoding.UTF8.GetString(b);
    return Encoding.UTF8;
}


// The next check is a heuristic attempt to detect UTF-16 without a BOM.
// We simply look for zeroes in odd or even byte places, and if a certain
// threshold is reached, the code is 'probably' UTF-16.
double threshold = 0.1; // proportion of chars step 2 which must be zeroed to be diagnosed as utf-16. 0.1 = 10%
int count = 0;
for (int n = 0; n < taster; n += 2) if (b[n] == 0) count++;
if (((double)count) / taster > threshold) { text = Encoding.BigEndianUnicode.GetString(b); return Encoding.BigEndianUnicode; }
count = 0;
for (int n = 1; n < taster; n += 2) if (b[n] == 0) count++;
if (((double)count) / taster > threshold) { text = Encoding.Unicode.GetString(b); return Encoding.Unicode; } // (little-endian)


// Finally, a long shot - let's see if we can find "charset=xyz" or
// "encoding=xyz" to identify the encoding:
for (int n = 0; n < taster-9; n++)
{
    if (
        ((b[n + 0] == 'c' || b[n + 0] == 'C') && (b[n + 1] == 'h' || b[n + 1] == 'H') && (b[n + 2] == 'a' || b[n + 2] == 'A') && (b[n + 3] == 'r' || b[n + 3] == 'R') && (b[n + 4] == 's' || b[n + 4] == 'S') && (b[n + 5] == 'e' || b[n + 5] == 'E') && (b[n + 6] == 't' || b[n + 6] == 'T') && (b[n + 7] == '=')) ||
        ((b[n + 0] == 'e' || b[n + 0] == 'E') && (b[n + 1] == 'n' || b[n + 1] == 'N') && (b[n + 2] == 'c' || b[n + 2] == 'C') && (b[n + 3] == 'o' || b[n + 3] == 'O') && (b[n + 4] == 'd' || b[n + 4] == 'D') && (b[n + 5] == 'i' || b[n + 5] == 'I') && (b[n + 6] == 'n' || b[n + 6] == 'N') && (b[n + 7] == 'g' || b[n + 7] == 'G') && (b[n + 8] == '='))
        )
    {
        if (b[n + 0] == 'c' || b[n + 0] == 'C') n += 8; else n += 9;
        if (b[n] == '"' || b[n] == '\'') n++;
        int oldn = n;
        while (n < taster && (b[n] == '_' || b[n] == '-' || (b[n] >= '0' && b[n] <= '9') || (b[n] >= 'a' && b[n] <= 'z') || (b[n] >= 'A' && b[n] <= 'Z')))
        { n++; }
        byte[] nb = new byte[n-oldn];
        Array.Copy(b, oldn, nb, 0, n-oldn);
        try {
            string internalEnc = Encoding.ASCII.GetString(nb);
            text = Encoding.GetEncoding(internalEnc).GetString(b);
            return Encoding.GetEncoding(internalEnc);
        }
        catch { break; }    // If C# doesn't recognize the name of the encoding, break.
    }
}


// If all else fails, the encoding is probably (though certainly not
// definitely) the user's local codepage! One might present to the user a
// list of alternative encodings as shown here: https://stackoverflow.com/questions/8509339/what-is-the-most-common-encoding-of-each-language
// A full list can be found using Encoding.GetEncodings();
text = Encoding.Default.GetString(b);
return Encoding.Default;

}
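
As a possible alternative to the hand-written UTF-8 pattern check in the function above, the framework's own decoder can be asked to throw on invalid input. This is only a sketch: `Utf8Check` and `IsValidUtf8` are hypothetical names, and unlike the loop above it also reports pure ASCII input as valid UTF-8 rather than leaving it to the default code page.

```csharp
using System;
using System.Text;

static class Utf8Check
{
    // Returns true if 'data' is well-formed UTF-8 according to .NET's decoder.
    public static bool IsValidUtf8(byte[] data)
    {
        // throwOnInvalidBytes: true makes the decoder raise an exception
        // instead of silently substituting U+FFFD.
        var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                      throwOnInvalidBytes: true);
        try
        {
            strict.GetCharCount(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

For example, `IsValidUtf8(new byte[] { 0xD0, 0x91 })` (the UTF-8 form of Б) returns true, while the KOI8-R byte `{ 0xE2 }` on its own returns false, so a koi8-r mail body would fall through to the later checks, just as it does in the function above.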
