C++——【USACO 5.4.2】——Character Recognition

濮彬

2023-12-01

Character Recognition

This problem requires you to write a program that performs character recognition.

Each ideal character image has 20 lines of 20 digits. Each digit is a `0' or a `1'. See Figure 1a (way below) for the layout of character images in the file.

The file font.in contains representations of 27 ideal character images in this order:

_abcdefghijklmnopqrstuvwxyz

where _ represents the space character. Each ideal character is 20 lines long.

The input file contains one or more potentially corrupted character images. A character image might be corrupted in these ways:

at most one line might be duplicated (and the duplicate immediately follows)
at most one line might be missing
some 0's might be changed to 1's
some 1's might be changed to 0's.

No character image will have both a duplicated line and a missing line. No more than 30% of the 0's and 1's will be changed in any character image in the evaluation datasets.

In the case of a duplicated line, one or both of the resulting lines may have corruptions, and the corruptions may be different.

Write a program to recognize the sequence of one or more characters in the image provided in the input file using the font provided in file font.in.

Recognize a character image by choosing the font character images that require the smallest number of overall changed 1's and 0's to be corrupted to the given font image, given the most favourable assumptions about duplicated or omitted lines. Count corruptions in only the least corrupted line in the case of a duplicated line. You must determine the sequence of characters that most closely matches the input sequence (the one that requires the least number of corruptions). There is a unique best solution for each evaluation dataset.

A correct solution will use precisely all of the data supplied in the input file.

PROGRAM NAME: charrec

INPUT FORMAT (both input files)

Both input files begin with an integer N (19 <= N < 1200) that specifies the number of lines that follow:

N
(digit1)(digit2)(digit3) ... (digit20)
(digit1)(digit2)(digit3) ... (digit20)
...

Each line of data is 20 digits wide. There are no spaces separating the zeros and ones.

The file font.in describes the font. It will always contain 541 lines. It may differ for each evaluation dataset.

SAMPLE INPUT (file charrec.in)

Incomplete sample showing the beginning of `font.in` (space and `a`).	Sample `charrec.in`, showing an `a` corrupted
font.in	charrec.in
540 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000011100000000000 00000111111011000000 00001111111001100000 00001110001100100000 00001100001100010000 00001100000100010000 00000100000100010000 00000010000000110000 00000001000001110000 00001111111111110000 00001111111111110000 00001111111111000000 00001000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000	19 00000000000000000000 00000000000000000000 00000000000000000000 00000011100000000000 00100111011011000000 00001111111001100000 00001110001100100000 00001100001100010000 00001100000100010000 00000100000100010000 00000010000000110000 00001111011111110000 00001111111111110000 00001111111111000000 00001000010000000000 00000000000000000000 00000000000001000000 00000000000000000000 00000000000000000000
Figure 1a	Figure 1b

OUTPUT FORMAT

Your program must produce an output file that contains a single string of the characters recognized. Its format is a single line of ASCII text. The output should not contain any separator characters. If your program does not recognize a particular character, it must output a ? in the appropriate position.

SAMPLE OUTPUT (file charrec.out)

Note that the 'sample output' formerly displayed a blank followed by an 'a', but that seems wrong now.

字符识别
这个问题要求您编写一个识别字符的程序。

每个理想的字符图像都有20行 * 20列。每一个数字都是0或1。请参阅图1a，以查看文件中字符图像的布局。
文件字体。包含27个理想字符形象:
_abcdefghijklmnopqrstuvwxyz
表示字符的位置。每一个理想的角色都有20行长。
输入文件包含一个或多个可能损坏的字符映像。一个字符可能会以这样的方式被破坏:
在大多数情况下，可能会重复一行
最多的一行可能是缺失的
一些0的值可能会变成1
一些1的值可能会变成0。
任何字符图像都不会同时有重复的行和缺失的行。在评估数据集的任何字符图像中，0和1的误差值都不会超过30%。
在重复的一行中，一个或两个结果行可能会产生影响，而导致的结果可能是不同的。
编写一个程序来识别输入文件中所提供的一个或多个字符的序列，使用文件font.in提供的字体。
通过选择字体字符图像来识别一个字符图像，这个图像需要最小数量的整体变化1和0被损坏到给定的字体图像，考虑到对重复或省略的行最有利的假设。在重复行的情况下，只计算最不坏的行数。您必须确定与输入序列最匹配的字符序列(需要最少数量的不同)。对于每个评估数据集，都有一个最佳解决方案。
正确的解决方案将使用输入文件中提供的所有数据。
项目名称:charrec
输入格式(两个输入文件)
两个输入文件都以一个整数N(19<=N<=1200)开头，它指定了下面的行数:
N
(digit1)(digit2)(digit3)……(digit20)
(digit1)(digit2)(digit3)……(digit20)
…
每一行的数据都是20位数。0和1之间没有空格。
文件字体。在描述的字体。它总是包含541行。对于每个评估数据集可能会有所不同。

输出格式
您的程序必须生成一个输出文件，其中包含所识别的字符的单个字符串。它的格式是一行ASCII文本。输出不应该包含任何分隔符。如果您的程序不能识别某个特定的字符，那么它必须输出一个字符 ? 在适当的位置。

——这道题要注意的地方非常多，我卡了5天，~~然后看了题解~~。。。

——开始我考虑漏了 ? 的情况，后来看翻译才知道，，真是作死，，，然而其实并没有要输出 ? 的情况。

/*
ID : mcdonne1
LANG : C++
TASK : charrec
*/

#pragma GCC optimize("O3")

#include <fstream>
#include <algorithm>
#include <bitset>
#include <cstring>

#define get tot > 120 ? 0 : ch[i / 20]

using namespace std;

int n, m;
char c[24];
char r[1300][3];
signed f[1300];
signed Min[1300][3];
signed dif[600][1300];
bitset <24> s[1300];
bitset <24> a[600];
string ans[1300];
const char *ch = " abcdefghijklmnopqrstuvwxyz";

int main () {
	FILE *fon = fopen ("font.in", "r");
	fscanf (fon, "%d\n", &m);
	for (int i = 1; i <= m; i++) {
		fscanf (fon, "%s\n", c + 1);
		for (int j = 1; j <= 20; j++) a[i][j] = c[j] - '0';
	}
	fclose (fon);
	FILE *fin = fopen ("charrec.in", "r");
	fscanf (fin, "%d\n", &n);
	for (int i = 1; i <= n; i++) {
		fscanf (fin, "%s\n", c + 1);
		for (int j = 1; j <= 20; j++) s[i][j] = c[j] - '0';
	}
	fclose (fin);
	for (int i = 1; i <= m; i++)
		for (int j = 1; j <= n; j++)
			for (int k = 1; k <= 20; k++)
				dif[i][j] += a[i][k] ^ s[j][k];
	memset (Min, 32, sizeof(Min));
	for (int i = 1; i <= m; i += 20) {
		for (int j = 1; j <= n; j++) {
			if (j + 19 <= n) {
				signed tot = 0;
				for (int k = 0; k < 20; k++) tot += dif[i + k][j + k];
				if (tot < Min[j][1]) {
					Min[j][1] = tot;
					r[j][1] = get;
				}
			}
			for (int k = 0; k < 21; k++) {
				if (k < 20 and j + 18 <= n) {
					signed tot = 0;
					for (int p = 0; p < k; p++) tot += dif[i + p][j + p];
					for (int p = k; p < 19; p++) tot += dif[i + p + 1][j + p];
					if (tot < Min[j][0]) {
						Min[j][0] = tot;
						r[j][0] = get;
					}
				}
				if (j + 20 <= n) {
					signed tot = 0;
					for (int p = 0; p < k; p++) tot += dif[i + p][j + p];
					for (int p = k; p < 21; p++) tot += dif[i + p][j + p + 1];
					if (tot < Min[j][2]) {
						Min[j][2] = tot;
						r[j][2] = get;
					}
				}
			}
		}
	}
	fill (f + 1, f + 19, 0x3f3f);
	for (int i = 19; i <= n; i++) {
		f[i] = f[i - 19] + Min[i - 18][0];
		ans[i] = ans[i - 19] + r[i - 18][0];
		if (i >= 20 and f[i] > f[i - 20] + Min[i - 19][1]) {
			f[i] = f[i - 20] + Min[i - 19][1];
			ans[i] = ans[i - 20] + r[i - 19][1];
		}
		if (i >= 21 and f[i] > f[i - 21] + Min[i - 20][2]) {
			f[i] = f[i - 21] + Min[i - 20][2];
			ans[i] = ans[i - 21] + r[i - 20][2];
		}
	}
	FILE *fout = fopen ("charrec.out", "w");
	fprintf (fout, "%s\n", ans[n].c_str());
	fclose (fout);
	return 0;
}

C++——【USACO 5.4.2】——Character Recognition

PROGRAM NAME: charrec

INPUT FORMAT (both input files)

SAMPLE INPUT (file charrec.in)

OUTPUT FORMAT

SAMPLE OUTPUT (file charrec.out)

相关阅读

相关文章

相关问答

相关文档