usaco5.4.2 Character Recognition

朱啸

2023-12-01

一原题

Character Recognition

This problem requires you to write a program that performs character recognition.

Each ideal character image has 20 lines of 20 digits. Each digit is a `0' or a `1'. See Figure 1a (way below) for the layout of character images in the file.

The file font.in contains representations of 27 ideal character images in this order:

_abcdefghijklmnopqrstuvwxyz

where _ represents the space character. Each ideal character is 20 lines long.

The input file contains one or more potentially corrupted character images. A character image might be corrupted in these ways:

at most one line might be duplicated (and the duplicate immediately follows)
at most one line might be missing
some 0's might be changed to 1's
some 1's might be changed to 0's.

No character image will have both a duplicated line and a missing line. No more than 30% of the 0's and 1's will be changed in any character image in the evaluation datasets.

In the case of a duplicated line, one or both of the resulting lines may have corruptions, and the corruptions may be different.

Write a program to recognize the sequence of one or more characters in the image provided in the input file using the font provided in file font.in.

Recognize a character image by choosing the font character images that require the smallest number of overall changed 1's and 0's to be corrupted to the given font image, given the most favourable assumptions about duplicated or omitted lines. Count corruptions in only the least corrupted line in the case of a duplicated line. You must determine the sequence of characters that most closely matches the input sequence (the one that requires the least number of corruptions). There is a unique best solution for each evaluation dataset.

A correct solution will use precisely all of the data supplied in the input file.

PROGRAM NAME: charrec

INPUT FORMAT (both input files)

Both input files begin with an integer N (19 <= N < 1200) that specifies the number of lines that follow:

N
(digit1)(digit2)(digit3) ... (digit20)
(digit1)(digit2)(digit3) ... (digit20)
...

Each line of data is 20 digits wide. There are no spaces separating the zeros and ones.

The file font.in describes the font. It will always contain 541 lines. It may differ for each evaluation dataset.

SAMPLE INPUT (file charrec.in)

Incomplete sample showing the beginning of `font.in` (space and `a`).	Sample `charrec.in`, showing an `a` corrupted
font.in	charrec.in
540 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000011100000000000 00000111111011000000 00001111111001100000 00001110001100100000 00001100001100010000 00001100000100010000 00000100000100010000 00000010000000110000 00000001000001110000 00001111111111110000 00001111111111110000 00001111111111000000 00001000000000000000 00000000000000000000 00000000000000000000 00000000000000000000 00000000000000000000	19 00000000000000000000 00000000000000000000 00000000000000000000 00000011100000000000 00100111011011000000 00001111111001100000 00001110001100100000 00001100001100010000 00001100000100010000 00000100000100010000 00000010000000110000 00001111011111110000 00001111111111110000 00001111111111000000 00001000010000000000 00000000000000000000 00000000000001000000 00000000000000000000 00000000000000000000
Figure 1a	Figure 1b

OUTPUT FORMAT

Your program must produce an output file that contains a single string of the characters recognized. Its format is a single line of ASCII text. The output should not contain any separator characters. If your program does not recognize a particular character, it must output a ? in the appropriate position.

SAMPLE OUTPUT (file charrec.out)

Note that the 'sample output' formerly displayed a blank followed by an 'a', but that seems wrong now.

二分析

题意：给定27个字母F[27][20][21]，每个字母由20*20的字符数组表示，可以对每个字母做4类的变换（见原题描述）；给定一个N行20列的字符数组S，要求他是由那些字母组成的，使得所需变换最少。

思路：动态规划，要求记录动规的具体方案。核心的变量有四个：

Dp[i]：得到S前i行所需的最少变换数

Ffrom[i]：设字符序列a1,a2,...,ak是Dp[i]对应的最佳序列，那么删去ak，需要从S的第i行向前删去多少行（19，20或21）

C[i][j]：从第i行起，匹配S的前19，20，21行所需的最少变换数

From[i][j]：从第i行起，匹配S的前19，20，21行所对应的字母

我们要求的就是Dp[N-1]对应的字母序列，递推方程其实就是枚举从S的第i行向前数，是前19行对应一个字母，还是20行，还是21行。

Dp[i] = min{Dp[i-19] + C[i][19], Dp[i-20] + C[i][20], Dp[i-21] + C[i-21]}

最后根据Ffrom和From数组得到对应的字母序列。

坑爹的是这题的实例Input/Output并不对=。= 根据题目链接的font.in，实例Output应当是d。。因为这个浪费了2个小时。。

三代码

运行结果：

USER: Qi Shen [maxkibb3]
TASK: charrec
LANG: C++

Compiling...
Compile: OK

Executing...
   Test 1: TEST OK [0.022 secs, 6976 KB]
   Test 2: TEST OK [0.011 secs, 6976 KB]
   Test 3: TEST OK [0.022 secs, 6976 KB]
   Test 4: TEST OK [0.032 secs, 6976 KB]
   Test 5: TEST OK [0.076 secs, 6976 KB]
   Test 6: TEST OK [0.151 secs, 6976 KB]
   Test 7: TEST OK [0.194 secs, 6976 KB]
   Test 8: TEST OK [0.194 secs, 6976 KB]
   Test 9: TEST OK [0.162 secs, 6976 KB]

All tests OK.

YOUR PROGRAM ('charrec') WORKED FIRST TIME! That's fantastic -- and a rare thing. Please accept these special automated congratulations.

AC代码：

/*
ID:maxkibb3
LANG:C++
PROB:charrec
*/

#include<cstdio>

const int INF = 1e6;
const int LETTERNUM = 27;
const int LINENUM = 20;
const int MAXLINE = 1205;
const int MAGICNUM = 22;
const char A[] = " abcdefghijklmnopqrstuvwxyz";

int N;
char F[LETTERNUM][LINENUM][LINENUM + 1];
char S[MAXLINE][LINENUM + 1];
// For DP
int D[LETTERNUM][LINENUM][MAXLINE];
int C[MAXLINE][MAGICNUM]; int From[MAXLINE][MAGICNUM];
int Dp[MAXLINE]; int Ffrom[MAXLINE];

void init() {
    freopen("font.in", "r", stdin);
    int tmp;
    scanf("%d", &tmp);
    for(int i = 0; i < tmp; i++) {
        getchar();
        scanf("%s", F[i / LINENUM][i % LINENUM]);
    }
    freopen("charrec.in", "r", stdin);
    scanf("%d", &N);
    for(int i = 0; i < N; i++) {
        getchar();
        scanf("%s", S[i]);
    }
    freopen("charrec.out", "w", stdout);
}

void set_diff() {
    for(int i = 0; i < LETTERNUM; i++)
    for(int j = 0; j < LINENUM; j++)
    for(int k = 0; k < N; k++) {
        int tot = 0;
        for(int l = 0; l < LINENUM; l++)
            tot += (F[i][j][l] != S[k][l]);
        D[i][j][k] = tot;
    }
}

void set_cost() {
    for(int i = 0; i < N; i++)
        C[i][19] = C[i][20] = C[i][21] = INF;
    int i, j, k, l;
    // for 19-case
    for(i = 18; i < N; i++)
    for(j = 0; j < LETTERNUM; j++)
    for(k = 0; k < LINENUM; k++) {// choose the deleted line
        int tot = 0;
        for(l = 0; l < LINENUM - 1; l++) {
            if(l >= k) tot += D[j][l + 1][i - 18 + l];
            else tot += D[j][l][i - 18 + l];
        }
        if(tot < C[i][19]) C[i][19] = tot, From[i][19] = j;
    }
    // for 20-case
    for(i = 19; i < N; i++)
    for(j = 0; j < LETTERNUM; j++) {
        int tot = 0;
        for(k = 0; k < LINENUM; k++)
            tot += D[j][k][i - 19 + k];
        if(tot < C[i][20]) C[i][20] = tot, From[i][20] = j;
    }
    // for 21-case
    for(i = 20; i < N; i++)
    for(j = 0; j < LETTERNUM; j++)
    for(k = 0; k < LINENUM; k++) { // choose the duplicated line
        int tot = 0;
        for(int l = 0; l < LINENUM; l++) {
            if(l >= k) tot += D[j][l][i - 20 + l + 1];
            else tot += D[j][l][i - 20 + l];
        }
        if(tot < C[i][21]) C[i][21] = tot, From[i][21] = j;
    }
}

void dp() {
    for(int i = 0; i < N; i++) Dp[i] = INF;
    for(int i = 18; i < N; i++) {
        int prev = (i < 19) ? 0 : Dp[i - 19];
        if(prev + C[i][19] < Dp[i]) {
            Dp[i] = prev + C[i][19];
            Ffrom[i] = 19;
        }
        prev = (i < 20) ? 0 : Dp[i - 20];
        if(prev + C[i][20] < Dp[i]) {
            Dp[i] = prev + C[i][20];
            Ffrom[i] = 20;
        }
        prev = (i < 21) ? 0 : Dp[i - 21];
        if(prev + C[i][21] < Dp[i]) {
            Dp[i] = prev + C[i][21];
            Ffrom[i] = 21;
        }
    }
}

void print(int _idx) {
    if(_idx == -1) return;
    print(_idx - Ffrom[_idx]);
    printf("%c", A[From[_idx][Ffrom[_idx]]]);
}

void solve() {
    set_diff();
    set_cost();
    dp();
    print(N - 1);
    printf("\n");
}

int main() {
    init();
    solve();
    return 0;
}

usaco5.4.2 Character Recognition

一原题

PROGRAM NAME: charrec

INPUT FORMAT (both input files)

SAMPLE INPUT (file charrec.in)

OUTPUT FORMAT

SAMPLE OUTPUT (file charrec.out)

二分析

三代码

相关阅读

相关文章

相关问答

usaco5.4.2 Character Recognition

一 原题

PROGRAM NAME: charrec

INPUT FORMAT (both input files)

SAMPLE INPUT (file charrec.in)

OUTPUT FORMAT

SAMPLE OUTPUT (file charrec.out)

二 分析

三 代码

相关阅读

相关文章

相关问答

一原题

二分析

三代码