QOJ.ac

QOJ

実行時間制限: 4 s メモリ制限: 512 MB 満点: 100

#15680. Find Successor Word

統計

When evaluating this problem on QOJ, you can read the files sample.txt and markov.in. You can also read from standard input, which is redirected to the file markov.in.

We now need some random readable English text, but a computer's ability to use language is about the same as a baby's; randomly generated text is meaningless. Of course, we could give it a dictionary and let it pick words randomly, but that doesn't help because the computer doesn't understand grammar. Actually, we can use a fairly simple method to let the computer write a decent-looking article. This technique is called the "Markov Chain Algorithm". It uses a sample text to generate an article with a structure similar to the sample.

We can think of a piece of English text as a combination of overlapping phrases. The algorithm divides each phrase into two parts: a prefix consisting of more than one word, and a suffix consisting of exactly one word. This algorithm can be explained in one sentence: randomly select a suffix in the sample based on a specific prefix. Practice has shown that using 3-word phrases (i.e., a prefix of 2 words and a suffix of 1 word) works best. The specific process of the algorithm is as follows:

  1. Randomly select two consecutive words $w_1, w_2$ in the sample as the beginning of the text.
  2. Output $w_1, w_2$.
  3. Randomly select a suffix $w_3$ of $w_1, w_2$ in the sample. If no such $w_3$ exists, terminate the algorithm.
  4. Output $w_3$.
  5. Let $w_1 = w_2, w_2 = w_3$.
  6. Go to step 3.

This algorithm can also be stopped at any time according to your needs. To help understand the algorithm, here is an example:

"Show your flowcharts and conceal your tables and I will be mystified. Show your tables and your flowcharts will be obvious."

Below are some input prefixes and their suffixes:

Prefix Suffix
Show your flowcharts, tables
your flowcharts and, will
flowcharts and conceal
flowcharts will be
your tables and, and
will be mystified., obvious.
be mystified. Show

If "Show your" is chosen as the beginning of the text, it has two suffixes: "flowcharts" and "tables". One is randomly selected. Suppose "flowcharts" is selected as the suffix, then the prefix becomes "your flowcharts". This process is repeated until there are no suffixes or enough words have been generated.

Your task is to complete the core part of this algorithm, which is: for a series of given prefixes, output all of their possible suffixes.

Input Format

The sample file is named sample.txt. It contains at most 10,000 words. These words are randomly generated or a randomly selected article, which means you can assume the randomness of the word selection. Words are separated by a single space. Note: Punctuation marks are also considered part of a word. The sample will not contain any delimiters other than spaces. The file size does not exceed 200KB.

The input file is named markov.in. The first line contains an integer $n$, representing the number of given prefixes, $1 \le n \le 2,000,000$. The next $n$ lines each contain two words, representing the given prefix, separated by a single space. The length of each word does not exceed 255 characters.

Output Format

The output file is named markov.out. It contains $n$ lines. The $i$-th line contains all suffixes of the $i$-th prefix, separated by a single space. If the prefix does not exist in the sample or no suffix can be found, output an empty line. If a suffix of a prefix appears more than once in the text, the number of times it is output should be equal to the number of times it appears. (This allows the word to be selected with higher frequency when randomly selecting).

Examples

Sample Text (sample.txt)

and my heart dances to the distant 
music of the ancient xylophone

Input 1

1
the ancient

Output 1

xylophone

Discussions

About Discussions

The discussion section is only for posting: General Discussions (problem-solving strategies, alternative approaches), and Off-topic conversations.

This is NOT for reporting issues! If you want to report bugs or errors, please use the Issues section below.

Open Discussions 0
No discussions in this category.

Issues

About Issues

If you find any issues with the problem (statement, scoring, time/memory limits, test cases, etc.), you may submit an issue here. A problem moderator will review your issue.

Guidelines:

  1. This is not a place to publish discussions, editorials, or requests to debug your code. Issues are only visible to you and problem moderators.
  2. Do not submit duplicated issues.
  3. Issues must be filed in English or Chinese only.
Active Issues 0
No issues in this category.
Closed/Resolved Issues 0
No issues in this category.