Crossover: 2014
Chapter 126
……
In her email, Eve Carly had not only expressed curiosity about Lin Hui's generative text summarization algorithm.
She had also described to him some of the difficulties she had run into in her own text summarization research.
To be honest, Lin Hui was not very good at resolving other people's confusions.
Still, as the saying goes, to study alone without friends is to remain narrow and ill-informed.
Lin Hui was quite curious about what bottlenecks Eve Carly had hit in her research.
Perhaps some of her insights would give him inspiration.
It took Lin Hui quite a while just to work through the main content of the email.
In his previous life he had often read English papers for work.
Otherwise, making sense of Eve Carly's email would not have been easy.
No wonder: the way she laid out her confusion was a bit too hardcore.
Lin Hui figured that with only minor changes, the body of her email could pass as a review article and be sent straight off to some low-tier filler journal.
Of course, that was just a figure of speech. In truth, even someone capable of churning out filler papers shouldn't do so lightly; paper-flooding can easily become a stain on an academic career.
What surprised Lin Hui was this:
in the email, Eve Carly gave him a general overview of text summarization research in Western academic circles.
That was especially valuable to Lin Hui right now.
After all, published journal papers basically report only progress, never setbacks.
Judging from what Eve Carly described,
the progress of Western text summarization research in this timeline differed somewhat from its counterpart in his previous life.
Objectively speaking, though, the difference was not large: this timeline simply ran about two years behind.
That was understandable. Research on text summarization has a long history.
In his past life and this one alike, East and West both poured real effort into it.
Why would two different timelines both invest so heavily in text summarization?
There is a reason. Text is an important carrier of information, and research into highly condensed text matters greatly for helping people obtain the content they want quickly and accurately.
Although the field has a long history, large-scale, in-depth research on text summarization only really began in the new century.
The reason it suddenly drew serious attention:
with the rapid development of Internet technology, massive amounts of information kept pouring onto the web.
Without better text-processing capabilities, all that information was destined to drown in the ocean of data and become worthless noise.
A few years ago, the concept of big data began to rise,
and the emphasis placed on text summarization research reached a new level.
The significance of text summarization cannot be measured by the text alone.
On the surface, text processing concerns only language.
In reality, it also involves deep exploration of information, of the material world, and of culture.
Research at that depth pushes an entire civilization forward.
Seen from that level, text summarization exerts a great influence on people.
Even if most ordinary people never notice its impact on their lives,
that does not make text summarization unimportant.
The deeper we study information, the more we learn about the world.
Moreover, in-depth research on text summarization can broaden people's thinking to some extent.
Deep exploration of text summarization gives us a firmer grip on information.
For precisely these reasons, in either timeline,
many countries around the world were exploring text.
To some extent, the progress of humanity's methods of record-keeping is concentrated in the ever more condensed forms of text.
Exploring text was also an extremely important task for certain large enterprises;
the development of text summarization determined the release of one product after another.
The exploration of text not only greatly advances the in-depth study of literature, it also greatly advances science and technology.
All in all, putting serious effort into text summarization was hardly excessive.
After all, this was Lin Hui's first step into the technical field.
As for the confusion Eve Carly had run into:
Lin Hui had not expected it to center on the construction of the LH text summarization accuracy measurement model.
As Lin Hui remembered it, he had explained that model's construction clearly enough at the time.
To build the model: first, use a language model to evaluate the fluency of the text the algorithm generates; then, use a similarity model to evaluate the semantic relatedness between the source text and the summary; and finally, introduce a quantitative model that draws on the original text in order to evaluate how faithfully entities and proper nouns are reproduced.
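In code, that three-stage pipeline might look something like the sketch below. To be clear, this is only an illustration: the smoothed bigram fluency model, the bag-of-words cosine similarity, the entity-overlap ratio, and the weights are all placeholder stand-ins, not the components of the actual LH model.

```python
# A minimal sketch of the three-stage pipeline described above. Everything
# here is a placeholder stand-in, not a component of the actual LH model:
# fluency uses a Laplace-smoothed bigram language model, relatedness uses
# bag-of-words cosine similarity, and the weights are arbitrary.
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def fluency_score(summary, corpus_tokens):
    """Geometric-mean bigram probability under a smoothed LM: higher = more fluent."""
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    vocab = len(unigrams) or 1
    toks = tokenize(summary)
    log_prob = sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                   for a, b in zip(toks, toks[1:]))
    return math.exp(log_prob / max(len(toks) - 1, 1))

def relatedness_score(source, summary):
    """Cosine similarity of bag-of-words vectors for source and summary."""
    u, v = Counter(tokenize(source)), Counter(tokenize(summary))
    dot = sum(u[w] * v[w] for w in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def entity_recurrence(summary, entities):
    """Fraction of the source's entities/proper nouns the summary reproduces."""
    kept = [e for e in entities if e.lower() in summary.lower()]
    return len(kept) / len(entities) if entities else 1.0

def lh_style_score(source, summary, entities, corpus_tokens,
                   weights=(0.3, 0.4, 0.3)):  # weights are hypothetical
    f = fluency_score(summary, corpus_tokens)
    r = relatedness_score(source, summary)
    e = entity_recurrence(summary, entities)
    return weights[0] * f + weights[1] * r + weights[2] * e
```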
Granted, to keep the apprentice from starving the master, Lin Hui had deliberately omitted a few minor steps in between.
But to researchers, that sort of thing is what a trench is to a tank:
it causes some delay, but it shouldn't be a real obstacle.
If he genuinely published every technical detail,
that couldn't be called publishing a technical route; it would be called writing a textbook.
Regarding Lin Hui's "using a language model to evaluate the fluency of algorithm-generated text,"
what puzzled Eve Carly was: where had Lin Hui obtained the corpus for training the language model?
In a few years this would not really be a problem,
because plenty of ready-made corpora would exist.
For Simplified Chinese alone there would be resources such as the State Language Commission's Modern Chinese Corpus, the Peking University corpus, and Corpus Linguistics Online.
At this point in time, however, Lin Hui obviously could not tell other researchers that he was using a ready-made corpus.
After all, some of those ready-made corpora would only become available around 2016.
Even so, explaining the corpus's provenance posed no problem for Lin Hui.
In fact, even without a ready-made corpus, building one suitable for training an early generative summarization algorithm is not all that complicated.
The easiest way: build the text corpus automatically from the Internet.
With this method, the user only needs to supply the desired hierarchy of text categories.
Then a large number of websites are collected from the Internet, each site's content hierarchy is extracted and analyzed along with the page content matching each keyword,
and the texts the user needs are screened out of each site as candidate corpus.
The process is really not complicated; it is much like a crawler crawling web pages.
The harder part is how to denoise a corpus built this way.
But that was no problem for Lin Hui either.
One merely merges the candidate corpora of the same text category, matched across multiple websites, into a single candidate corpus per category,
then denoises the text under each category to raise the corpus's quality.
Once denoising is complete, the corpus can be output.
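A rough sketch of that collect, screen, merge, and denoise flow might look like the following. The seed URLs, category keywords, and denoising heuristics here are all hypothetical placeholders; real denoising would need far more than duplicate and short-line filtering.

```python
# A rough sketch of the collect -> screen -> merge -> denoise flow described
# above. The seed URLs, category keywords, and denoising heuristics are all
# hypothetical placeholders; real denoising needs far more than this.
import re
from html.parser import HTMLParser
from urllib.request import urlopen

class TextExtractor(HTMLParser):
    """Strip HTML tags, keeping only visible text (crude boilerplate removal)."""
    def __init__(self):
        super().__init__()
        self.chunks, self._skip = [], 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def fetch_text(url):
    html = urlopen(url, timeout=10).read().decode("utf-8", errors="ignore")
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)

def denoise(text, min_len=30):
    """Drop short navigation-like lines and exact duplicates."""
    seen, kept = set(), []
    for line in (l.strip() for l in text.splitlines()):
        if len(line) >= min_len and line not in seen:
            seen.add(line)
            kept.append(line)
    return kept

def build_corpus(category_seeds, category_keywords):
    """category_seeds: {category: [urls]} -> {category: [clean text lines]}."""
    corpus = {}
    for category, urls in category_seeds.items():
        pattern = re.compile("|".join(map(re.escape,
                                          category_keywords[category])), re.I)
        candidate = []                      # candidate corpus for this category
        for url in urls:
            try:
                text = fetch_text(url)
            except OSError:
                continue                    # skip unreachable sites
            if pattern.search(text):        # screen by the category's keywords
                candidate.append(text)
        corpus[category] = denoise("\n".join(candidate))  # merge + denoise
    return corpus
```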
Admittedly, the process is still not easy to implement well.
But in academia, apart from a few solitary elites who love to drill into every corner,
in most cases, as long as the logic is self-consistent, nobody will press the point.
Besides her curiosity about how Lin Hui had constructed the corpus,
regarding "using a similarity model to evaluate the semantic relatedness between texts and summaries,"
Eve Carly was even more curious about exactly what kind of similarity model Lin Hui had used to evaluate the semantic relatedness between a text and its summary.
Well, that question cut closer to the core of the text summarization accuracy model Lin Hui had built.
It was not something that could be explained in a few words.
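The chapter leaves Lin Hui's actual similarity model unstated. Purely as a generic illustration of what a "similarity model" can mean in this setting, here is a textbook TF-IDF cosine-similarity baseline; it is emphatically not the model the story is referring to.

```python
# A generic illustration of a "similarity model" for text-vs-summary
# relatedness: a textbook TF-IDF cosine-similarity baseline, not the
# story's actual model. The example strings are made up.
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def tfidf_vectors(docs):
    """Token lists -> TF-IDF dicts; IDF is smoothed so shared terms keep weight."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    return [{w: (c / len(doc)) * (math.log(n / df[w]) + 1.0)
             for w, c in Counter(doc).items()}
            for doc in docs]

def cosine(u, v):
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

source = ("Text is an important carrier of information, and condensing it "
          "helps people obtain the content they want quickly and accurately.")
summary = "Condensed text helps people obtain information quickly."
vec_src, vec_sum = tfidf_vectors([tokenize(source), tokenize(summary)])
print(f"semantic relatedness: {cosine(vec_src, vec_sum):.3f}")
```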
(End of this chapter)