Crossover: 2014

Chapter 129: The Pursuing Chaser (Part [-])
Harley Price continued: "In short, I think LIN HUI's text-summarization accuracy measurement model is very unfavorable to us.

Maybe we can borrow LIN HUI's ideas and create a measurement standard of our own..."

Eclair Kilkaga: "I've thought about the problem you mention as well.

But building the model according to LIN HUI's construction process is not easy.

If we construct a similar standard along LIN HUI's lines, we first need a language model to evaluate the fluency of the text the algorithm generates, and then...

If we follow those same model-building steps, we are likely to get stuck right at the construction of the language model.

After all, our corpus is too poor...

The report from the MIT NLP group we cooperated with earlier also concluded that building a language model along LIN HUI's lines is not feasible."
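The language-model fluency check Eclair Kilkaga describes could look, in a minimal toy form, something like this. This is an add-k smoothed bigram model; the function names and the tiny corpus are illustrative assumptions of mine, not anything specified in the story:

```python
import math
from collections import Counter

def train_bigram_lm(corpus, k=1.0):
    """Train an add-k smoothed bigram language model from tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens[:-1])                  # counts of each context word
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # counts of adjacent pairs
    vocab = {t for s in corpus for t in s} | {"</s>"}
    def logprob(prev, word):
        # add-k smoothing so unseen bigrams still get nonzero probability
        return math.log((bigrams[(prev, word)] + k) /
                        (unigrams[prev] + k * len(vocab)))
    return logprob

def fluency(sent, logprob):
    """Average per-token log-likelihood; higher means the LM finds the text more fluent."""
    tokens = ["<s>"] + sent + ["</s>"]
    total = sum(logprob(p, w) for p, w in zip(tokens[:-1], tokens[1:]))
    return total / (len(tokens) - 1)

# Tiny illustrative corpus; a real fluency model needs vastly more data,
# which is exactly the corpus problem the characters are worried about.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
lm = train_bigram_lm(corpus)
```

A well-formed sentence scores higher under `fluency` than a shuffled one, and that relative score is the kind of signal such a metric would feed into an accuracy standard.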

Harley Price: "What the MIT folks think isn't feasible doesn't necessarily mean it isn't feasible.

Most likely they were just shirking responsibility.

Either way, I think we can try to borrow LIN HUI's ideas and create a new measurement standard."

Eclair Kilkaga: "Are you sure we can come up with a new model along LIN HUI's lines?
How can you guarantee that the model we build won't turn out exactly the same as his?"

Harley Price: "We have to go down that road regardless.

If we can't even replicate his accuracy-measurement model, how will we know whether he's hidden any tricks inside it?"

Harley Price continued: "Our corpus may have been very thin before, but there's nothing wrong with the corpus we're using now.

The Natural Language Center at UC Berkeley is working with us now.

When we tested the X1 verification algorithm, we used a corpus of 10 text-summary sequences as the training set..."

Eclair Kilkaga retorted: "No, no, no, that's not enough!

To reach the level of text processing of LIN HUI's algorithm, we need at least a corpus of millions of text-summary sequences as the training set.

And that's just the tip of the iceberg.

We also need a validation set of 10^4-level text-summary sequences with human scoring labels,

and a test set of 10^3-level text-summary sequences with consistent human cross-scoring.

Otherwise, our measurement model may never reach the confidence level of the model LIN HUI built."

Harley Price: "You do have a point!
The most practical way to reduce the margin of error is to increase the sample size.

A corpus of millions of text-summary sequences is the easy part: compared to a [-]-level corpus, the construction difficulty only grows linearly.

But are you sure we should build human-labeled validation and test sets as large as you say?

Even a conservative estimate puts the validation set with human scoring labels at nearly a month of work,

and that assumes our cooperation with the other linguistics departments goes without a hitch.

The 10^3-level test set with human cross-scoring is harder still.

We've only ever built one at the 10^2 level.

Growing the test set by an order of magnitude makes the construction difficulty grow exponentially.

The 150-text cross-scored consistent test set we built earlier for testing the extractive summarization algorithm took nearly two months."

Harley Price added: "And why must we introduce a human element at all?
Isn't that equivalent to going back down the old road of a subjective accuracy evaluation standard?"

Eclair Kilkaga: "That's exactly my point.

Originally I too thought it impossible to come up with a new measurement standard based on LIN HUI's ideas.

Even if we can follow LIN HUI's technical route,

we will still face an enormous workload."

Hearing Eclair Kilkaga out, Harley Price grew desperate: "So just the preliminary work of establishing an accuracy metric is going to take a huge amount of our time?
The higher-ups in charge of decisions won't just sit back and watch us waste so much time on this algorithm.

They're likely to go straight to LIN HUI for an algorithm license.

To those business elites, technology is just another piece in the capital game.

Once they get their hands on LIN HUI's new technology, we can expect to be miserable...

What should we do?"

Eclair Kilkaga: "Who knows? Maybe it's time for us to pack up and go to Ydu."

Harley Price: "Going to Ydu wouldn't be so bad. I hear the Google Africa Research Center has been in the planning stages lately.

If we're unlucky, we'll probably be headed to Africa."

Eclair Kilkaga: "..."

Of course, these words were just jokes.

He was, after all, a researcher at a top research institution.

Eclair Kilkaga would not lose his morale so easily.

After a while, Eclair Kilkaga added: "It's not entirely hopeless.

I don't think we should follow LIN HUI's technical route at all.

This LIN HUI is too cunning!
The information he made public was likely left there to mislead us.

What we have to do now is verify the conclusions we have drawn ourselves."

Eclair Kilkaga continued: "According to the patterns from our previous research,

an ordinary neural network's current input bears no relationship to its previous input;

it has no way to handle sequential data, that is, data whose items carry associated information.

Yet the technical route described in LIN HUI's generative summarization algorithm clearly states that the text is first serialized and encoded as vectors before further processing.

In that case, I think the algorithm LIN HUI proposed is definitely not an ordinary neural network.

What LIN HUI most probably applied in the generative summarization algorithm is a recurrent neural network.

After all, the structure of a recurrent neural network is very well suited to processing sequence information."

Harley Price's eyes lit up at Eclair Kilkaga's words, but a new doubt arose immediately.

Harley Price asked: "A recurrent neural network doesn't just take the current element of the sequence as input.

It also takes the hidden-layer state of the network from the previous time step.

Only then can it properly capture the correlations between elements of the sequence.

But my feeling is that although the neural network in LIN HUI's algorithm has the shadow of a recurrent neural network,

it seems a little different from a traditional recurrent neural network?"
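The recurrence Harley Price describes, current input plus the previous hidden state, can be sketched in a few lines. This is a toy vanilla RNN with hand-set weights, purely illustrative and not anything attributed to the characters:

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla-RNN step: h_t = tanh(W_xh @ x_t + W_hh @ h_prev + b_h)."""
    h_t = []
    for i in range(len(b_h)):
        s = b_h[i]
        s += sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))     # current input
        s += sum(W_hh[i][j] * h_prev[j] for j in range(len(h_prev)))  # previous state
        h_t.append(math.tanh(s))
    return h_t

def run_rnn(xs, W_xh, W_hh, b_h):
    """Unroll over a sequence: every step sees the current input AND the
    previous hidden state, which is how correlations across the sequence
    are carried forward."""
    h = [0.0] * len(b_h)
    states = []
    for x in xs:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
        states.append(h)
    return states
```

With a nonzero recurrent weight, the state at step two differs depending on what was fed in at step one, unlike a feedforward network, which would treat the two steps independently.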

Eclair Kilkaga muttered: "It's true that an ordinary recurrent neural network is suited to processing sequence structures, but it is poor at handling long sequences..."

As he pondered, Eclair Kilkaga suddenly thought of something and shouted:
"I see, it must be an LSTM network!"

Harley Price was startled by Eclair Kilkaga's sudden shout.

But the LSTM network Eclair Kilkaga mentioned made his own eyes light up as well.
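The reason an LSTM fits where a plain RNN struggles is its gated cell state. Below is a scalar toy version of one LSTM step; the weight names in the dictionary are my own illustrative convention, not anything from the story:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One scalar LSTM step. The gates decide what to forget, what to write,
    and what to expose; the cell state c can carry information across long
    sequences, which a plain RNN's repeatedly squashed hidden state cannot."""
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g        # gated memory update
    h = o * math.tanh(c)          # hidden state exposed downstream
    return h, c
```

With the forget gate saturated open and the input gate shut, the cell state survives dozens of steps nearly unchanged, which is exactly the long-range behavior that distinguishes an LSTM from a traditional recurrent network.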

Although the villain in this book may not be very strong, he at least has to have an IQ.

  
 
(End of this chapter)
