Crossover: 2014

Chapter 153 Are You Eager To Push That Door Open?

Wait, it seems the time is ripe to publish a paper!

Lin Hui suddenly thought of this.

In fact, Lin Hui thought about publishing a paper before.

But at the time, Lin Hui felt that publishing a paper as a high school student might be a bit too shocking, so he shelved the idea.

But today is different, now Lin Hui has transformed into a prospective freshman majoring in computer science at the Massachusetts Institute of Technology.

With an identity on this level, it would not seem excessive for Lin Hui to make a real splash at the academic level.

Is MIT strong in computer science?

Absurdly strong, of course!

Take the professional coursework studied by computer science majors around the world.

Almost half of it is inextricably linked to MIT's computer science program.

At this point in time, MIT's computer science program is the well-deserved No. 1 in the world.

Of course, just as China has two top universities, the title of world's No. 1 computer science program is likewise shared by two schools.

In addition to the Massachusetts Institute of Technology, the other is Stanford University.

The two schools can be said to be evenly divided in terms of computing.

(To be fair, UC Berkeley is no slouch in computer science either.

However, in the eyes of Chinese people, schools whose names sound like branch campuses generally have no sense of existence.)
In any case, MIT remains one of the most competitive universities in the world for admission.

The threshold for such a school is naturally extremely high.

After receiving the admission notice from MIT.

On the official website of the Massachusetts Institute of Technology, Lin Hui saw the publicity content from the Massachusetts Institute of Technology:

This year, the Massachusetts Institute of Technology received a total of 18,356 undergraduate and 19,446 graduate applications for the 2014-15 academic year.

Only 1,447 (7.9%) and 2,991 (15.4%) of those applicants were admitted, respectively.

The actual enrollment may be even lower; after all, the choice goes both ways.

Some applicants who manage to get admitted simply decline to attend.

Of course, turning MIT down is quite the power move, but that sort of willfulness is beyond Lin Hui for the time being.

There is no need to be too willful, steady development is the last word.

At this moment, Lin Hui is very happy to be one of the 1447 lucky ones.

After all, Lin Hui's current level is already a height that many people can hardly reach in their lifetime.

As far as the present moment is concerned.

It is indeed an honor for Lin Hui to be a member of MIT's computer science program.

But at the same time, Lin Hui also knew it clearly.

It won't be long before he's the pride of MIT.

Entering the Massachusetts Institute of Technology is just the beginning for Lin Hui, and there is still a long way to go.

In addition, thinking about the future is still a bit far away, the most important thing is to grasp the present.

A thousand miles begins with a single step.

In the short term, Lin Hui still needs to move forward step by step.

This may not go very fast.

But it will definitely go steadily.

Lin Hui must also walk steadily!
Lin Hui may not be involved in politics, but he carries the enormous secret of having crossed over through time and space.

There is no room for missteps on the road that Lin Hui is going to take.

At this time, the safest way for Lin Hui to move forward academically is to publish a few papers.

Even publishing papers is not something out of thin air.

After all, Lin Hui had previously obtained the relevant patents for generative text summarization.

It is also very reasonable to publish several papers after applying for a patent as a supplementary explanation of the previous patent.

But in which subfield should the papers be published?

This is a problem.

This shouldn't have been a problem in the first place.

According to Lin Hui's original idea, he would simply have published the whole of generative text summarization as one result in a journal with a relatively high impact factor.

But Lin Hui never expected that in this time and space, people's research on natural language processing is really slow.

Judging from the email Eve Carly sent to Lin Hui earlier.

Although this time and space Western academic circles have devoted a lot of effort to the research of text summarization.

However, Western research on text summarization in this time and space still differs somewhat in progress from that of the previous life's timeline.

Although objectively speaking the difference is not too great.

But taken as a whole, the Western world's study of text summarization in this time and space runs two years behind that of the previous life.

(As for the ~ country, needless to say, its academic world at this point is still used to crossing the river by feeling the stones laid down by the Eagle (America).

To be honest, this approach is not completely wrong, and it can avoid wasting resources.

But it's always too passive to say that.

If you want to be the boss, you have to dare to be the first in the world)

Although this space-time research in related fields is only two years slower in pace.

But two years is enough to change a lot of things.

What's more, Lin Hui originally had a seven-year information advantage.

With one side slowing down and the other pulling ahead, Lin Hui now holds an information advantage of nearly ten years.

Some people may wonder: with only three years of work experience, how can Lin Hui wield such an information advantage?

Although he worked only three years in his previous life, Lin Hui would say that calling it six years of experience is no exaggeration.

As for where the extra three years of experience came from?

Too painful to talk about; it was all overtime.

It has to be said that this was all "thanks" to overtime.

Overtime truly is a "wonderful" thing.

If it weren't for such crazy overtime, how could Lin Hui have a chance to be reborn?

Even if there is a chance of rebirth.

Without that crazy overtime, how could Lin Hui have remembered all those dry details so deeply?

But these are the past.

Because of all the experiences in the past, in this time and space, Lin Hui is a well-deserved strongman.

As for other researchers in the same field, Lin Hui respects their efforts.

But I have to say: Sorry, you guys are really weak!

It wasn't that Lin Hui was talking nonsense.

Lin Hui previously worked out all the technologies involved in the generative text summarization algorithm.

If those technologies were thoroughly digested by the research teams of this time and space,

the research progress of natural language processing and neural network learning in this world could be accelerated by nearly a year.

Of course, that assumes immediate understanding; only then would it speed things up by nearly a year.

If these research teams needed two or three years to digest it all, it would actually drag down their normal progress instead.

Leaving aside the patent of generative text summarization.

The LH text summarization accuracy measurement model alone, something Lin Hui tossed off as a byproduct while building generative text summarization, is already impressive enough.

If this technology can be mastered by this space-time research team, it will also help their research.

Although Lin Hui had already explained clearly enough how to build the model, that explanation covered only the construction steps themselves.

(To build the model: first, use a language model to evaluate the fluency of the text the algorithm generates; next, use a similarity model to evaluate the semantic correlation between the source text and the summary; finally, to effectively evaluate the reproduction of entities and proper nouns, introduce an original-text information model.)
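None of the following comes from the chapter itself; it is only a rough sketch of how a three-part score like the one just described might be combined. The function names, the toy fluency/similarity/entity scorers, and the weights are all hypothetical stand-ins, not the actual LH model.

```python
# Hypothetical sketch of a three-part summary-quality score: fluency
# (language model), semantic overlap (similarity model), and entity
# recurrence (original-text information model). All names and weights
# are illustrative.

def fluency_score(summary_tokens, bigram_counts, unigram_counts):
    """Toy language-model fluency: average add-one-smoothed bigram
    probability over the summary."""
    if len(summary_tokens) < 2:
        return 1.0
    vocab_size = len(unigram_counts) or 1
    pairs = list(zip(summary_tokens, summary_tokens[1:]))
    total = sum(
        (bigram_counts.get((a, b), 0) + 1) / (unigram_counts.get(a, 0) + vocab_size)
        for a, b in pairs
    )
    return total / len(pairs)

def overlap_similarity(text_tokens, summary_tokens):
    """Toy similarity model: Jaccard overlap of token sets."""
    a, b = set(text_tokens), set(summary_tokens)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def entity_recall(source_entities, summary_tokens):
    """Fraction of source entities / proper nouns reproduced in the summary."""
    if not source_entities:
        return 1.0
    return sum(1 for e in source_entities if e in summary_tokens) / len(source_entities)

def combined_score(text_tokens, summary_tokens, source_entities,
                   bigram_counts, unigram_counts, weights=(0.3, 0.4, 0.3)):
    """Weighted mix of the three component scores (weights made up)."""
    return (weights[0] * fluency_score(summary_tokens, bigram_counts, unigram_counts)
            + weights[1] * overlap_similarity(text_tokens, summary_tokens)
            + weights[2] * entity_recall(source_entities, summary_tokens))
```

A faithful summary then beats an unrelated one on all three axes at once, which is the point of mixing the models instead of relying on any single signal.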

But the researchers of this moment still seem puzzled about how Lin Hui constructed the metric.

Lin Hui remembered that in her email, Eve Carly had expressed confusion over how the "LH text summarization accuracy measurement model" was constructed.

He also remembered that Eve Carly was curious about where he had obtained his corpus.

The confusion mainly focused on what method Lin Hui used to construct the similarity model.

Lin Hui was still surprised when he knew that researchers from research institutions affiliated with the world's top universities were curious about this.

Lin Hui smugly built a "gorgeous house".

He had assumed that the people of this time and space would be curious about how he built the house.

Unexpectedly, the first question was: where did the wood for the house come from?

This was Lin Hui's intuitive feeling when he received Eve Carly's email.

However, as Eve Carly introduced in the email, Lin Hui could also understand why Eve Carly was confused.

Architectures involving similarity models generally work by computation: the semantic similarity of two texts is measured by calculating a semantic text similarity score.

Generally speaking, the smaller the semantic similarity value, the greater the semantic difference between two texts, and the lower their similarity at the semantic level;
Conversely, the larger the value, the more similar the semantics of the two text expressions.

Perhaps in most people's eyes, telling similar texts apart is trivial?

Just read them, right?

But the point is not for humans to distinguish similar texts; it is for machines to do so.

It is really not easy to build a similarity model, after all, human language expression is extremely complex.

Not to mention that there are many synonyms, abbreviations, specific words and variable syntactic structures in the text of most professional articles.

These have greatly increased the difficulty of computing text semantic similarity.

But this problem cannot be left unsolved. Lin Hui knows that computing the semantic similarity of text is a very important branch field.

In the field of information retrieval, semantic text similarity calculation plays an important role in tasks such as text classification, text clustering and entity disambiguation;

In the field of artificial intelligence, semantic text similarity algorithms are also needed to support tasks such as question answering systems and intelligent retrieval.

In addition, semantic text similarity computation is also widely used in natural language processing tasks such as plagiarism detection, text summarization, and machine translation.

In a word, the research on the similarity model represented by the semantic text similarity algorithm has important application value.

Without solving the computation of text semantic similarity, there is no point even talking about further text processing.

Leaving aside the question of getting machines to distinguish between similar texts.

Merely getting a machine to recognize text at all is already extremely difficult.

Natural language generally refers to a language that humans can understand. For example, the text you see is natural language.

But when we need machines or computers to process natural language.

Machines/computers cannot directly understand these symbols (Chinese characters, letters, punctuation marks, etc.).

These symbols must be digitized before they can be fed into a computer for subsequent processing.

Mere digitization by itself is not very useful.

Something more must be introduced to reflect the attributes of the words.

Just as a bare code number alone cannot tell us whether a string of digits stands for a subscription, a bookmark, or a tip.

In short, a code name by itself reveals nothing about the attributes behind each string of numbers.
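As a toy illustration of that "digitize first" step (the vocabulary and sentences below are made up, not from any real system): each distinct token gets an integer id, and the ids alone plainly carry no attribute information.

```python
# Minimal sketch of digitizing text: assign each distinct token an
# integer id. The ids are arbitrary labels; nothing about a word's
# meaning or attributes can be read off them.

def build_vocab(tokens):
    """Assign ids in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def encode(tokens, vocab):
    """Map tokens to ids; unknown tokens become -1."""
    return [vocab.get(tok, -1) for tok in tokens]

corpus = "machines cannot read raw symbols directly".split()
vocab = build_vocab(corpus)          # {"machines": 0, "cannot": 1, ...}
ids = encode("machines read symbols".split(), vocab)
```

Nothing in `[0, 2, 4]` says whether those words are nouns, verbs, or near-synonyms, which is exactly why the attribute problem discussed above arises.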

This problem is also one of the research hotspots in computing text semantic similarity.

How to represent the attributes corresponding to the numerical natural language?

The general practice of researchers is to vectorize the digitized language.

A vector, as opposed to a scalar, is a quantity with direction.

In fact, this research direction is not new.

Lin Hui remembered that in his previous life, as early as 1975, some researchers proposed the Vector Space Model (VSM) for the first time, trying to use this model to process numerical natural language.

Lin Hui searched for relevant information and found that although this space-time is a bit slower, the method of VSM vector space model was also proposed in 1977.

The so-called VSM model may sound quite high-flown.

It's actually not that complicated.

The main idea is to assume that the semantics of a text depend only on the words it contains, ignoring word order and the relationships between words; the text is then mapped to a vector by word-frequency statistics, and finally the distance between vectors is computed to characterize the similarity between texts.
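That main idea can be sketched in a few lines (the example sentences are invented, and a real VSM pipeline would typically add weighting such as tf-idf):

```python
# Minimal bag-of-words VSM sketch: map each text to a word-frequency
# vector over a shared vocabulary, then compare vectors with cosine
# similarity. Word order is ignored, exactly as the model assumes.
from collections import Counter
from math import sqrt

def bow_vector(tokens, vocab):
    counts = Counter(tokens)
    return [counts[word] for word in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

t1 = "the model maps text to vectors".split()
t2 = "the model maps words to vectors".split()
t3 = "completely unrelated sentence here".split()
vocab = sorted(set(t1) | set(t2) | set(t3))

sim_close = cosine(bow_vector(t1, vocab), bow_vector(t2, vocab))  # high
sim_far = cosine(bow_vector(t1, vocab), bow_vector(t3, vocab))    # no shared words
```

With a large corpus the shared vocabulary balloons and each vector becomes mostly zeros, which is precisely the sparsity drawback the chapter goes on to describe.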

Calculate the distance between two vectors?
This thing is the content of high school textbooks.

Presumably any candidate who hasn't dumped that knowledge from their head right after the college entrance examination could use this model to calculate text similarity.

However, many high school students may not even know that the thing they learn can do this when they learn it.

(ps: ... the things learned in high school are very useful, don't give up if you don't see the use temporarily)

Of course, it is also because the model is simple and efficient.

For a long time after the model was proposed, it was the mainstream method in the field of text similarity calculation.

But the model is not without its drawbacks.

The VSM-based method still has two drawbacks:
On the one hand, when the amount of text is large, the generated text vectors are very sparse, which leads to a waste of space and computing resources;
On the other hand, VSM ignores the relationship between words in order to achieve the effect of simplifying the model, but in many cases there is a connection between words, so it is unreasonable to simply think that words are independent of each other.

These two flaws are particularly fatal.

The first directly affects the efficiency of processing similarity, and the second directly affects the accuracy of word sense similarity discrimination.

Because of this, the VSM model remained in use only for a time before researchers discarded it.

What people in this time and space now use to calculate text similarity, Lin Hui is not entirely sure.

However, Lin Hui noticed that the email Eve Carly sent him earlier did not mention anything about vectors.

Researchers these days seem to have forgotten about vectorization.

Perhaps now it seems that it is a very old research direction to use vectorization for natural language text processing.

But in fact, the direction of vectorization still has potential to be tapped.

Distributed word vectors can perfectly well be applied to compute text similarity.
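A sketch of what that might look like, with tiny hand-made 3-dimensional embeddings standing in for vectors that a real system (word2vec-style training, for instance) would learn from a large corpus; the words and numbers are purely illustrative:

```python
# Text similarity with distributed (dense) word vectors: average each
# text's word vectors into a single sentence vector, then compare with
# cosine. Unlike bag-of-words, texts sharing no literal token can still
# score as similar. Embedding values below are hand-made toys.
from math import sqrt

EMBEDDINGS = {
    "cat":    [0.90, 0.10, 0.00],
    "kitten": [0.85, 0.15, 0.05],
    "dog":    [0.80, 0.20, 0.10],
    "car":    [0.00, 0.90, 0.40],
    "truck":  [0.05, 0.85, 0.45],
}
DIM = 3

def text_vector(tokens):
    """Average the embeddings of known tokens into one sentence vector."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# "dog" shares no token with ["cat", "kitten"], yet lands close in space.
sim_related = cosine(text_vector(["cat", "kitten"]), text_vector(["dog"]))
sim_unrelated = cosine(text_vector(["cat", "kitten"]), text_vector(["car", "truck"]))
```

This is why the chapter treats distributed word vectors as the breakthrough: the similarity judgment moves from surface word overlap to positions in a learned semantic space.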

But it's normal for people in this time and space not to know.

Lin Hui remembers that in his previous life, many of the important achievements in natural language processing erupted in a burst during 2013 and 2014.

In the previous life, within the architecture of text similarity models,

the technique of using distributed word vectors to compute semantic text similarity was born in 2013.

In the previous life, it was after the advent of distributed word vectors that breakthroughs in semantic text similarity were made.

With this time and space running two years behind in rhythm, it is normal that applying distributed word vectors to compute text similarity has not yet been proposed.

One step behind, every step behind.

With the rhythm two years slow, this time and space will undoubtedly lag behind in many respects.

These are undoubtedly good news for Lin Hui.

Applying distributed word vectors to construct a method for calculating text similarity is easy to say.

However, this issue is actually quite complex.

Therefore, Lin Hui did not reply to Eve Carly in the email.

Research in this time and space is limping where text similarity model architecture is concerned.

Isn't Lin Hui practically obliged to help?

It seems the porter across time and space is about to come online again.

Of course, this transfer is not free.

Right now, Lin Hui is more concerned about the thesis.

In the case of directional deviation in related research, wouldn't it be easy for Lin Hui to publish several papers if he really wanted to write them?

It is easy to write a paper of this level.

Although Lin Hui didn't go too far in his academic career in his previous life, he has published about seven or eight papers in total.

Several papers are still in English.

In short, things like publishing papers are already familiar to Lin Hui.

Under such circumstances, Lin Hui felt he could easily earn the extra credit required for a bachelor's degree from the Massachusetts Institute of Technology.

Despite this, Lin Hui decided to meet and communicate with Eve Carly first before doing anything related to the thesis.

After all, Lin Hui is not entirely clear on the exact progress of text similarity research in the Western world, and it would be embarrassing to accidentally collide with existing work.

Commercial collisions can be euphemistically called commercial competition.

An academic collision, though, is a stain for a lifetime.

Now Lin Hui only hopes to meet Eve Carly soon.

Fortunately, the meeting Lin Hui was looking forward to happened not long after.

Lin Hui met "Eve Carly" at Beiyu Yubei International Airport.

Eve Carly was afraid that Lin Hui would not believe her identity, so she attached a bunch of proofs to prove her identity in the email.

Lin Hui had seen the photo of Eve Carly before.

I have to say that Eve Carly's appearance is very recognizable.

She has long golden curly hair; her height looks to be around [-] at a glance; her proportions are excellent, her figure a striking S-curve.

Even judging with a critical eye, Lin Hui felt that the figure and looks of the "Eve Carly" before him scored comfortably above 90 points.

Most important of all, she gives off a very pure, almost otherworldly feeling.

Uh, how to describe it; anyway, it makes one want to protect her.

However, Lin Hui was not so restless.

She's just a woman; she would only slow down the speed at which he grinds out papers and code.

"Eve Carly" seems to have not found Lin Hui yet.

Lin Hui walked up to meet her, and took the initiative to greet her in English, "Are you Eve Carly? I am Lin Hui, welcome to China."

Uh, Lin Hui can still handle these few sentences in English.

However, the person in front of him obviously hesitated.

Lin Hui felt very strange, could it be a mistake?

Just when Lin Hui was struggling, a voice came from behind him suddenly.

"Are you LIN Hui? I'm Eve Carly, nice to meet you!"

Lin Hui felt deflated; this was embarrassing.

His very first airport pickup, and he had recognized the wrong person.

But that shouldn't be; the person in front of him had a very recognizable Western face, exactly matching the woman in the ID photo Eve Carly had sent earlier.

Lin Hui turned around in confusion, looked at the source of the sound, and saw another "Eve Carly".

The one who spoke just now also has long golden curly hair, is also tall, has a great body proportion, and has a face value of more than 90.

The two people in the front and back are exactly the same, which is outrageous.

The most outrageous thing is that the temperament of the two people is very similar, both of which are very pure.

Lin Hui: Σ(っ°Д°;)っ

what's the situation? ? ?
Could it be twins?
Lin Hui looked back and found that the two were very similar in appearance.

But the temperament is actually slightly different.

Although both of them are very pure temperament.

But one has the kind of innocence that makes people adore her.

The other has the kind of bookish air that makes people respect her.

And now such a strikingly similar pair of twins had walked right up to his door.

Lin Hui had a very bold idea at that time!
……

……

It would be a real pity not to use a pair of twin sisters this alike as test samples when developing face recognition algorithms in the future...

(End of this chapter)
