Big data in China

Chapter 7 The Great Transformation Opportunity for the World in the Big Data Era

What can we do? What will we do?

●In any case, the only thing we need to do now is to welcome the big data era with open arms.

●Today, if you use, or plan to use, cloud technology such as cloud sharing or cloud computing, then congratulations: you have become a member of the big data era, or a victim of it.

●The emergence of big data is first an opportunity, and second a source of great challenges. We should not only enjoy the benefits it produces, but also be alert to the potential ills behind it.

☆ Grasp its core problem first
Big data has a multi-layered structure, which means it comes in many forms and types. Some people think that the increasing use of Internet search is the main reason for the diversity of data. That diversity is also formed by data types such as mobile phone call records and sensor networks. Data sensors can be placed in more and more places, such as cars, airplanes, satellites or mobile phones, all of which increase the diversity of data.

Compared with traditional business data, big data is irregular and ambiguous, so it is difficult or even impossible to analyze with traditional software or methods, and sometimes even collecting it is impossible. Traditional business data has evolved to the point where its format can be recognized by standard intelligent software. The challenge we currently face is to process complex data that comes in many forms and to mine value from it.

A survey of the rate at which data is being created suggests that by 2020 the world will have 220 billion internet-connected devices. In the era of big data, data is created and moved very quickly. Creating real-time data streams is a popular trend, and with high-speed computers and servers it is not difficult. On this basis, we must also know how to process and analyze data quickly to meet users' real-time needs.

It is an indisputable fact that we (both businesses and individuals) face massive growth in data volumes. In another 15 years, the amount of data in the world will expand to 50 to 60 times that of today. Its size is an ever-changing indicator, and no one can predict the extent of future technological leaps. What is certain is that the growth of data volume will only get faster and will never slow down. In addition, data can be generated and stored from all kinds of unexpected sources.

☆Think about what you can do?
In the future, where will our competitive advantage (the advantage over strong players) come from? Think about this question, and you will understand the mission that big data entrusts to human beings. Future competitive advantage will come not from the "warehouse" of manufacturing or industrial resources but from data and the corresponding ability to collect, analyze and use it.

In the coming era of big data, only the companies that provide the data platforms with the most functions and the largest amount of data will win the competition among enterprises, and only the countries with the most powerful big data industries will win the competition among nations and have the last laugh.

Big data scientist Schoenberg said: "Now there are more and more data, and people can collect and analyze more information related to the problems they want to study. Through these data, people can get a lot of insights to help them do more research and make choices and decisions."

He believes that only when we analyze all relevant phenomena, using all or most of the data, can we discover problems and options that have not been seen before. Therefore, people must learn to make good use of more data. On this premise, Schoenberg points out that the biggest change in the era of big data is that we no longer crave causality so strongly and instead pay more attention to correlation. (I disagree with this point of view, and we will focus on it in the following pages.)
That is to say, Schoenberg believes that in the era of big data, as long as we know the "what", we don't need to know the "why" to achieve more ambitious goals. This is new thinking, and that's exactly what we're going to do. We must create new ways of communicating and establish new ways of understanding in order to keep up with the pace of big data and become a new type of modern people.

☆Recognize the value of data: reuse
What is the value of data? The key point is that it is always changing, never static. In the past (the era of small data), data often lost its meaning after being used once, but today data can be reused. You can recall it and use it at any time without worrying about it being damaged or losing its function.

The real value of data is that it can be used over and over again. This "reuse" has made data hundreds or even thousands of times more important than in the past.

Because of this new feature, the role of the Internet has expanded enormously and has finally given birth to a big data industry within every industry, because everyone needs to reuse data. Businesses need it, and so do individuals. For the whole world, this may mean that the big data industry will lead the development of the economy and affect every aspect of our lives.

☆Is it time to build a big data era thinking for yourself?
However, what do we ordinary people need to do to adapt better to the times, or to be at the forefront of this era?
Think about it carefully: do you have the opportunity to lead your own big data era? In the United States, there is an innovative company, Dekde, which helps people make purchase decisions by telling consumers which products to buy and when they will be cheapest. It is always able to shrewdly predict the price trend of a product.

How does it do it? The powerful driving force behind it is big data. The company has collected billions of data points from websites around the world and has used them to help hundreds of thousands of users save money, find the best time for their purchases, improve productivity and reduce transaction costs, bringing more value to end consumers.

Under this type of model, although the profits of some retailers will be squeezed further, from a commercial point of view more money goes back into the pockets of consumers, and people shop more rationally. They no longer spend a lot of money on small things, and they are less likely to buy fake products.

This is a brand new industry that relies on big data. The company, which saves money for hundreds of thousands of customers, was acquired by a very large enterprise at a high price not long ago.

Another example is related to SWIFT, the world's largest payment platform. Every transaction on this platform can be analyzed with big data, which makes it possible to predict the health and growth of an economy. For example, if you provide the economic index of a certain region in the world, you can get accurate statistics, calculations and forecasts for different regions in real time.

Data can tell us the consumption propensity of each customer. What do they want? What do they like? How do the needs of each individual differ? Which customers can be grouped together for classification analysis? Companies with forward-looking vision have already positioned themselves on this basis and now provide data-based analysis, services and predictions for consumers and users.

Most people do not have the ability to start such a company, but we can find roles to play in the big data industry, such as data engineers, people who provide ideas or develop programs, and of course people who collect and organize data. We can establish this kind of thinking in our own lives, become true "data enthusiasts" and take care of our lives.

☆Our future - development and full use of data
Think carefully about how the collection, analysis and processing of data should be carried out. We will introduce and discuss each step in order, and put forward some views that differ from widely accepted common sense.

The first step: data collection.

Collection is the first link in the big data supply chain. Data is the raw material of the big data industry, and without raw materials no industry can develop. From a broad perspective, information is data, and we can obtain information through various public or private channels. This information comes in all shapes and sizes and from different places, and we put all of it together.

As the cost of collecting data becomes lower and lower (because the market for collecting data is increasingly developed), it is possible for us to obtain almost all valuable data at a relatively low and acceptable price. This information covers every field, even entire disciplines of human civilization that you could not master in a lifetime: from social networks, emotions, military affairs and politics to weather forecasts, economic indicators and tedious public information. All of it is the raw material of the "big data processing factory".

You can collect information from the Internet: click the mouse to reach any website, check what interests you, and record it;
You can collect information from smartphones, iPads or other mobile data platforms, which faithfully provide information services according to your preferences;
You can collect information through email or traffic statistics tools, which yield data related to a specific organization, such as the number of consumer visits, product recall and customer loyalty indexes, all at a very low cost.

Now that technical conditions permit it and collection has become an easily achievable goal, the question of legality has been put on the table. "Can I just take information from here? Are there no restrictions?" Of course there are. Certain data is strictly regulated, such as medical information, personal property and marital information. Depending on the circumstances, the same act of collection may be judged legal or illegal. If you are involved in exploiting personally identifiable information, it is probably illegal; if not, the law remains murky.

Worldwide, judicial systems do not have a unified opinion on whether online information, including IP addresses, represents personal identity (privacy). Recently, however, some courts in the United States have begun to clarify the rules. For example, the Supreme Court of California ruled that zip codes are personal information, which places mandatory restrictions on who can collect the relevant data.

In the age of computers and the Internet, everyone has become a potential source of data. Take mobile phones as an example. In the era of smartphones, the mobile phone has become an excellent device for collecting and transmitting information. It can sense light, sound, movement, location, nearby networks, computers, other mobile phones (their users and locations), and so on. This is an ideal data collector. If mobile phone users install the manufacturer's software, they automatically join the data supply chain. Sometimes they are not aware of this, because people focus more on features and convenient services (including software upgrades and information access functions).

This means that, regardless of whether it is judged legal or illegal, information is becoming massive and ubiquitous. Collecting it at a matching speed is a very challenging job. To accomplish this work, we need new technologies and platforms that promote technological innovation and thereby drive a whole series of industries.

The second step: data extraction and cleaning.

Collecting the data well does not mean everything will be fine. On the contrary, the work has only just begun. Once collected, the data must be extracted and sorted. In the business intelligence field, this is called "extract, transform and load", and it involves storing data in a designed database, performing certain processing, and then making it easy to retrieve and use.
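The "extract, transform and load" idea can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the sample ratings, the table name `ratings` and the three function names are hypothetical and do not come from any real system.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: customer ratings collected as messy CSV text.
RAW_CSV = """customer,rating,day
alice, 5 ,2024-01-01
bob,,2024-01-01
carol,3,2024-01-02
"""

def extract(text):
    """Extract: parse the raw text into one dictionary per row."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows without a rating, strip whitespace, fix types."""
    cleaned = []
    for row in rows:
        rating = row["rating"].strip()
        if not rating:
            continue  # cleaning step: skip incomplete records
        cleaned.append((row["customer"].strip(), int(rating), row["day"].strip()))
    return cleaned

def load(records, conn):
    """Load: store the cleaned records in a table designed for retrieval."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ratings (customer TEXT, rating INTEGER, day TEXT)"
    )
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
load(transform(extract(RAW_CSV)), conn)
stored = conn.execute(
    "SELECT customer, rating FROM ratings ORDER BY customer"
).fetchall()
print(stored)
```

The incomplete row is dropped during the transform step, so only the cleaned records reach the database and can be queried later.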

One of the most notable features of big data is that it is unstructured. It has no natural structure; at the initial stage of collection the information is often chaotic, messy and irregular, and it comes from all kinds of sources and has all kinds of properties. This means that we do not know the internal structure of the information until the extraction and analysis work is carried out.

It's a headache, isn't it? Next, the need to transform the information arises. We need to analyze the data quickly and define different structures while preserving the source data.
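One way to picture "defining structure while keeping the source data" is the small sketch below. It is an assumption-laden illustration, not a description of any particular product: the sample records and the function name `to_structured` are hypothetical.

```python
import json

# Hypothetical raw records as they might arrive from different sources.
raw_records = [
    '{"user": "alice", "action": "search", "query": "flights"}',
    "bob clicked banner 42",
    '{"user": "carol", "action": "buy", "item": "book"}',
]

def to_structured(raw):
    """Build a structured view of one record while keeping the source text."""
    try:
        fields = json.loads(raw)
    except json.JSONDecodeError:
        fields = None
    if isinstance(fields, dict):
        return {"kind": "json", "fields": fields, "source": raw}
    # Anything that is not a JSON object is kept as free text.
    return {"kind": "free_text", "fields": {"text": raw}, "source": raw}

structured = [to_structured(r) for r in raw_records]
kinds = [record["kind"] for record in structured]
print(kinds)
```

Each result carries both the derived structure and the untouched original, so a different structure can be defined later without collecting the data again.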

The third step: the development of hardware.

At this point, the development of hardware is put on the agenda. Without upgraded hardware, upgraded software cannot run and huge analysis projects cannot be supported. Any data we collect and extract needs to be analyzed by humans or machines, and more by machines than by humans.

Here, hardware exists in the form of computing, storage and networking, mostly with computers as carriers that become part of data servers. Big data isn't going to change that, but it is repurposing traditional hardware and making cloud computing a favorite. Because cloud computing makes data virtual and real-time, it can not only accept massive data for analysis but also clear the data at any time to achieve on-demand analysis, which makes accurate analysis of massive data possible.

The fourth step: the importance of the platform.

We need to create platforms and frameworks that can quickly process massive amounts of information; without them, the work described above would not be possible. On such a platform, one way to speed up data analysis is to decompose the data and analyze the parts separately. Another way is to establish a path of processing steps, with each step optimized for a specific task.
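The idea of decomposing the data and analyzing the parts separately can be sketched as follows. This is a toy illustration under stated assumptions: the word-count task, the sample records and the function names `split`, `analyze` and `merge` are hypothetical. It uses threads from the Python standard library, whereas a real platform would distribute the chunks across processes or servers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def split(records, parts):
    """Decompose the data into roughly equal chunks."""
    size = max(1, -(-len(records) // parts))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]

def analyze(chunk):
    """Analyze one chunk on its own: count the words it contains."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts

def merge(partials):
    """Combine the partial results into one overall result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

records = ["big data", "data platform", "big platform", "data"]
with ThreadPoolExecutor(max_workers=2) as pool:
    partials = list(pool.map(analyze, split(records, 2)))
totals = merge(partials)
print(totals["data"], totals["big"], totals["platform"])
```

Because each chunk is analyzed independently, adding more workers (or machines) shortens the analysis without changing the final merged result.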

The platform must also have another important feature: it must produce results quickly, rather than merely processing a large amount of data without guaranteeing real-time performance. This is important because people need both real-time information and repeated analysis of the data. For example, when providing web search results, Baidu cannot display the results page 24 hours later. It must present the page instantly to meet user needs; flight and hotel information must also be presented in real time. These goals can only be achieved if the platform can dispatch tasks, which is why large Internet companies have hundreds of servers. Finally, the platform must also allow the data to be used repeatedly, which requires more advanced technology.

The fifth step: machine intelligence.

In the big data supply chain, machine intelligence is critical, because the data is too large to be processed manually. For most of the data we want to analyze today, and indeed for the entire big data industry, it is difficult to make progress without the help of machines. Machine intelligence is an inevitable trend. Whoever occupies the high ground of machine intelligence will take the lead in the big data industry; whoever holds the core technology will not be controlled by others but will instead be in a position to control them.

Machines already step in to help at the stage of data collection and extraction. For example, they deduce the meaning of large amounts of information and summarize it; they summarize the daily and weekly service satisfaction scores of thousands of customer service staff; and they compile statistics on the booking volume of train and air tickets. Humans cannot do this work, because they are too slow to deliver results in real time.
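The customer service example can be pictured with a short sketch. It is only an illustration: the records, the 1-to-5 scoring scale and the function name `daily_average` are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (agent, day, satisfaction score from 1 to 5).
scores = [
    ("agent_a", "2024-01-01", 4),
    ("agent_a", "2024-01-01", 5),
    ("agent_b", "2024-01-01", 3),
    ("agent_a", "2024-01-02", 2),
]

def daily_average(records):
    """Group scores by (agent, day) and average each group."""
    grouped = defaultdict(list)
    for agent, day, score in records:
        grouped[(agent, day)].append(score)
    return {key: mean(values) for key, values in grouped.items()}

summary = daily_average(scores)
for (agent, day), average in sorted(summary.items()):
    print(agent, day, average)
```

With thousands of agents and millions of ratings the logic stays the same, which is exactly why this kind of summarizing is handed to machines.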

The machine's involvement is not the only thing that matters; its ability to learn is also important. If we want to analyze information faster and in more difficult environments, we must naturally keep improving the intelligence of the machine. In other words, in the era of big data, our machines will become smarter and smarter. They will gradually become able to think more deeply and will acquire certain emotional patterns and logical judgment. While we cannot yet predict the future of intelligent machines, their behavior already resembles human intelligence in its infancy.

The sixth step: the human role.

Although machine intelligence is very important for data analysis, it can never replace humans. The human eyes, ears and brain are still (and probably always will be) the most intelligent tools in the world. No matter how advanced the machine is, its ultimate purpose is to extend the reach of human vision and to provide data in a human-readable form.

Therefore, what matters is neither the machine alone nor the human alone, but "human-computer interaction". Most analysts clearly know that people are the masters of data and machines are just hired hands. Creve is a pioneer in human-computer interaction research. He has designed a very powerful system that draws on dozens of independent data sources. It can not only display the data in an operable 3D environment, but can also be complemented with sound and other signals. His research showed that if people worked with data this way, analysts could find answers in minutes instead of hours.

The role of human beings is to control machines and become the masters of data, and on this basis to increase the speed and parallelism of human-computer interaction. Of course, humans also need to design new interfaces and multi-sensory environments for machines, so that data analysts and machines can work together and process data efficiently.

The seventh step: data storage.

We have to think about how data is stored. In fact, this problem becomes a key part of the design from the very beginning, because big data occupies so much storage space.

In this huge body of data, in addition to the source information, there is also a large amount of changed data. We collect, arrange, modify and process it; there are also summary tables and tables obtained through analysis, and files in many formats are generated from them. To provide as much space as possible, we need to develop new technologies that give data a more spacious "home".

In general, what does storage mean? A data expert said: "Storage is to use traditional flat files and related data sets plus post-SQL storage system to save cloud data and initial data." Without this link in the big data supply chain, we would not be able to back everything up, and the database would struggle to meet the standard and could not support the huge workload. It would be like a person who is hungry but has a small stomach.

The eighth step: reach the goal of sharing data and coordinating actions.

(End of this chapter)
