Riding the wind of rebirth

Chapter 1549 Liberal Arts Dog Database

In addition, there is another type of database. Compared with relational databases, its data logic layer is far more flexible in how it expresses data, and it takes four main forms. The first is the key-value model, which is simple in expression but highly scalable. The second is the column model, which supports more complex data than the key-value model but scales relatively poorly. The third is the document model, which has great advantages in both supporting complex data and scalability. The fourth is the graph model, which has fewer usage scenarios and is usually built specifically around data organized as a graph.

Such databases are needed because in some work scenarios the logical structure of the data is not clear in advance, and neither are its growth rate or eventual volume. A relational database used for storage would face constant adjustments to its rows and columns, and in a relational database that already holds massive amounts of data, adding or deleting columns on the fly is difficult at best and an outright disaster at worst.
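To make the four forms concrete, here is a minimal sketch that expresses the same hypothetical user records in each model, using plain Python dictionaries and lists as stand-ins for real storage engines. All keys, names, and values are illustrative assumptions, not any product's API. The last part illustrates the flexibility argument: a field is added to one document without touching the others, whereas a relational table would need a schema change affecting every row.

```python
# Stand-ins only: plain Python containers imitating the four data models.
# All keys, names and values are hypothetical.

# 1. Key-value model: one key, one opaque value.
kv_store = {"user:1": '{"name": "Li", "city": "Beijing"}'}

# 2. Column model: each row key maps to column families, each holding columns.
column_store = {
    "user:1": {
        "profile": {"name": "Li", "city": "Beijing"},
        "stats": {"logins": "42"},
    },
}

# 3. Document model: nested documents that need not share a schema.
document_store = [
    {"_id": 1, "name": "Li", "city": "Beijing"},
    {"_id": 2, "name": "Wang", "tags": ["reader", "author"]},
]

# 4. Graph model: nodes plus explicit edges between them.
graph_nodes = {1: {"name": "Li"}, 2: {"name": "Wang"}}
graph_edges = [(1, "KNOWS", 2)]


def add_field(docs, doc_id, key, value):
    """Add a field to a single document; no other document is changed."""
    for doc in docs:
        if doc["_id"] == doc_id:
            doc[key] = value
            return
    raise KeyError(doc_id)


add_field(document_store, 1, "email", "li@example.com")
```

Only document 1 gains an `email` field; document 2 keeps its own, different set of fields, which is exactly the kind of change that would require an `ALTER TABLE` in a relational schema.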

Therefore, non-relational databases are used in application scenarios that demand easy scaling, huge data volumes, extremely high performance, extremely high availability, and flexible data models.

Because this type of database stores data in a discrete way, it is called a non-relational database. Since these databases were essentially born to solve the practical problems of massive and fast-growing data, they are also nicknamed "engineering dog databases".

Among non-relational databases there is a special type whose data logic layer is a data management system based on graph theory.

A graph is a collection of points and edges, where "points" represent entities and "edges" represent relationships between entities. In a graph database, the relationships between data are as important as the data itself, and they are stored as part of the data.

Such an architecture enables the graph database to quickly respond to complex association queries because the relationships between entities have been stored in the database in advance.
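The speed claim rests on where relationships live. The following minimal sketch, with hypothetical names, contrasts a relational-style lookup, which scans a separate join table to find a person's colleagues, with a graph-style lookup in which each node keeps direct references to its neighbours (the idea usually called index-free adjacency). It is a simplification: a real relational database would use an index rather than a full scan, but it would still have to resolve the join at query time.

```python
# Relational style: relationships live in a separate "join table" that must be
# searched at query time to find a node's neighbours.
works_at_table = [("alice", "acme"), ("bob", "acme"), ("carol", "globex")]


def colleagues_via_join(person):
    companies = {c for p, c in works_at_table if p == person}
    return sorted(p for p, c in works_at_table if c in companies and p != person)


# Graph style: each node stores direct references to its neighbours,
# so one hop is a walk over the node's own list, not a table search.
class Node:
    def __init__(self, name):
        self.name = name
        self.neighbours = []  # adjacency stored with the node itself


def link(a, b):
    a.neighbours.append(b)
    b.neighbours.append(a)


people = {n: Node(n) for n in ("alice", "bob", "carol")}
companies = {n: Node(n) for n in ("acme", "globex")}
for p, c in works_at_table:
    link(people[p], companies[c])


def colleagues_via_graph(person):
    start = people[person]
    return sorted(
        other.name
        for company in start.neighbours   # hop 1: person -> company
        for other in company.neighbours   # hop 2: company -> people
        if other is not start
    )
```

Both functions return the same answer; the graph version simply follows pointers that were stored when the data was written, which is why the relationships are said to be "stored in advance".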

Graph databases can intuitively visualize relationships and are the best way to store, query, and analyze highly interconnected data.

This data structure stores the dependencies between nodes directly. Besides storing the relationships between data as part of the data itself, it can also attach labels, directions, and attributes to those relationships. This is also why graph databases hold a huge performance advantage over other types of databases in relationship queries.

To give an example, a point, or node, represents an entity or instance, such as a person, business, account, or any other item you want to track. Nodes are roughly equivalent to records, relations, or rows in a relational database, or to documents in a document database.

Edges are also called relationships, which can be understood as lines connecting nodes to other nodes; for example, these people belong to this company, this company has opened these accounts, and so on.
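The node, edge, label, direction, and attribute ideas above can be sketched as a tiny in-memory property graph. This is a toy under stated assumptions, not any particular graph database's data model: the classes `Node`, `Edge`, and `PropertyGraph`, the labels `BELONGS_TO` and `OPENED`, and all data are hypothetical. It answers the two-hop question "which accounts were opened by the company this person belongs to?"

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    label: str                              # e.g. "Person", "Company", "Account"
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    src: str                                # direction: src -> dst
    dst: str
    label: str                              # e.g. "BELONGS_TO", "OPENED"
    props: dict = field(default_factory=dict)  # attributes on the relationship


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.out_edges = {}                 # node id -> list of outgoing edges

    def add_node(self, node):
        self.nodes[node.id] = node
        self.out_edges.setdefault(node.id, [])

    def add_edge(self, edge):
        self.out_edges[edge.src].append(edge)

    def neighbours(self, node_id, label):
        """Follow outgoing edges that carry the given label."""
        return [self.nodes[e.dst] for e in self.out_edges[node_id] if e.label == label]


g = PropertyGraph()
g.add_node(Node("p1", "Person", {"name": "Zhang"}))
g.add_node(Node("c1", "Company", {"name": "Acme Ltd"}))
g.add_node(Node("a1", "Account", {"bank": "Bank A"}))
g.add_node(Node("a2", "Account", {"bank": "Bank B"}))
g.add_edge(Edge("p1", "c1", "BELONGS_TO", {"role": "engineer"}))
g.add_edge(Edge("c1", "a1", "OPENED"))
g.add_edge(Edge("c1", "a2", "OPENED"))

# Two hops: person -BELONGS_TO-> company -OPENED-> account.
accounts = [
    acct.id
    for company in g.neighbours("p1", "BELONGS_TO")
    for acct in g.neighbours(company.id, "OPENED")
]
```

Because the `role` attribute sits on the edge rather than on either node, it describes the relationship itself, which is what "adding attributes to the correlation" means in practice.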

Exploring the connections among nodes, attributes, and edges often yields unexpectedly valuable insights. For example, discovering abnormal transactions between a member of an enterprise and people at the enterprise's upstream or downstream partners is, in essence, an analysis of an unreasonable "edge".

Edges can be directed or undirected. In an undirected graph, an edge connecting two nodes has a single meaning. In a directed graph, edges connecting two different nodes have different meanings depending on their direction.

For example, among family members, the father-son relationship is an edge between two nodes that carries a different meaning in each direction.
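A minimal sketch of the difference, with hypothetical names: a directed edge is stored once and read as "father of" from one end and "son of" from the other, while an undirected edge means the same thing from either end. The `INVERSE` table that maps a label to its reverse reading is an illustrative assumption.

```python
# Directed edges: (source, target, label); the label reads source -> target.
directed_edges = [("Old Wang", "Little Wang", "father_of")]

# Hypothetical table giving each directed label its reading in reverse.
INVERSE = {"father_of": "son_of"}


def relation(a, b):
    """Describe how a relates to b, reading the directed edge either way."""
    for src, dst, label in directed_edges:
        if (src, dst) == (a, b):
            return label
        if (src, dst) == (b, a):
            return INVERSE[label]
    return None


# Undirected edges: an unordered pair, so both directions mean the same thing.
undirected_edges = {frozenset({"Chen", "Liu"})}


def are_friends(a, b):
    return frozenset({a, b}) in undirected_edges
```

One stored edge answers both "who is Little Wang's father?" and "whose son is Little Wang?", while the friendship pair needs no direction at all.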

Such a database is best suited to processing and analyzing the knowledge systems of the liberal arts, so Zhou Zhi was determined to develop it and even named it outright the "liberal arts dog database".

But this reason accounts for at most half. The other half is that, in the era Zhou Zhi came from, graph databases had become the most common and best tool for storing and analyzing social network data, and the best tool for finding coincidences, performing deep traversals, and handling large amounts of complex, interconnected data.
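Deep traversal and coincidence-finding in a social network can be sketched with a breadth-first search limited to a number of hops, plus a set intersection for mutual friends. The function names and the small friendship graph are hypothetical; a real graph database would run such queries against its own storage engine.

```python
from collections import deque


def within_hops(graph, start, max_depth):
    """Breadth-first traversal: everyone reachable from start in at most
    max_depth hops, mapped to their hop distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for nb in graph.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    del dist[start]
    return dist


def mutual_friends(graph, a, b):
    """A simple "coincidence": people directly connected to both a and b."""
    return set(graph.get(a, ())) & set(graph.get(b, ()))


social = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
```

Raising `max_depth` deepens the traversal one ring of friends at a time; because each hop only reads a node's own neighbour list, the cost grows with the part of the graph actually visited rather than with the total size of the data.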

With the explosive growth of social networks, e-commerce, resource retrieval, and other fields, graph databases, a storage technology that can handle complex associations, proved particularly effective for organizing, storing, computing over, and mining weakly structured, interconnected data. The field therefore developed rapidly and branched into five research directions: graph matching, keyword query, graph classification, graph clustering, and frequent subgraph mining. Its benefits are that it can optimize retrieval over data at the billion scale, greatly improve traversal speed and stability, greatly reduce server load during retrieval, lower system overhead, and remain unaffected by massive data growth. These are jobs that other databases of the Internet era, relational databases in particular, are simply incapable of doing.
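Graph matching, the first of those directions, in its simplest form means finding every place a small pattern occurs in a larger graph. A naive sketch, assuming an undirected graph stored as a dict of neighbour sets and using the triangle (three nodes all connected to one another) as the pattern; `find_triangles` is a hypothetical name, and real systems use far more efficient matching algorithms.

```python
from itertools import combinations


def find_triangles(adj):
    """Return each triangle once, as a sorted tuple of node names."""
    found = set()
    for node, nbrs in adj.items():
        # Any two neighbours of `node` that are also linked close a triangle.
        for a, b in combinations(sorted(nbrs), 2):
            if b in adj.get(a, ()):
                found.add(tuple(sorted((node, a, b))))
    return sorted(found)


graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
}
```

Here `find_triangles(graph)` finds the single triangle A, B, C; D hangs off B and closes no triangle.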

However, because graph databases entered the public eye in later years alongside high-end tools such as distributed storage, big data analysis, and AI retrieval, many people assume they are something new.

In fact, this is a misunderstanding. For example, the mathematical theory behind big data was actually worked out as early as the 1940s; it simply could not be put into practice because the application scenarios, software, and hardware of the time were not ready.

The development of graph databases actually has a very long history. As early as the 1960s, IBM's IMS, a navigational database, already supported the hierarchical model and tree structures, which are a special form of graph.

In the late 1960s, network model databases were born and could already support graph structures.

CODASYL (Committee on Data Systems Language) defined COBOL in 1959 and Network Database Language in 1969.

It's just that the hardware of the time could not support complex query requirements, so these databases were never widely promoted or used.

Over the past thirty years, graph databases have in fact kept evolving as computer performance has improved. However, they are still at the laboratory research stage: everyone treats them as a high-end academic plaything, and for the time being no one can clearly see their application scenarios.

Therefore, under Zhou Zhi's guidance, the Information Intelligence Research Institute of AXA Fund picked up many bargains, buying large amounts of intellectual property from research institutes, university computer departments, large information companies, and others around the world as a technical reserve.

It will not be until after the year 2000, with the huge volumes of linked data generated in the Internet era, the widespread use of RDF (Resource Description Framework) for exchanging resources on the web, and the emergence of graph databases with ACID transaction guarantees, that graph data returns to the center of the historical stage.

By then, companies that own the core intellectual property rights of graph databases will undoubtedly become the darlings of the times.

But everything must have a reason, that is, an underlying logic. Without a sound and compelling rationale for developing a graph database, it would be almost impossible for Zhou Zhi to win Li Laosan's understanding and the company's resources, let alone strong support from the school and the country.

The digital library project is the best anchor that Zhou Zhi has found for developing graph databases.