Data Science 👩💻 | Getting started with Neo4j and Gephi Tool
Neo4j Tool
Neo4j stores and manages data in its more natural, connected state, maintaining data relationships that deliver lightning-fast queries, deeper context for analytics, and a pain-free modifiable data model. It is a graph database management system developed by Neo4j, Inc. Its developers describe it as an ACID-compliant transactional database with native graph storage and processing.
Put simply, Neo4j is the MySQL of graph databases. It provides a graph database management system, a language to query the database called Cypher, and a visual interface, the Neo4j Browser.
Let’s start the demo.
- Download Neo4j Desktop and install it.
- After the installation,
For this example I am running a hello world query, which creates two nodes called Neo4j and Hello World and one relationship called says.
You can see that the query creates the two nodes and the says relationship.
The image below shows the table view of the nodes and relationships.
Here I have used the Movies database for demo purposes only; you can create your own database by clicking Create new. Start the Movies database and open it in the Neo4j Browser.
After that, load the Movies data into Neo4j, and it will show the data in graph format.
In this database, there are 9 Person nodes, 8 Movie nodes, and a total of 18 relationships between nodes. Use the command below to find the total number of nodes.
// Count all nodes in the database
MATCH (n) RETURN count(n)
// Find the labels in the database
CALL db.labels()
// Find the types of relationship between nodes
CALL db.relationshipTypes()
Using this query, we can see how a person is connected to a movie, for example who produced the movie and which role a person played in it.
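To make these three commands more concrete, here is a minimal Python sketch of what each one computes. It uses a tiny hand-written graph, not the real Movies database, and the names `nodes` and `relationships` are assumptions made for illustration; Neo4j itself stores and queries data differently.

```python
# A toy property graph: hand-written sample data, not the full Movies database.
nodes = [
    {"id": 0, "label": "Person", "name": "Tom Hanks"},
    {"id": 1, "label": "Person", "name": "Tom Tykwer"},
    {"id": 2, "label": "Movie", "title": "Cloud Atlas"},
]
# Each relationship is (start node id, relationship type, end node id).
relationships = [
    (0, "ACTED_IN", 2),
    (1, "DIRECTED", 2),
]

# Roughly what MATCH (n) RETURN count(n) computes: the number of nodes.
node_count = len(nodes)

# Roughly what CALL db.labels() lists: the distinct node labels (sorted here).
labels = sorted({node["label"] for node in nodes})

# Roughly what CALL db.relationshipTypes() lists: the distinct relationship types.
relationship_types = sorted({rel_type for _, rel_type, _ in relationships})

print(node_count)          # 3
print(labels)              # ['Movie', 'Person']
print(relationship_types)  # ['ACTED_IN', 'DIRECTED']
```

The relationship types are what tell you how a person is connected to a movie, which is why listing them is a useful first step in an unfamiliar database.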
Find movies released in the 1990s…
// Query for the movies released in the 1990s
MATCH (nineties:Movie) WHERE nineties.released >= 1990 AND nineties.released < 2000 RETURN nineties.title
Here is the list of movies released in the 1990s:
List all Tom Hanks movies:
// Query to list all Tom Hanks movies
MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(tomHanksMovies) RETURN tom,tomHanksMovies
Who directed “Cloud Atlas”?
The query for this is:
MATCH (cloudAtlas {title: "Cloud Atlas"})<-[:DIRECTED]-(directors) RETURN directors.name
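The three queries above all follow the same idea: pick nodes by a property, then follow relationships of a given type. The Python sketch below imitates that idea on a small hand-written sample, not the real Movies database. The helper names (`movies_released_between`, `movies_of`, `people_of`) are hypothetical and only illustrate the pattern; they are not how Cypher is implemented.

```python
# Toy data: a hand-written subset for illustration only.
people = {1: "Tom Hanks", 2: "Tom Tykwer", 3: "Lana Wachowski", 4: "Lilly Wachowski"}
movies = {
    10: {"title": "Cloud Atlas", "released": 2012},
    11: {"title": "Apollo 13", "released": 1995},
    12: {"title": "The Matrix", "released": 1999},
}
# Each relationship is (person id, relationship type, movie id).
relationships = [
    (1, "ACTED_IN", 10),
    (1, "ACTED_IN", 11),
    (2, "DIRECTED", 10),
    (3, "DIRECTED", 10),
    (4, "DIRECTED", 10),
    (3, "DIRECTED", 12),
    (4, "DIRECTED", 12),
]


def movies_released_between(start, end):
    """Like WHERE m.released >= start AND m.released < end."""
    return sorted(m["title"] for m in movies.values() if start <= m["released"] < end)


def movies_of(person_name, rel_type):
    """Like (p:Person {name: ...})-[:REL_TYPE]->(m): follow relationships outwards."""
    person_ids = {pid for pid, name in people.items() if name == person_name}
    return sorted(
        movies[mid]["title"]
        for pid, rtype, mid in relationships
        if pid in person_ids and rtype == rel_type
    )


def people_of(movie_title, rel_type):
    """Like (m {title: ...})<-[:REL_TYPE]-(p): follow relationships backwards."""
    movie_ids = {mid for mid, m in movies.items() if m["title"] == movie_title}
    return sorted(
        people[pid]
        for pid, rtype, mid in relationships
        if mid in movie_ids and rtype == rel_type
    )


print(movies_released_between(1990, 2000))   # ['Apollo 13', 'The Matrix']
print(movies_of("Tom Hanks", "ACTED_IN"))    # ['Apollo 13', 'Cloud Atlas']
print(people_of("Cloud Atlas", "DIRECTED"))  # ['Lana Wachowski', 'Lilly Wachowski', 'Tom Tykwer']
```

Note how the direction of the arrow in the Cypher pattern decides which end of the relationship you start from.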
Advantages of Neo4j databases
- Performance: In relational databases, performance suffers as the number and depth of relationships increase, because each extra hop typically needs another join. In graph databases like Neo4j, relationships are stored alongside the data, so traversal performance tends to remain high even if the amount of data grows significantly.
- Flexibility: Neo4j is flexible, as the structure and schema of a graph model can be easily adjusted to the changes in an application. Also, you can easily upgrade the data structure without damaging existing functionality.
- Agility: The structure of a Neo4j database is easy to upgrade, so the data store can evolve along with your application.
Gephi Tool
Gephi is an open-source network analysis and visualization software package. It is mainly used for visualizing, manipulating, and exploring networks and graphs from raw edge and node graph data. It is an excellent tool for data analysts and data science enthusiasts to explore and understand graphs.
In this demo I have chosen the simple karate.gml dataset and performed some basic Gephi operations on it. So let’s get started.
1. Open Gephi and click on New Project. Then choose File->Open and load the dataset of your choice as shown below. On loading the dataset, Gephi shows the number of nodes and edges present in the dataset as well as the type of the graph.
2. Below is how all the nodes and edges are displayed when the data is initially loaded.
3. Now we can represent the data in various layouts. In the left pane, open the Layout option, choose the layout of your choice, and click on Run. In the image below I have chosen the ForceAtlas layout, which displays the data in the following form.
4. Next, we can differentiate the nodes by rankings such as their In-Degree, Out-Degree, or Degree and show them in different colors. To do this, choose Nodes->Ranking at the top of the left pane and pick a ranking there. In the image below In-Degree is chosen: red nodes have a lower in-degree than white nodes, and dark grey nodes have the highest in-degree.
5. Clearer visualizations can also be made by displaying the nodes in various sizes. For instance, in the image below nodes with a higher degree are larger than nodes with a lower degree, i.e. the dark grey nodes have a higher degree than the white and red nodes.
To display nodes in various sizes, select the Size option in the Appearance section of the left pane and then enter the minimum and maximum node sizes you want. I have set the minimum size to 10 and the maximum size to 30.
6. Next, we generate a degree distribution graph for Degree, In-Degree, and Out-Degree and also get the average degree over all the nodes. To generate the graph, choose the Statistics tab in the right pane and run Average Degree in the Network Overview section.
A report is generated, and a column for the degree is added to the dataset table.
7. To see the Data Table, select Window->Data Table in the top menu bar. You will see a table like the one in the image above, where running the Average Degree function has added In-Degree, Out-Degree, and Degree columns for each node.
8. Now we can try different functionalities as well as various layouts in Gephi. In the image below I have used the Noverlap layout.
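The degree-based steps above (ranking by degree, sizing nodes between 10 and 30, and the Average Degree statistic) can be sketched in plain Python. This is a rough illustration under stated assumptions: the edge list is a small made-up undirected graph, not the karate dataset; the helper names are hypothetical; and the linear size scaling is an assumed approximation of what a size ranking does, not Gephi's actual implementation.

```python
from collections import Counter

# Hypothetical undirected edge list (not the karate dataset).
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (4, 5)]


def degrees(edge_list):
    """Count how many edges touch each node (nodes without edges are not listed)."""
    counts = Counter()
    for source, target in edge_list:
        counts[source] += 1
        counts[target] += 1
    return dict(counts)


def average_degree(degree_by_node):
    """Mean degree over all nodes; for an undirected graph this is 2 * edges / nodes."""
    return sum(degree_by_node.values()) / len(degree_by_node)


def scale_sizes(degree_by_node, min_size=10.0, max_size=30.0):
    """Linearly map the lowest degree to min_size and the highest to max_size."""
    low = min(degree_by_node.values())
    high = max(degree_by_node.values())
    if high == low:
        # All nodes have the same degree, so give them all the same size.
        return {node: min_size for node in degree_by_node}
    span = high - low
    return {
        node: min_size + (deg - low) / span * (max_size - min_size)
        for node, deg in degree_by_node.items()
    }


node_degrees = degrees(edges)
print(node_degrees)                  # {1: 3, 2: 2, 3: 2, 4: 2, 5: 1}
print(average_degree(node_degrees))  # 2.0
print(scale_sizes(node_degrees))     # {1: 30.0, 2: 20.0, 3: 20.0, 4: 20.0, 5: 10.0}
```

Here node 1 has the highest degree, so it gets the maximum size of 30, while node 5 with a single edge gets the minimum size of 10.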
That’s all for this introduction to the Neo4j and Gephi tools. You can easily visualize all your data in them. I hope you found what you were looking for.
Thank You!!