Exploring Historical Relations: Social Network Analysis with Gephi

Tuba Nur Saraçoğlu
Nov 12, 2023

Updated: Nov 19, 2023

Blog Post: Tuba Nur Saraçoğlu, Ph.D. -Mardin Artuklu University

Network analysis, which has been used in fields such as space science, biology, the internet, and transportation networks, has also offered social scientists new possibilities for a long time. Based on the analysis and visualization of interconnected structures, this method is gaining popularity along with the different software packages researchers use. This blog post introduces the Gephi platform to researchers interested in the field. We will first walk through the screens in Gephi using a ready-made dataset and then demonstrate how to work with a fictional historical dataset.

Let's get started!

First, download the version suitable for your Windows, Mac OS, or Linux computer from the Gephi website.

After installing and running Gephi on your computer, you will see a section on the startup screen where you can load the file you will work on. Here, you will see "Open recent" (recently opened and processed files), "New Project" (where you will load your new dataset), and "Samples" (sample working files). In the first step, let's start by exploring the features of Gephi using the "Les Miserables" project marked below. Then, we will create a project from scratch with a fictional dataset.

"Les Miserables" is an example project that visualizes the relationships between characters in the novel "Les Misérables." When you click on it, a window opens with information about the project you are about to load.

In this window, you will find information about an undirected, connected structure with 77 nodes and 254 edges. The goal is to display the 77 characters from the novel included in this dataset and the 254 connections between them. Since the relationships between individuals are mutual, no direction is specified for the relationships, and undirected links have been chosen. If you want to create a network where the relationship is one-way from one person to another, select "Directed" as the "Graph Type" at this stage and specify this in the dataset you prepare (you will see an example shortly). After clicking "OK" without making any changes in this window, you will generally be taken to the "Overview" section automatically. Let's start by getting to know each element of this section in general, as indicated by the numbers in the image below.
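To make the difference between undirected and directed links concrete, here is a minimal Python sketch. It is not part of Gephi; the function name `summarize` and the three character pairs are illustrative assumptions, not rows taken from the sample file. In an undirected graph, a link from A to B and a link from B to A count as the same connection, whereas a directed graph keeps them apart.

```python
def summarize(edge_pairs, directed=False):
    """Return (node_count, edge_count) for a list of (source, target) pairs."""
    nodes = set()
    edges = set()
    for source, target in edge_pairs:
        nodes.update((source, target))
        # Undirected: (A, B) and (B, A) are one connection, so store the
        # pair in a canonical sorted order. Directed: keep the order.
        key = (source, target) if directed else tuple(sorted((source, target)))
        edges.add(key)
    return len(nodes), len(edges)


# Illustrative pairs only (hypothetical, not the real sample data).
pairs = [("Valjean", "Javert"), ("Javert", "Valjean"), ("Valjean", "Cosette")]

print(summarize(pairs, directed=False))  # (3, 2)
print(summarize(pairs, directed=True))   # (3, 3)
```

The same three rows therefore yield two edges when read as undirected and three when read as directed, which is why the "Graph Type" choice matters before you load your data.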

On this page, you can expand and contract the columns to the right and left of the "Graph" section in the middle with the mouse. You can also minimize the "Statistics," "Appearance," and "Layout" sections using the minimize button in the top right corner of each. (Attention! If you close these sections by clicking the 'x,' you will need to reopen them by clicking "Window" in the menu (Part 1) and selecting the relevant section in the list that opens.)

Let's start by introducing the numbered sections more specifically:

1. This is the menu section of Gephi.
2. These are the three main screens that you can use for each workspace.
3. If you are working with multiple datasets simultaneously, each opens as a separate "workspace." Here, you are viewing the workspace for the "Les Miserables" file.
4. The "Appearance" section is where you can adjust the fundamental elements of network analysis, namely "nodes" and "edges."
5. The "Layout" section offers different options for arranging the overall network structure.
6. These are shortcuts that make your work more convenient while performing operations.
7. These are shortcuts related to nodes, along with a shortcut button for moving the graph to the center of the graph area.
8. This is the section where labels assigned to nodes and edges are edited, and it also contains shortcuts for taking screen captures and changing the graph background after all operations.
9. This button allows you to perform the operations in the eighth item within a larger workspace.
10. This section automatically calculates centrality measures for the dataset in use.

Gephi provides a practical workspace where you can make various changes in the Overview screen and see their effects in your graph. Now, let's delve into more detail and examine the sections we classified into ten groups above, one by one.

1. First, let's take a look at the menu section:

File: The "File" tab in the menu section is where you can create a new project, open previously used projects, and load data set files you want to work with, just like the project loading/opening page that appears when you first open Gephi. If you've already created a workspace and want to open another one, you can do it from the "File" tab. Note: The "Open" section under "File" opens the file where your Gephi-related work is saved on your computer.

Workspace: This section in the menu area opens a blank workspace for you and can also be used for deleting and renaming existing workspaces.

Window: This section helps you activate the windows you want to use in your workspace. You can also use this tab to reopen windows you may have accidentally closed. Hover over the desired section and click to activate it.

Among the other tabs in the Gephi menu, the "Options" section under the "Tools" tab allows you to make certain custom settings (such as adding various shortcuts). The "View" tab lets you switch your workspace to full screen. The "Help" tab allows you to check for updates.

2. After the menu, let's introduce the three main screens: "Overview," "Data Laboratory," and "Preview." Each of these screens serves a different purpose in network analysis and visualization, and actions in one affect the other two screens within the workspace. Overview is the section where you perform statistical analysis and visual adjustments of the dataset you use in your workspace. Data Laboratory is where you can view, load, and edit your dataset. Preview, on the other hand, is the section where you perform the final visual processes after all the analyses and edits. Now, let's continue to introduce the windows in the Overview section.

3. Workspace: This shows which dataset you are currently working with.

4. "Appearance" is the section where we make all visual interventions on "nodes" and "edges". After the statistical analyses, by clicking on the "Nodes" tab, you can adjust the color and size of the nodes and the size and color of the labels on the nodes according to their centrality. When you click on the "Edges" tab, you can do the same for the edges/connections connecting the nodes.

For example:

In the Appearance section, the area you select for editing turns dark gray. Here, the "Partition" tab is active; it sets node colors according to the groups the nodes belong to. The "Nodes" and "Edges" tabs determine which element you are working with. When the "Nodes" tab is active, the icons that appear on the right side allow you to change the color (palette icon), size (nested circles), label color (a color line below an "A"), and label size (a large and a small letter) for nodes. To make changes in a specific section, select "Nodes" first and then the relevant icon. While "Nodes" or "Edges" is active, the "Unique" tab that appears below applies one setting of your choice to every element, for example displaying all nodes or connections in a single color. The "Partition" tab colors nodes according to the groups in which they are located (for example, "Modularity Class"). The "Ranking" tab adjusts the color, size, and label settings of nodes according to a numeric value such as a centrality measure.
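To illustrate the idea of sizing nodes by a numeric measure, here is a small Python sketch of linear scaling between a minimum and a maximum size. This is only an assumption about one simple way such a mapping can work, not Gephi's own code; the function name `scale_sizes`, the default sizes, and the sample degree values are hypothetical.

```python
def scale_sizes(values, min_size=10.0, max_size=50.0):
    """Map each node's numeric value linearly onto [min_size, max_size]."""
    low, high = min(values.values()), max(values.values())
    if high == low:
        # All nodes have the same value, so they all get the same size.
        return {node: min_size for node in values}
    span = high - low
    return {
        node: min_size + (value - low) / span * (max_size - min_size)
        for node, value in values.items()
    }


# Hypothetical degree values for three nodes.
degrees = {"Teacher 1": 6, "Student 1": 1, "Student 2": 2}
for node, size in scale_sizes(degrees).items():
    print(node, round(size, 1))
```

The node with the highest value receives the maximum size and the node with the lowest value the minimum size; everything else falls proportionally in between.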

5. Layout: General graphic adjustments:

In the Layout section, you can select a layout algorithm and click "Run" to let the graph settle into the shape you want. Once it has, click the "Stop" button.
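Layouts such as "Fruchterman Reingold", which is used later in this post, are force-directed: nodes push each other apart, connected nodes pull each other together, and the positions are updated step by step, which is why a layout is run and then stopped. The following Python sketch is a simplified version of that idea under stated assumptions; it is not Gephi's implementation, and the function name, parameters (`iterations`, `size`, `seed`), and the cooling factor are hypothetical choices.

```python
import math
import random


def fruchterman_reingold(nodes, edges, iterations=100, size=1.0, seed=42):
    """Return {node: (x, y)} positions inside a size-by-size square.

    nodes is a list of node ids; edges is a list of (node, node) pairs.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = size / math.sqrt(len(nodes))  # ideal distance between nodes
    temperature = size / 10           # maximum movement per step

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsion: every pair of nodes pushes each other apart.
        for i, v in enumerate(nodes):
            for u in nodes[i + 1:]:
                dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[v][0] += dx / dist * force
                disp[v][1] += dy / dist * force
                disp[u][0] -= dx / dist * force
                disp[u][1] -= dy / dist * force

        # Attraction: connected nodes pull each other together.
        for v, u in edges:
            dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[v][0] -= dx / dist * force
            disp[v][1] -= dy / dist * force
            disp[u][0] += dx / dist * force
            disp[u][1] += dy / dist * force

        # Move each node, limited by the temperature, and keep it in the frame.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            step = min(length, temperature)
            pos[n][0] = min(size, max(0.0, pos[n][0] + disp[n][0] / length * step))
            pos[n][1] = min(size, max(0.0, pos[n][1] + disp[n][1] / length * step))

        temperature *= 0.95  # cool down so the layout settles

    return {n: (p[0], p[1]) for n, p in pos.items()}


# Hypothetical star-shaped example: one teacher linked to three students.
layout = fruchterman_reingold(
    ["T1", "S1", "S2", "S3"],
    [("T1", "S1"), ("T1", "S2"), ("T1", "S3")],
)
for node, (x, y) in layout.items():
    print(node, round(x, 3), round(y, 3))
```

The fixed number of iterations plays the role of clicking "Stop": the longer the loop runs, the less the nodes move, because the temperature shrinks at every step.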

6. Here are some of the shortcuts

7. Here are some of the shortcuts

8 and 9. Additional Settings for Nodes and Edges

The strip at the bottom of the Graph, which expands downward when you click the rightmost tab, is where you adjust the labels applied to nodes and edges. Tick the boxes next to "Node" and "Edge" to activate the labels and make the necessary adjustments. The "Configure" section opens a separate window to help with this. You can also make all these adjustments without expanding the strip, using the red-marked controls on its first row, as shown in the image. From left to right, the dark-colored "T" activates the labels on nodes, and the light-colored "T" activates the labels on edges. The colored and colorless arrow symbols in between are used to hide or colorize edges. The control after the light "T" sets the label size on edges. The part starting with the letter "A" indicates the font used for nodes. The red-marked section that follows is the "Configure" section.

Of the remaining buttons, the one marked in green automatically takes a screenshot of the prepared graph, while the one marked in blue sets the background color to completely black.

10. Statistical Calculations

Statistics in Gephi is the section where you perform network analysis and measure the network properties of each node. To use this section, click "Run" next to each measurement and confirm the windows that open. The numerical value for each measurement will then appear next to it automatically. The calculations made in this section for each node are also added to the "Data Laboratory" section.

It's important to note that centrality measures, which we mentioned earlier and can be used for visualizations in the "Nodes" and "Edges" sections, will only be available after these statistical analyses. Thus, without these measurements, it wouldn't be possible to determine the value of nodes in the network and create visualizations based on that.
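As a rough illustration of what such a measurement does, the sketch below computes normalized degree centrality from a list of connections in plain Python. It is a simplified assumption for teaching purposes, not Gephi's implementation: it expects each connection to appear once, ignores self-loops, and only knows about nodes that have at least one connection. The function name and the sample Ids are hypothetical.

```python
from collections import Counter


def degree_centrality(edge_pairs):
    """Return {node: degree / (n - 1)} for an undirected edge list."""
    degree = Counter()
    for source, target in edge_pairs:
        degree[source] += 1
        degree[target] += 1
    n = len(degree)
    if n <= 1:
        return {}
    # Divide by the largest possible number of neighbours (n - 1).
    return {node: count / (n - 1) for node, count in degree.items()}


# Hypothetical connections between node Ids.
edges = [(1, 9), (1, 10), (2, 10)]
for node, score in sorted(degree_centrality(edges).items()):
    print(node, round(score, 3))
```

A node connected to every other node would score 1.0, so higher values point to individuals with more direct connections in the network.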

Now that we've completed our journey through the Overview page, let's move on to the Data Laboratory.

The Data Laboratory is a page that may be used less frequently but forms the foundation of the entire work. In our example project using the "Les Misérables" dataset, the node and edge information was added to this page automatically when we opened the file. As seen in "Workspace 1", a "Data Table" has been created, and on the left, we have our "Nodes" and "Edges" tabs. You can create the Data Table manually using the "Add Node" and "Add Edges" buttons. To load your own prepared data, you can use the "Import Spreadsheet" button, and to download the data from this page, you can use the "Export Table" button. In this screenshot, the "Nodes" tab is active, and it contains columns such as "Id" (code), "Label" (label), "Interval," and "Modularity Class." "Id" represents the code we assigned to each node, "Label" is the label that should be displayed on each node, "Interval" is typically left empty, and "Modularity Class" is a column used to group nodes into different categories. Now, let's explore the Data Laboratory page by clicking on the "Edges" tab.

When you click on the "Edges" tab, the columns change. In this section, you can find information about the connections between nodes. You will see columns like "Source" (the source node), "Target" (the target node), "Type" (whether the connection is directed or undirected), "Id" (the identifier assigned to each connection), "Label" (the labels to be shown on the connections), and "Weight" (the number of connections between two nodes).
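The relationship between repeated connections and the "Weight" column can be sketched in Python as follows. This is an illustrative assumption about how one might prepare such rows before loading them, not a description of what Gephi does internally; the function name, the zero-based Ids, and the sample observations are hypothetical.

```python
from collections import Counter


def build_edge_rows(observations):
    """Collapse repeated undirected pairs into rows with a Weight column."""
    # Sort each pair so that (1, 9) and (9, 1) count as the same connection.
    counts = Counter(tuple(sorted(pair)) for pair in observations)
    rows = []
    for edge_id, ((source, target), weight) in enumerate(sorted(counts.items())):
        rows.append({
            "Source": source,
            "Target": target,
            "Type": "Undirected",
            "Id": edge_id,
            "Label": "",
            "Weight": weight,
        })
    return rows


# Hypothetical observations: nodes 1 and 9 are linked twice.
for row in build_edge_rows([(1, 9), (9, 1), (1, 10)]):
    print(row)
```

Two observations of the same pair become a single row with a weight of 2, while the pair seen once keeps a weight of 1.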

As we mentioned earlier, the results of the statistical analyses performed on the "Les Misérables" dataset in the Overview section are added to the Data Laboratory section. After these analyses, the Data Laboratory tab will look like the one below. Here, you can see the values each node has obtained from the statistical calculations.

In addition to the information in the columns Id, Label, Interval, and Modularity Class, you will also encounter various centrality measures. The numerical values of the network analysis results shown here provide the opportunity to answer many questions related to your study.

After introducing the Data Laboratory section in this manner, let's move on to the Preview window in Gephi. In this window, we can make final visual adjustments to our graph. All changes made in the Overview and Data Laboratory windows can be seen directly here. In this window, you can adjust the visibility of nodes and edges, the font and size of labels, and whether connections in the graph are straight or curved. As seen below, we are still working in Workspace 1.

The "Preview Settings" window is where we make adjustments regarding nodes, edges, and node labels. To make these adjustments, the box next to the relevant field should be active. To view the graph, click "Refresh" in the bottom left corner, and click it again whenever you want to apply new changes. You can change the background color of the graph to black using the "Background" section. Right next to it, the "Reset zoom" button repositions the graph within the area. You can use the + and - signs to zoom in and out on the graph. After completing all your adjustments, you can click "Export" to download the graph to your computer.

Now that we have learned how to use Gephi with a ready-made dataset, let's prepare a fictional one representing teacher (Hoca)-student (Talebe) relationships in a historical period. For this network, we need to determine the individuals and the historical relationships between them. Here, we will model a small network of 8 teachers and 42 students, totaling 50 individuals. While creating the dataset, it is important to assign a unique Id to each person, express the relationships in terms of these Ids, and prepare the data in a format compatible with the Nodes (Id and Label columns) and Edges (Source, Target, Type, Id, and Label columns) tables in the Data Laboratory section. For example, if "Student 1" studied under "Teacher 1", the Id of "Teacher 1" (1) would be the Source, and the Id of "Student 1" (9) would be the Target. It is important to establish the historical information accurately before adding it to the dataset.
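The coding scheme described above can be sketched in Python as follows. The Ids follow the example in the text (teachers 1 to 8, students 9 to 50), but the rule that assigns each student to a teacher is a placeholder assumption made only to produce some rows, and the "Directed" type is likewise an assumption; in real research, every row must come from the historical sources. The function names are hypothetical.

```python
import csv
import io

TEACHERS, STUDENTS = 8, 42


def build_dataset():
    """Return (nodes, edges) as lists of row dictionaries."""
    nodes = [{"Id": i, "Label": f"Teacher {i}"} for i in range(1, TEACHERS + 1)]
    nodes += [
        {"Id": TEACHERS + i, "Label": f"Student {i}"}
        for i in range(1, STUDENTS + 1)
    ]
    edges = []
    for i in range(1, STUDENTS + 1):
        # Placeholder rule, NOT historical data: cycle through the teachers.
        teacher_id = (i - 1) % TEACHERS + 1
        edges.append({
            "Source": teacher_id,      # the teacher
            "Target": TEACHERS + i,    # the student
            "Type": "Directed",
            "Id": i,
            "Label": "",
        })
    return nodes, edges


def to_csv(rows):
    """Serialize row dictionaries to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


nodes, edges = build_dataset()
print(len(nodes), len(edges))          # 50 42
print(to_csv(edges).splitlines()[:2])  # header and the first edge row
```

With this scheme, the first edge has Source 1 and Target 9, matching the "Teacher 1" and "Student 1" example above.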

When uploading your dataset into the system, go to the Data Laboratory and use the "+Import" option. To ensure that there are no issues when uploading your dataset, make sure to:

Have your prepared file in .csv format.
Ensure that there are "Id" and "Label" headers for Nodes.
For Edges, make sure there are "Source" and "Target" headers. Additionally, check that the file contains "Type," "Id," and "Label" headers as shown in the image above.
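The header checks in this list can also be done with a few lines of Python before you open Gephi. The sketch below is a hedged example rather than an official tool: the function name `missing_headers` and the `REQUIRED` table are hypothetical, and it only checks the headers named above as mandatory ("Id" and "Label" for Nodes, "Source" and "Target" for Edges).

```python
import csv
import io

# Headers the checklist above treats as required for each table.
REQUIRED = {"nodes": ["Id", "Label"], "edges": ["Source", "Target"]}


def missing_headers(csv_text, table, delimiter=","):
    """Return the required headers that are absent from the first CSV line."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader, [])
    present = {name.strip() for name in header}
    return [name for name in REQUIRED[table] if name not in present]


print(missing_headers("Id,Label\n1,Teacher 1\n", "nodes"))                 # []
print(missing_headers("Source;Type\n1;Directed\n", "edges", delimiter=";"))  # ['Target']
```

An empty list means the required headers are all present; any names in the list need to be added or corrected in the file.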

You can convert a dataset prepared in Excel to the CSV format by following these steps: "File + Export + Change File Type + CSV (Comma Separated)".

Uploading Nodes:

For loading Edges data, ensure that the "Edges" tab is active in the Data Laboratory section, and use the "Import Spreadsheet" option to select the Edges data in CSV format from your computer.

After uploading the file, in the subsequent window, ensure that "Semicolon," "Edges table," and "UTF-8" are active and confirm the next page.
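If you prepare your file with a script instead of Excel, the separator and encoding options above correspond to choices you make when writing the file. The following Python sketch, with a hypothetical function name and two sample rows, writes a semicolon-separated table and encodes it as UTF-8, which matters as soon as labels contain characters such as "ç" or "ğ".

```python
import csv
import io


def to_semicolon_csv_bytes(rows, fieldnames):
    """Write rows as semicolon-separated CSV and return UTF-8 encoded bytes."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, delimiter=";")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


# Hypothetical sample rows for the Nodes table.
rows = [{"Id": 1, "Label": "Hoca 1"}, {"Id": 9, "Label": "Talebe 1"}]
data = to_semicolon_csv_bytes(rows, ["Id", "Label"])
print(data.decode("utf-8").splitlines())  # ['Id;Label', '1;Hoca 1', '9;Talebe 1']
```

The resulting bytes can be saved to a file with a .csv extension; the separator selected in the import window should match the one used here.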

When you reach this window, choose whether your graph is "Directed" or "Undirected." You will see the numbers of nodes and edges displayed here. Ensure "Append to an existing workspace" is selected so you can add this data set to the same workspace.

Once you've completed all these steps, your data will be visible in the "Edges" tab, as shown in the image above.

After the data upload phase is complete, you can proceed to the Overview section to work on the layout of your graph. In the Overview window, the first image that will appear is as follows:

To make the desired changes, you should first activate node labels by clicking on the red "T" as indicated. Then, click the hidden window on the right side called "Statistics" to perform your measurements. After that, determine the overall appearance of the graph from the "Layout" tab. Once you've completed these steps, your graph will look like the image shown.

Following this, let's continue in the "Appearance" section, where you can make adjustments to nodes and edges. To set the color of nodes, ensure that you've selected the areas marked in red below. You can also access different color options by clicking on "Palette." Don't forget to click "Apply" for these changes to take effect.

After completing these steps, you can go to the Preview window to make final adjustments and download your graph to your computer.

The visual created using the "Fruchterman Reingold" algorithm shows that the most active node is located in the center. It allows us to see which teachers and students are more active than others within this group of 50 individuals. This seems meaningful for identifying central and peripheral individuals within this small network. What do you think? You can obtain the visual shown below by rearranging the same data with the "Yifan Hu" algorithm and positioning the nodes where you want.

The final appearance of this graph in the Preview window is as follows:

In this small relational network of 50 individuals in our fictional dataset, we can represent teachers in different colors and sizes based on the number of their students, and students in different colors and sizes based on the number of their teachers. Visualizing such scattered information collectively and according to the relationships it contains opens up new avenues for asking questions. Analyzing and visualizing data with different algorithms and centrality measures enriches historical research and offers a perspective beyond traditional methods. Gephi, with its rich functionality, allows you to enhance your research with these innovative methods.