
Production-Style NetworKit 11.2.1 Coding Tutorial for Large-Graph Statistics, Communities, Primitives, and Sparsification

In this tutorial, we build a production-style, large-graph analysis pipeline focused on speed, memory efficiency, and version-safe APIs in NetworKit 11.2.1. We generate a large scale-free network, extract its largest connected component, and probe its core structure with k-core decomposition, PageRank, and approximate betweenness. We then detect communities with PLM and measure their quality with modularity, characterize the distance structure with effective and estimated diameters, and finally sparsify the graph to reduce cost while preserving important properties. We export the sparsified graph as an edge list so it can be reused in downstream workflows, benchmarking, or graph-ML preprocessing.

!pip -q install networkit pandas numpy psutil


import gc, time, os
import numpy as np
import pandas as pd
import psutil
import networkit as nk


print("NetworKit:", nk.__version__)
nk.setNumberOfThreads(min(2, nk.getMaxNumberOfThreads()))
nk.setSeed(7, False)


def ram_gb():
   p = psutil.Process(os.getpid())
   return p.memory_info().rss / (1024**3)


def tic():
   return time.perf_counter()


def toc(t0, msg):
   print(f"{msg}: {time.perf_counter()-t0:.3f}s | RAM~{ram_gb():.2f} GB")


def report(G, name):
   print(f"\n[{name}] nodes={G.numberOfNodes():,} edges={G.numberOfEdges():,} directed={G.isDirected()} weighted={G.isWeighted()}")


def force_cleanup():
   gc.collect()


PRESET = "LARGE"


if PRESET == "LARGE":
   N = 120_000
   M_ATTACH = 6
   AB_EPS = 0.12
   ED_RATIO = 0.9
elif PRESET == "XL":
   N = 250_000
   M_ATTACH = 6
   AB_EPS = 0.15
   ED_RATIO = 0.9
else:
   N = 80_000
   M_ATTACH = 6
   AB_EPS = 0.10
   ED_RATIO = 0.9


print(f"\nPreset={PRESET} | N={N:,} | m={M_ATTACH} | approx-betweenness epsilon={AB_EPS}")

We set up a Colab environment with NetworKit and monitoring tools and lock in a stable random seed. We cap thread usage to match the runtime and define timing and RAM-tracking helpers used in each major section. We choose a scale preset that controls the graph size, the number of attachment edges per new node, and the betweenness approximation error, so the pipeline stays large but manageable.
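The tic/toc pair above can also be folded into a context manager so timing a step is a single `with` statement. This is a minimal stdlib-only sketch; the `timed` helper is our own illustrative addition, not part of the tutorial's code:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(msg):
    # Record wall-clock time around a block and print it on exit,
    # even if the block raises.
    t0 = time.perf_counter()
    try:
        yield
    finally:
        print(f"{msg}: {time.perf_counter() - t0:.3f}s")

# Usage: wrap any expensive step instead of pairing tic()/toc().
with timed("dummy workload"):
    total = sum(i * i for i in range(100_000))
```

The `try/finally` guarantees the timing line prints even when the wrapped step fails, which keeps logs consistent across partial runs.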

t0 = tic()
G = nk.generators.BarabasiAlbertGenerator(M_ATTACH, N).generate()
toc(t0, "Generated BA graph")
report(G, "G")


t0 = tic()
cc = nk.components.ConnectedComponents(G)
cc.run()
toc(t0, "ConnectedComponents")
print("components:", cc.numberOfComponents())


if cc.numberOfComponents() > 1:
   t0 = tic()
   G = nk.components.ConnectedComponents.extractLargestConnectedComponent(G, True)  # compactGraph=True
   toc(t0, "Extracted LCC (compactGraph=True)")
   report(G, "LCC")


force_cleanup()

We generate a large Barabási–Albert graph and immediately log its size and generation time. We count connected components to understand fragmentation before going further. If the graph is fragmented, we extract the largest connected component and compact its node IDs, which improves the performance and reliability of every later stage.
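To make the generator less of a black box, here is a simplified pure-Python sketch of the preferential-attachment mechanism behind Barabási–Albert graphs: each new node links to m existing nodes sampled proportionally to degree. This is illustrative only, not NetworKit's implementation, and the function name is ours:

```python
import random

def barabasi_albert(n, m, seed=7):
    # Start from a clique on m+1 nodes, then attach each new node to m
    # distinct targets drawn degree-proportionally from a repeated-node pool.
    rng = random.Random(seed)
    edges = set()
    pool = []  # each node appears once per incident edge endpoint
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            edges.add((u, v))
            pool += [u, v]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))  # degree-proportional sampling
        for t in chosen:
            edges.add((t, new))
            pool += [t, new]
    return edges

E = barabasi_albert(1_000, 6)
print("edges:", len(E))
```

Sampling from the repeated-endpoint pool is what produces the heavy-tailed degree distribution: high-degree nodes occupy more pool slots and attract more of the new links.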

t0 = tic()
core = nk.centrality.CoreDecomposition(G)
core.run()
toc(t0, "CoreDecomposition")
core_vals = np.array(core.scores(), dtype=np.int32)
print("degeneracy (max core):", int(core_vals.max()))
print("core stats:", pd.Series(core_vals).describe(percentiles=[0.5, 0.9, 0.99]).to_dict())


k_thr = int(np.percentile(core_vals, 97))


t0 = tic()
nodes_backbone = [u for u in range(G.numberOfNodes()) if core_vals[u] >= k_thr]
G_backbone = nk.graphtools.subgraphFromNodes(G, nodes_backbone)
toc(t0, f"Backbone subgraph (k>={k_thr})")
report(G_backbone, "Backbone")


force_cleanup()


t0 = tic()
pr = nk.centrality.PageRank(G, damp=0.85, tol=1e-8)
pr.run()
toc(t0, "PageRank")


pr_scores = np.array(pr.scores(), dtype=np.float64)
top_pr = np.argsort(-pr_scores)[:15]
print("Top PageRank nodes:", top_pr.tolist())
print("Top PageRank scores:", pr_scores[top_pr].tolist())


t0 = tic()
abw = nk.centrality.ApproxBetweenness(G, epsilon=AB_EPS)
abw.run()
toc(t0, "ApproxBetweenness")


abw_scores = np.array(abw.scores(), dtype=np.float64)
top_abw = np.argsort(-abw_scores)[:15]
print("Top ApproxBetweenness nodes:", top_abw.tolist())
print("Top ApproxBetweenness scores:", abw_scores[top_abw].tolist())


force_cleanup()

We compute the core decomposition to quantify degeneracy and isolate the densest core of the network. We extract a backbone subgraph with a high core-number percentile threshold to focus on structurally important nodes. We then run PageRank and approximate betweenness to rank nodes by influence and bridge-like behavior at scale.
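The core numbers computed above come from an iterative "peeling" process. The following naive pure-Python sketch (our own, quadratic-time, unlike NetworKit's optimized implementation) shows the idea on a tiny graph:

```python
def core_numbers(adj):
    # Peeling: repeatedly remove a minimum-degree node; its core number is
    # the largest minimum degree seen so far. adj maps node -> neighbor set.
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    core, k = {}, 0
    while adj:
        u = min(adj, key=lambda x: len(adj[x]))
        k = max(k, len(adj[u]))
        core[u] = k
        for v in adj[u]:
            adj[v].discard(u)
        del adj[u]
    return core

# Triangle {0, 1, 2} with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(core_numbers(adj))  # pendant is in the 1-core, triangle in the 2-core
```

The maximum value returned is the degeneracy of the graph, matching the "max core" statistic printed by the pipeline.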

t0 = tic()
plm = nk.community.PLM(G, refine=True, gamma=1.0, par="balanced")
plm.run()
toc(t0, "PLM community detection")


part = plm.getPartition()
num_comms = part.numberOfSubsets()
print("communities:", num_comms)


t0 = tic()
Q = nk.community.Modularity().getQuality(part, G)
toc(t0, "Modularity")
print("modularity Q:", Q)


sizes = np.array(list(part.subsetSizeMap().values()), dtype=np.int64)
print("community size stats:", pd.Series(sizes).describe(percentiles=[0.5, 0.9, 0.99]).to_dict())


t0 = tic()
eff = nk.distance.EffectiveDiameter(G, ED_RATIO)
eff.run()
toc(t0, f"EffectiveDiameter (ratio={ED_RATIO})")
print("effective diameter:", eff.getEffectiveDiameter())


t0 = tic()
diam = nk.distance.Diameter(G, algo=nk.distance.DiameterAlgo.EstimatedRange, error=0.1)
diam.run()
toc(t0, "Diameter (EstimatedRange)")
lower, upper = diam.getDiameter()
print("estimated diameter range:", lower, "-", upper)


force_cleanup()

We detect communities with PLM and record how many it finds on the large graph. We compute modularity and summarize community-size statistics to confirm real structure rather than relying on the partition count alone. We estimate global distance behavior with the effective diameter and an estimated diameter range, using version-safe NetworKit 11.2.1 APIs.
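The modularity score Q that validates the partition can be computed by hand from its definition, Q = Σ_c (e_c/m − (d_c/2m)²), where e_c is the number of intra-community edges and d_c the total degree of community c. A minimal sketch on a toy graph of our own (two triangles joined by one bridge edge):

```python
def modularity(edges, community):
    # Newman-Girvan modularity for an undirected, unweighted edge list.
    m = len(edges)
    intra, deg = {}, {}
    for u, v in edges:
        deg[community[u]] = deg.get(community[u], 0) + 1
        deg[community[v]] = deg.get(community[v], 0) + 1
        if community[u] == community[v]:
            intra[community[u]] = intra.get(community[u], 0) + 1
    return sum(intra.get(c, 0) / m - (deg.get(c, 0) / (2 * m)) ** 2
               for c in set(community.values()))

# Two triangles joined by one bridge edge; partition = the two triangles.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, community), 4))  # -> 0.3571
```

Each triangle contributes 3/7 intra-edge mass against a (7/14)² degree penalty, giving Q = 5/14 ≈ 0.357, a clearly positive score for this obviously clustered toy graph.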

t0 = tic()
G.indexEdges()  # sparsifiers require indexed edges
sp = nk.sparsification.LocalSimilaritySparsifier()
G_sparse = sp.getSparsifiedGraphOfSize(G, 0.7)
toc(t0, "LocalSimilarity sparsification (target edge ratio=0.7)")
report(G_sparse, "Sparse")


t0 = tic()
pr2 = nk.centrality.PageRank(G_sparse, damp=0.85, tol=1e-8)
pr2.run()
toc(t0, "PageRank on sparse")
pr2_scores = np.array(pr2.scores(), dtype=np.float64)
print("Top PR nodes (sparse):", np.argsort(-pr2_scores)[:15].tolist())


t0 = tic()
plm2 = nk.community.PLM(G_sparse, refine=True, gamma=1.0, par="balanced")
plm2.run()
toc(t0, "PLM on sparse")
part2 = plm2.getPartition()
Q2 = nk.community.Modularity().getQuality(part2, G_sparse)
print("communities (sparse):", part2.numberOfSubsets(), "| modularity (sparse):", Q2)


t0 = tic()
eff2 = nk.distance.EffectiveDiameter(G_sparse, ED_RATIO)
eff2.run()
toc(t0, "EffectiveDiameter on sparse")
print("effective diameter (orig):", eff.getEffectiveDiameter(), "| (sparse):", eff2.getEffectiveDiameter())


force_cleanup()


out_path = "/content/networkit_large_sparse.edgelist"
t0 = tic()
nk.graphio.EdgeListWriter("\t", 0).write(G_sparse, out_path)
toc(t0, "Wrote edge list")
print("Saved:", out_path)


print("\nAdvanced large-graph pipeline complete.")

We sparsify the graph with local similarity to reduce the edge count while keeping a structure that is still useful for downstream analysis. We rerun PageRank, PLM, and the effective diameter on the sparsified graph to check whether the important signals are preserved. We export the sparsified graph as an edge list so it can be reused across sessions, tools, or further experiments.
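For downstream reuse without NetworKit, the tab-separated edge list written above can be parsed back with a few lines of plain Python. A minimal sketch, assuming the `u<TAB>v` one-edge-per-line format produced by the writer (the `read_edge_list` helper and the temporary demo file are our own):

```python
import os
import tempfile

def read_edge_list(path, sep="\t"):
    # Parse a "u<sep>v" edge list into an undirected adjacency dict of sets.
    adj = {}
    with open(path) as f:
        for line in f:
            parts = line.split(sep)
            if len(parts) < 2:
                continue
            u, v = int(parts[0]), int(parts[1])
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj

# Round-trip demo with a tiny tab-separated triangle in place of the real export.
with tempfile.NamedTemporaryFile("w", suffix=".edgelist", delete=False) as f:
    f.write("0\t1\n1\t2\n2\t0\n")
    tmp = f.name
adj = read_edge_list(tmp)
os.remove(tmp)
print({u: sorted(vs) for u, vs in adj.items()})
```

The same adjacency dict can then feed NetworkX, an ML feature builder, or a quick sanity check without reloading the full pipeline.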

In conclusion, we built an end-to-end NetworKit workflow that demonstrates how a real large-network analysis proceeds: we started from generation, stabilized the topology by extracting the LCC, exposed structure with cores and centralities, found communities and validated them with modularity, and captured global distance behavior with diameter estimates. We then used sparsification to shrink the graph while keeping it analytically reasonable, and saved it for iterative pipelines. The tutorial is a practical template that can be reused on real datasets by swapping the generator for an edge-list reader while keeping the same analysis stages, performance tracking, and export steps.

