Tutorial 1: Benchmark Generation
================================

This tutorial provides a detailed guide for creating customized benchmark datasets with the ``dyn-benchmark`` package, focusing on configuring all available parameters to control community evolution and network structure.

Prerequisites
-------------

.. code-block:: python

    # Install the package if you haven't already
    # pip install dyn-benchmark

    # Import required modules
    from dyn.benchmark.generator.groundtruth_generator import GroundtruthGenerator
    from dyn.benchmark.generator.communities_generator import CommunitiesGenerator, RelativeOverlap, Match, Overlap
    from dyn.benchmark.generator.edges_generator import SBM, BPAM, FastBPAM, PAM
    from dyn.benchmark.generator.nodes_generator import RandomMemberGenerator
    import networkx as nx
    import numpy as np
    import matplotlib.pyplot as plt

1. Community Generation Parameters
----------------------------------

The :class:`CommunitiesGenerator` is responsible for creating evolving communities and operates on two distinct levels:

1. **Global generation parameters**: These are set as attributes in the constructor and control the overall properties of the generation process.
2. **Evolving Communities shape parameters**: These are controlled by overriding specific methods that define probability distributions for community attributes like size, lifetime, and growth patterns.

Global generation parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following parameters can be set when instantiating a :class:`CommunitiesGenerator`:

.. code-block:: python

    # Create a communities generator with fully customized parameters
    community_generator = CommunitiesGenerator(
        # Basic parameters
        community_count=15,  # Number of evolving communities to generate
        snapshot_count=10,  # Number of snapshots in the temporal network
        community_size_min=5,  # Minimum size of any static community
        core_nodes_ratio=0.7,  # Ratio of members that stay in their community between snapshots
        matching_metric_type=RelativeOverlap,  # Algorithm for matching communities across snapshots
        seed=42  # Seed for reproducibility
    )

Matching metrics help identify when a community at time t corresponds to a community at time t+1. As defined in :cite:t:`Aynaud2013`, matching metrics compare the intersection of nodes between communities to determine their continuity over time. The package implements three matching metrics:

.. code-block:: python

    # Match: min(|C0 ∩ C1| / |C0|, |C0 ∩ C1| / |C1|)
    # Compares the relative size of the intersection to both community sizes
    # Used when community sizes vary significantly
    match_generator = CommunitiesGenerator(matching_metric_type=Match)

    # RelativeOverlap: |C0 ∩ C1| / |C0 ∪ C1|
    # Compares the intersection to the union (Jaccard coefficient)
    # Balanced approach (default)
    relative_overlap_generator = CommunitiesGenerator(matching_metric_type=RelativeOverlap)

    # Overlap: |C0 ∩ C1|
    # Simply uses the raw intersection size
    # Used when community sizes are stable
    overlap_generator = CommunitiesGenerator(matching_metric_type=Overlap)

These matching metrics have different properties suitable for different community evolution patterns. For example, Match is appropriate when communities can significantly vary in size, as it takes the minimum of the relative intersections. RelativeOverlap (the default) uses the Jaccard coefficient to balance the comparison, while Overlap simply looks at the raw number of shared nodes.

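The three formulas above can be checked by hand on a small example. The sketch below is a minimal, standalone illustration that uses plain Python sets. The helper functions ``match``, ``relative_overlap``, and ``overlap`` are hypothetical names written for this example; they are not part of the ``dyn-benchmark`` API, and the package's own metric classes may differ in detail.

.. code-block:: python

    # Illustrative helpers only (assumed names, not the dyn-benchmark API).
    # Both communities are assumed to be non-empty sets of node identifiers.

    def match(c0, c1):
        """min(|C0 ∩ C1| / |C0|, |C0 ∩ C1| / |C1|)"""
        shared = len(c0 & c1)
        return min(shared / len(c0), shared / len(c1))

    def relative_overlap(c0, c1):
        """|C0 ∩ C1| / |C0 ∪ C1| (Jaccard coefficient)"""
        return len(c0 & c1) / len(c0 | c1)

    def overlap(c0, c1):
        """|C0 ∩ C1| (raw number of shared nodes)"""
        return len(c0 & c1)

    # A community of 8 nodes at time t and a community of 6 nodes at time t+1
    c_t = {1, 2, 3, 4, 5, 6, 7, 8}
    c_next = {5, 6, 7, 8, 9, 10}

    print(match(c_t, c_next))             # 0.5
    print(relative_overlap(c_t, c_next))  # 0.4
    print(overlap(c_t, c_next))           # 4

The two communities share four nodes. Match reports 0.5 because only half of the larger community survives, RelativeOverlap reports 0.4 because the union contains ten nodes, and Overlap reports the raw count of 4 regardless of either community's size.
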
Evolving Communities shape parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The generator creates communities based on several probability distributions that you can customize by subclassing :class:`CommunitiesGenerator` and overriding its methods:

.. code-block:: python

    class CustomCommunitiesGenerator(CommunitiesGenerator):
        """Custom generator with specific community evolution patterns"""

        def draw_community_size(self, *args, **kwargs):
            """Controls the initial size of communities

            Returns a float (will be rounded to integer)"""
            return self.rng.normal(loc=50, scale=20)  # Normal distribution centered at 50

        def draw_community_lifetime(self, *args, **kwargs):
            """Controls how long communities exist

            Returns a float (will be rounded to integer)"""
            # Normal distribution clipped to the range of 3 to 7 snapshots
            return np.maximum(3, np.minimum(7, self.rng.normal(loc=5, scale=2)))

        def draw_community_start(self, *args, **kwargs):
            """Controls when communities are born

            Returns a float between 0-1 (scaled to valid snapshot range)"""
            return self.rng.random()  # Uniform distribution

        def draw_change_ratio(self, *args, **kwargs):
            """Controls community size changes over time

            Returns a float (negative = shrink, positive = grow)"""
            return self.rng.normal(loc=0, scale=0.2)  # Normal distribution centered at 0

2. Network Structure Parameters
-------------------------------

The package offers several graph generation models to create the underlying network structure at a given snapshot.

Stochastic Block Model (SBM)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Stochastic Block Model (SBM) is a classic approach for generating networks with community structure. It creates edges between nodes based on their community membership, with higher probability for intra-community connections than inter-community connections.

.. code-block:: python

    # SBM parameters (classic community-based model)
    sbm_generator = SBM(
        p_in=0.7,  # Probability of edge between nodes in same community
        p_out=0.05,  # Probability of edge between nodes in different communities
        max_iter=10,  # Maximum number of attempts to generate a connected graph
        seed=42  # Seed for reproducibility
    )

The SBM was originally introduced by :cite:t:`holland1983stochastic` as a statistical model for social networks with block structures, making it particularly suitable for generating synthetic networks with well-defined community structures. For more details, see the :class:`SBM` class.

Preferential Attachment Model (PAM)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Preferential Attachment Model (PAM) generates networks with a power-law degree distribution, simulating the "rich get richer" phenomenon observed in many real-world networks.

.. code-block:: python

    # PAM parameters (scale-free networks)
    pam_generator = PAM(
        m=5,  # Number of edges to add for each new node
        self_loop=False,  # Whether self-loops are allowed
        seed=42  # Seed for reproducibility
    )

This implementation is based on the efficient algorithms described by :cite:t:`tonelli2010three` for implementing preferential attachment mechanisms in evolving networks. For more details, see the :class:`PAM` class.

Block Preferential Attachment Model (BPAM)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Block Preferential Attachment Model (BPAM) combines the community-aware structure of SBM with the scale-free properties of PAM, creating networks with both community structure and realistic degree distributions.

.. code-block:: python

    # BPAM parameters (community-aware preferential attachment)
    bpam_generator = BPAM(
        gamma_in=0.8,  # Intra-community interaction strength
        gamma_out=0.1,  # Inter-community interaction strength
        m=5,  # Number of edges to add for each new node
        self_loop=False,  # Whether self-loops are allowed
        seed=42  # Seed for reproducibility
    )

BPAM was introduced by :cite:t:`tang2020buckley` as an extension of the Buckley-Osthus model, incorporating block structures to create networks with both community organization and power-law degree distributions. For more details, see the :class:`BPAM` class.

Fast Block Preferential Attachment Model (FastBPAM)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Fast Block Preferential Attachment Model (FastBPAM) builds upon the original PAM by incorporating features from BPAM and optimizing performance. This model generates networks with both community-aware structures and realistic degree distributions, while being computationally efficient.

.. code-block:: python

    # FastBPAM - optimized version of BPAM with the same parameters
    fast_bpam = FastBPAM(
        gamma_in=0.8,
        gamma_out=0.1,
        m=5,
        self_loop=False,
        seed=42
    )

For more details, see the :class:`FastBPAM` class.

3. Ensuring Reproducibility
---------------------------

To generate identical benchmarks, it's essential to use consistent seed values across all components of the generation process:

.. code-block:: python

    # First generator with seed 42
    generator1 = GroundtruthGenerator(seed=42)
    benchmark1 = generator1.generate()

    # Second generator with seed 42
    generator2 = GroundtruthGenerator(seed=42)
    benchmark2 = generator2.generate()

    # Verify that both benchmarks contain the same nodes in every snapshot
    same_nodes = all(
        set(benchmark1.graphs[t].nodes) == set(benchmark2.graphs[t].nodes)
        for t in benchmark1.graphs.keys()
    )
    print(f"Benchmarks have identical nodes: {same_nodes}")

When creating a custom generator with multiple components, ensure that each component receives a consistent seed value:

.. code-block:: python

    # Create a reproducible custom generator
    custom_generator = GroundtruthGenerator(
        community_generator=CustomCommunitiesGenerator(seed=42),
        node_generator=RandomMemberGenerator(seed=42),
        edge_generator=FastBPAM(seed=42),
        seed=42  # Master seed for the entire generation process
    )

The :class:`GroundtruthGenerator` distributes child seeds to its components, so providing a master seed is usually sufficient, but explicitly setting component seeds ensures maximum control over reproducibility.

4. Exploring the Generated Benchmark
------------------------------------

Once you have generated a benchmark, you can explore its properties through various attributes and methods:

.. code-block:: python

    # Generate a benchmark with the custom generator assembled in the previous section
    groundtruth = custom_generator.generate()

    # Access different components of the benchmark
    print(f"Generated {len(groundtruth.tcommlist)} member assignments")
    print(f"Across {len(groundtruth.graphs)} snapshots")
    print(f"With {len(groundtruth.events)} community events")

    # Inspect the first snapshot
    first_snapshot = min(groundtruth.graphs.keys())
    graph = groundtruth.graphs[first_snapshot]
    print(f"First snapshot has {len(graph.nodes)} nodes and {len(graph.edges)} edges")

    # List community events
    event_types = set(event.label for event in groundtruth.events)
    print(f"Event types: {event_types}")

For deeper analysis and visualization of your generated benchmark, refer to the following tutorials:

- :doc:`Tutorial 3: Comprehensive Metrics Computation and Analysis <3_metrics_computation_analysis>` explains how to compute various metrics and analyze events for evolving communities and temporal networks.
- :doc:`Tutorial 4: Visualizing Evolving Communities <4_visualizing_evolving_communities>` provides comprehensive methods for visualizing community evolution, including Sankey diagrams, network snapshots, and animated visualizations.

5. Complete Example
-------------------

Here's a complete example that demonstrates how to generate a customized benchmark. This implementation:

1. Creates a custom community generator with specific distributions
2. Sets up a network structure generator (FastBPAM in this example)
3. Ensures reproducibility with consistent seeds
4. Explores the generated benchmark properties

.. code-block:: python

    """
    Complete example of benchmark generation using dyn-benchmark
    """
    import numpy as np
    import networkx as nx
    import pandas as pd
    from dyn.benchmark.generator.groundtruth_generator import GroundtruthGenerator
    from dyn.benchmark.generator.communities_generator import CommunitiesGenerator, RelativeOverlap, Match, Overlap
    from dyn.benchmark.generator.edges_generator import SBM, BPAM, FastBPAM, PAM
    from dyn.benchmark.generator.nodes_generator import RandomMemberGenerator
    from dyn.core.communities import Membership
    from dyn.drawing.sankey_drawing import plot_sankey

    # 1. Define a custom communities generator with specific distributions
    class CustomCommunitiesGenerator(CommunitiesGenerator):
        """Custom generator with specific community evolution patterns"""

        def draw_community_size(self, *args, **kwargs):
            """Controls the initial size of communities"""
            return self.rng.normal(loc=50, scale=20)  # Normal distribution centered at 50

        def draw_community_lifetime(self, *args, **kwargs):
            """Controls how long communities exist"""
            # Normal distribution clipped to the range of 3 to 7 snapshots
            return np.maximum(3, np.minimum(7, self.rng.normal(loc=5, scale=2)))

        def draw_community_start(self, *args, **kwargs):
            """Controls when communities are born"""
            return self.rng.random()  # Uniform distribution

        def draw_change_ratio(self, *args, **kwargs):
            """Controls community size changes over time"""
            return self.rng.normal(loc=0, scale=0.2)  # Normal distribution centered at 0

    # 2. Set up a custom communities generator with global parameters
    community_generator = CustomCommunitiesGenerator(
        community_count=15,  # Number of evolving communities
        snapshot_count=10,  # Number of snapshots in the temporal network
        community_size_min=5,  # Minimum size of any static community
        core_nodes_ratio=0.7,  # Ratio of members that stay in their community
        matching_metric_type=RelativeOverlap,  # Algorithm for matching communities
        seed=42  # Seed for reproducibility
    )

    # 3. Create a network structure generator using the Fast Block Preferential Attachment Model (FastBPAM)
    fast_bpam_generator = FastBPAM(
        gamma_in=0.8,  # Intra-community interaction strength
        gamma_out=0.1,  # Inter-community interaction strength
        m=5,  # Number of edges to add for each new node
        self_loop=False,  # Whether self-loops are allowed
        seed=42  # Seed for reproducibility
    )

    # 4. Create a node generator
    node_generator = RandomMemberGenerator(seed=42)

    # 5. Assemble the final generator with all components
    custom_generator = GroundtruthGenerator(
        community_generator=community_generator,
        node_generator=node_generator,
        edge_generator=fast_bpam_generator,  # Using FastBPAM for this example
        seed=42  # Master seed for the entire generation process
    )

    # 6. Generate the benchmark
    print("Generating benchmark...")
    groundtruth = custom_generator.generate()

    # 7. Explore the generated benchmark properties
    print("\nGenerated benchmark with:")
    #print(f"- {len(groundtruth.tcommlist)} member assignments")
    print(f"- {len(groundtruth.graphs)} snapshots")
    print(f"- {len(groundtruth.events)} community events")

    # 8. Inspect the first snapshot
    first_snapshot = min(groundtruth.graphs.keys())
    graph = groundtruth.graphs[first_snapshot]
    print(f"\nFirst snapshot (t={first_snapshot}):")
    print(f"- {len(graph.nodes)} nodes")
    print(f"- {len(graph.edges)} edges")
    print(f"- Density: {2*len(graph.edges)/(len(graph.nodes)*(len(graph.nodes)-1)):.4f}")

    # 9. Extract membership information
    membership = Membership.from_tcommlist(groundtruth.tcommlist)
    print("\nMembership details:")
    print(f"- {len(membership.members)} unique members")
    print(f"- {len(membership.evolving_communities)} evolving communities")
    print(f"- {len(membership.static_communities)} static communities")

    # 10. Count community events by type
    event_types = {}
    for event in groundtruth.events:
        event_types[event.label] = event_types.get(event.label, 0) + 1

    print("\nCommunity events:")
    for event_type, count in event_types.items():
        print(f"- {event_type}: {count} occurrences")

    # 11. Create a visualization of community evolution using a Sankey diagram
    plot_sankey(membership.community_graph)

    print("\nBenchmark generation and analysis complete!")

This tutorial has demonstrated the complete process of generating customized benchmarks using ``dyn-benchmark``, explaining the available parameters and their effects on community evolution and network structure. For more advanced analyses and visualizations, refer to the other tutorials in this documentation.

References
----------

.. bibliography::
   :filter: False

   holland1983stochastic
   tang2020buckley
   tonelli2010three
   Aynaud2013