In this blog post, we will explore how to use simulated annealing to cluster a Design Structure Matrix (DSM). We will also discuss how this approach differs from the implementation developed by Ronnie Thebeau as part of his master’s thesis.

What is a Design Structure Matrix (DSM)?

A Design Structure Matrix (DSM) is a compact, matrix representation of a system or project. It is used to model the relationships between elements in a system, such as tasks in a project or components in a product. Each row and column in the matrix represents an element, and the cells indicate the presence and strength of relationships between elements.

I’ve been using DSMs to model a microservice architecture, which helps us understand and manage complexity and find an optimal way of organizing teams and services.

Clustering a DSM to Improve Software Architecture

Each of the components in the DSM may be a microservice, or it may be a code module developed by a team. One of the basic goals when designing components and software teams is minimizing the coordination overhead between groups to improve the overall efficiency of the organization (the so-called Inverse Conway Maneuver). A DSM can help us design an optimal organization by finding the subsets of the matrix that are mutually exclusive or minimally interacting. These “clusters” are groups of elements that are highly interconnected among themselves while being only loosely connected to the rest of the system.

In other words, clusters absorb most, if not all, of the interactions (i.e. DSM marks) internally and the interactions or links between separate clusters are eliminated or at least minimized.

DSM Web: Clustering a DSM

As a simple example, consider a system that includes seven services and the interactions between those services as shown in the DSM below (example adapted from DSM Web). If we were to form several development teams to deliver this, how many teams should we have and what services should they be responsible for?

A system with seven services and various connections.

Clustering the DSM for this system will provide us with insights into optimal team formations based on the degree of interaction among services. If the above DSM were rearranged and services grouped by responsible team, we could come up with the following arrangement.

Arranging the DSM and clustering with teams.
Team | Services
Team 1 | services 1, 5 and 6
Team 2 | services 4 and 5
Team 3 | services 2, 3, 4 and 7

Services 4 and 5 provide an interesting insight — each of these services belongs to two logical subsystems. These services will naturally have a higher coordination overhead and will likely require the involvement of an Architect and the support of Delivery Management and Engineering Management to execute on effectively. Drawing out these services clustered by the team assignments highlights the criticality of services 4 and 5 to the success of the overall system.

Sample directed graph with team assignments.

Simulated Annealing for Clustering

Clustering a DSM can be viewed as an optimization problem: we want to find the assignment of components to clusters that minimizes communication or coordination cost. Simulated annealing is a probabilistic technique for approximating the global optimum of a function over a large search space, and it is particularly useful for optimization problems where exhaustive search is infeasible. The algorithm is inspired by the annealing process in metallurgy, where a material is heated and then slowly cooled to remove defects and improve its structure.

In the context of clustering a DSM, simulated annealing is used to find an optimal grouping of elements that minimizes the coordination cost, where the coordination cost is a measure of the complexity of interactions between elements within and across clusters.

My Implementation

My implementation of the clustering algorithm using simulated annealing is a simplified version of the original Matlab code developed by Ronnie Thebeau. Here is an overview of the key steps in this implementation:

  • Initialization: We start with an initial cluster matrix where each element is its own cluster. The initial temperature and cooling rate for the simulated annealing algorithm are also set.
  • Main Loop: The main loop runs for a fixed number of iterations or until the temperature drops below a threshold. In each iteration, we perform the following steps:
    • Select an Element: Randomly select an element from the DSM.
    • Select a Cluster: Randomly select a cluster to place the element in.
    • Calculate Coordination Cost: Calculate the new coordination cost for the DSM if we assume the assignment of the selected element to the selected cluster.
    • Accept or Reject the new Assignment: If the new cluster assignment lowers the coordination cost, we accept it; if not, we may still accept it, but with a probability that depends on the current annealing temperature. This helps us break out of local minima: early in the algorithm we are likely to jump around the solution space searching for an optimal assignment, but as the temperature cools we settle into a minimum.
    • Update Cluster Assignments: Update the cluster assignment if we accept the new assignment.
    • Cooling: Gradually reduce the temperature.
  • Termination: The algorithm terminates when the maximum number of iterations is reached or the temperature is below a threshold. The final cluster matrix represents the optimal grouping of elements.

Differences from Thebeau’s Implementation

While our implementation is inspired by Ronnie Thebeau’s work, there are some key differences:

  • Simplified Coordination Cost Calculation: Our implementation uses a simplified coordination cost that depends only on the number of elements in each cluster, whereas the original algorithm used a weighting parameter that penalized higher-cost assignments. This change eliminates parameters from the algorithm.
  • Temperature and Cooling Rate: We use fixed values for the initial temperature and cooling rate, whereas the original implementation includes parameters that adapt during operation.
  • Cluster Pruning: Our implementation includes a pruning step to remove duplicate or empty clusters, which is not present in the original code.
  • Code Structure: Our implementation is written in Rust, leveraging the language’s safety and concurrency features, with the hope that this implementation will be more easily extendable with additional algorithms or approaches.

Together, these differences simplify the original implementation by removing weighting and loop parameters.

Example Code

Here is an example of the core of our implementation in Rust. It is considerably simpler and easier to follow than the original code, but it has not yet been optimized for large DSM sizes:

pub fn cluster(
    dsm: &Dsm,
    initial_temperature: f64,
    cooling_rate: f64,
) -> Result<(Dsm, CostHistory)> {
    let rng = &mut rand::thread_rng();

    let mut clustering = Clustering::new(dsm.len());

    // Calculate the initial starting coordination cost of the clustering
    let mut curr_coord_cost = coord_cost(dsm, &clustering);

    // Initialize the best solution to the initial solution
    let mut best_coord_cost = curr_coord_cost;
    let mut best_clustering = clustering.clone();

    let mut cost_history: CostHistory = vec![];
    let mut temperature = initial_temperature;
    while temperature > 1e-3 {
        // Pick a random element from the DSM to put in a new cluster
        let element = rng.gen_range(0..dsm.len());

        // Pick a random new cluster for the chosen element
        let cluster = rng.gen_range(0..clustering.cluster_count());

        let new_clustering = clustering.assign(element, cluster);

        // Calculate the coordination cost of the new cluster assignment
        let new_coord_cost = coord_cost(dsm, &new_clustering);

        // Accept the new cluster assignment if it has a
        // lower coord cost.  If the cost is higher, accept it with a
        // probability determined by the annealing temperature
        // Initially, we have a high likelihood of accepting worse solutions
        let accept = if new_coord_cost <= curr_coord_cost {
            true
        } else {
            let acceptance_probability = ((curr_coord_cost - new_coord_cost) / temperature).exp();
            rng.gen_bool(acceptance_probability)
        };

        if accept {
            // Update the cluster with the accepted values
            curr_coord_cost = new_coord_cost;
            clustering = new_clustering;

            // Record the solution cost
            cost_history.push(curr_coord_cost);

            // If we have a new best, update it
            if curr_coord_cost < best_coord_cost {
                best_coord_cost = curr_coord_cost;
                best_clustering = clustering.clone();
            }
        }

        temperature *= cooling_rate;
    }

    // Delete empty or duplicate clusters
    let pruned_clustering = best_clustering.prune();
    let ordered_dsm = dsm.reorder(pruned_clustering.clone());

    Ok((ordered_dsm, cost_history))
}

Full code for the annealing algorithm is available on GitHub.

Conclusion

Simulated annealing is a powerful technique for clustering a DSM, allowing us to find an optimal grouping of elements that minimizes coordination cost. Our implementation, while inspired by Ronnie Thebeau’s work, includes several simplifications that remove the need to specify parameters, and it includes adaptations to leverage Rust’s features to improve readability and maintainability.

For more details on the original implementation, you can refer to Ronnie Thebeau’s master’s thesis and the Matlab code.