API Reference

Classes

Host

class transmission_models.classes.host.host(id, index, genetic_data=[], t_inf=0, t_sample=None)[source]

Bases: object

Represents a host that has been infected with a virus.

A host object contains information about an infected individual, including their genetic data, infection time, sampling time, and other attributes.

Variables:
  • index (int) – The index of the host.

  • sampled (bool) – Indicates whether the host has been sampled or not.

  • genetic_data (list) – The genetic data of the host.

  • dict_attributes (dict) – A dictionary to store additional attributes.

  • t_inf (int) – Time of infection.

  • t_sample (int, optional) – The time the host was sampled.

  • id (str) – The identifier of the host.

t_inf : property

Getter and setter for the time of infection attribute.

get_genetic_str() : str

Returns the genetic data as a string.

__str__() : str

Returns a string with the id of the host.

__int__() : int

Returns the index of the host.

Examples

>>> h = host('host1', 1, ['A', 'T', 'C', 'G'], 10, t_sample=15)
>>> print(h.t_inf)
10
>>> h.t_inf = 20
>>> print(h.t_inf)
20
>>> print(h.get_genetic_str())
ATCG
>>> print(h)
host1

Notes

The class is named host in lowercase, which departs from the CapWords (PascalCase) convention that PEP 8 recommends for class names.

__init__(id, index, genetic_data=[], t_inf=0, t_sample=None)[source]

Initialize a new instance of the host class.

Parameters:
  • id (str) – The id of the host.

  • index (int) – The index of the host.

  • genetic_data (list, optional) – The genetic data of the host. Defaults to an empty list.

  • t_inf (int, optional) – Time of infection. Defaults to 0.

  • t_sample (int, optional) – The time the host was sampled. Defaults to None.

property t_inf

Getter for the time of infection attribute.

Returns:

The time of infection.

Return type:

int

get_genetic_str()[source]

Return the genetic data of the host as a string.

Returns:

The genetic data as a string.

Return type:

str

__str__()[source]

Return a string with the id of the host.

Returns:

The id of the host.

Return type:

str

__int__()[source]

Return the index of the host.

Returns:

The index of the host.

Return type:

int
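
The interface described above can be illustrated with a small stand-in class. This is a minimal sketch, not the library's implementation: HostSketch is a hypothetical name, and the rule that a host counts as sampled when t_sample is given is an assumption made for the example.

```python
class HostSketch:
    """Hypothetical stand-in mirroring the documented interface of `host`."""

    def __init__(self, id, index, genetic_data=None, t_inf=0, t_sample=None):
        self.id = id
        self.index = index
        # Copy the input so that two instances never share one list object.
        self.genetic_data = list(genetic_data) if genetic_data is not None else []
        self._t_inf = t_inf
        self.t_sample = t_sample
        # Assumption: a host is "sampled" when a sampling time was provided.
        self.sampled = t_sample is not None
        self.dict_attributes = {}

    @property
    def t_inf(self):
        """Time of infection (getter)."""
        return self._t_inf

    @t_inf.setter
    def t_inf(self, value):
        """Time of infection (setter)."""
        self._t_inf = value

    def get_genetic_str(self):
        # Join the nucleotide list into a single string.
        return "".join(self.genetic_data)

    def __str__(self):
        return str(self.id)

    def __int__(self):
        return self.index


h = HostSketch("host1", 1, ["A", "T", "C", "G"], 10, t_sample=15)
h.t_inf = 20
summary = (str(h), int(h), h.t_inf, h.get_genetic_str())
```

Defining __int__ lets a host be passed to int(), which is convenient when the index is used to address rows of a distance matrix.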

transmission_models.classes.host.create_genome(chain_length)[source]

Create a random genome sequence of specified length.

Parameters:

chain_length (int) – The length of the genome sequence to create.

Returns:

A list of random nucleotides (A, G, C, T) of length chain_length.

Return type:

list

Examples

>>> genome = create_genome(10)
>>> print(genome)
['A', 'T', 'C', 'G', 'A', 'T', 'C', 'G', 'A', 'T']
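
Because the genome is random, the printed list above is only one possible outcome. A minimal sketch of how such a generator could look, assuming uniform draws over the four nucleotides (create_genome_sketch and the rng argument are hypothetical names, not part of the package):

```python
import random

NUCLEOTIDES = "AGCT"


def create_genome_sketch(chain_length, rng=random):
    """Return a list of `chain_length` nucleotides drawn uniformly at random."""
    return [rng.choice(NUCLEOTIDES) for _ in range(chain_length)]


# A seeded generator makes the draw repeatable between runs.
genome = create_genome_sketch(10, rng=random.Random(0))
```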

transmission_models.classes.host.binom_mutation(chain_length, p, genome)[source]

Perform binomial mutation on a given genome.

This function generates changes in a genome by randomly selecting ‘k’ positions to mutate, where ‘k’ follows a binomial distribution with parameters ‘chain_length’ and ‘p’. The elements at the selected positions are replaced with new randomly chosen nucleotides.

Parameters:
  • chain_length (int) – The length of the genome chain.

  • p (float) – The probability of mutation for each element in the chain.

  • genome (str or list) – The original genome sequence.

Returns:

The mutated genome sequence.

Return type:

list

Notes

The function operates as follows:

  1. Calculates the number of positions to mutate, ‘k’, by sampling from a binomial distribution with ‘chain_length’ trials and success probability ‘p’.

  2. Randomly selects ‘k’ positions from the range [0, chain_length) without replacement.

  3. Creates a new list ‘new_genome’ from the original genome.

  4. Iterates over the selected positions and replaces the corresponding elements in ‘new_genome’ with randomly chosen nucleotides based on the original nucleotide at that position:

    • If the original nucleotide is ‘A’, it is replaced with a randomly chosen nucleotide from ‘CTG’.

    • If the original nucleotide is ‘C’, it is replaced with a randomly chosen nucleotide from ‘ATG’.

    • If the original nucleotide is ‘T’, it is replaced with a randomly chosen nucleotide from ‘ACG’.

    • If the original nucleotide is ‘G’, it is replaced with a randomly chosen nucleotide from ‘ACT’.

  5. Returns the mutated genome sequence as ‘new_genome’.
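
The five steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package's code: binom_mutation_sketch, ALTERNATIVES and the rng argument are hypothetical names, and the binomial draw is done as a sum of Bernoulli trials.

```python
import random

# Replacement alphabet for each original nucleotide, as listed in the notes.
ALTERNATIVES = {"A": "CTG", "C": "ATG", "T": "ACG", "G": "ACT"}


def binom_mutation_sketch(chain_length, p, genome, rng=random):
    # Step 1: k ~ Binomial(chain_length, p), drawn as a sum of Bernoulli trials.
    k = sum(rng.random() < p for _ in range(chain_length))
    # Step 2: choose k distinct positions in [0, chain_length).
    positions = rng.sample(range(chain_length), k)
    # Step 3: copy the genome so the input is left untouched.
    new_genome = list(genome)
    # Step 4: replace each selected nucleotide with a different one.
    for pos in positions:
        new_genome[pos] = rng.choice(ALTERNATIVES[genome[pos]])
    # Step 5: return the mutated copy.
    return new_genome


original = list("ATCGGATCGA")
mutated = binom_mutation_sketch(len(original), 0.2, original, rng=random.Random(1))
n_changed = sum(a != b for a, b in zip(original, mutated))
```

Since every replacement alphabet excludes the original nucleotide, n_changed equals the number of selected positions k.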

Examples

>>> genome = ['A', 'T', 'C', 'G', 'G', 'A', 'T', 'C', 'G', 'A']
>>> mutated_genome = binom_mutation(len(genome), 0.2, genome)
>>> print(mutated_genome)
['A', 'T', 'C', 'A', 'G', 'A', 'T', 'C', 'G', 'A']

See also

one_mutation

Perform a single mutation on a genome

transmission_models.classes.host.one_mutation(chain_length, p, genome)[source]

Perform one mutation on a given genome.

This function generates a single mutation in a genome by randomly selecting one position to mutate. The selected position is replaced with a new randomly chosen nucleotide.

Parameters:
  • chain_length (int) – The length of the genome chain.

  • p (float) – The probability of mutation for each element in the chain.

  • genome (str or list) – The original genome sequence.

Returns:

The mutated genome sequence.

Return type:

list

Notes

The function operates as follows:

  1. Randomly selects one position from the range [0, chain_length) to mutate.

  2. Creates a new list ‘new_genome’ from the original genome.

  3. Checks the original nucleotide at the selected position and replaces it with a randomly chosen nucleotide based on the following rules:

    • If the original nucleotide is ‘A’, it is replaced with a randomly chosen nucleotide from ‘CTG’.

    • If the original nucleotide is ‘C’, it is replaced with a randomly chosen nucleotide from ‘ATG’.

    • If the original nucleotide is ‘T’, it is replaced with a randomly chosen nucleotide from ‘ACG’.

    • If the original nucleotide is ‘G’, it is replaced with a randomly chosen nucleotide from ‘ACT’.

  4. Returns the mutated genome sequence as ‘new_genome’.
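
The steps above can be sketched as follows. The sketch omits the p argument because the steps listed do not use it; one_mutation_sketch, ALTERNATIVES and rng are hypothetical names chosen for the example.

```python
import random

# Replacement alphabet for each original nucleotide, as listed in the notes.
ALTERNATIVES = {"A": "CTG", "C": "ATG", "T": "ACG", "G": "ACT"}


def one_mutation_sketch(chain_length, genome, rng=random):
    # Step 1: pick a single position in [0, chain_length).
    pos = rng.randrange(chain_length)
    # Step 2: copy the genome so the input is left untouched.
    new_genome = list(genome)
    # Step 3: swap in a nucleotide that differs from the original one.
    new_genome[pos] = rng.choice(ALTERNATIVES[genome[pos]])
    # Step 4: return the mutated copy.
    return new_genome


original = list("ATCGGATCGA")
mutated_one = one_mutation_sketch(len(original), original, rng=random.Random(2))
n_changed = sum(a != b for a, b in zip(original, mutated_one))
```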

Examples

>>> genome = ['A', 'T', 'C', 'G', 'G', 'A', 'T', 'C', 'G', 'A']
>>> mutated_genome = one_mutation(len(genome), 0.2, genome)
>>> print(mutated_genome)
['A', 'T', 'C', 'G', 'G', 'A', 'T', 'C', 'G', 'T']

See also

binom_mutation

Perform binomial mutation on a genome

transmission_models.classes.host.average_mutations(mu, P_mut, tau, Dt, host_genetic)[source]

Generate a list of mutations proportional to a time interval.

The number of mutations is proportional to a given time interval (Dt) where the proportion factor is the mutation rate (mu).

Parameters:
  • mu (float) – The mutation rate.

  • P_mut (float) – The probability of mutation.

  • tau (float) – The current time.

  • Dt (float) – The time interval.

  • host_genetic (list) – The genetic sequence of the host.

Returns:

A tuple containing:

  • mutations (list) – List of mutated genetic sequences.

  • t_mutations (list) – List of mutation times.

Return type:

tuple

Notes

The function calculates the number of mutations as int(mu * Dt / P_mut) and generates that many mutations using the one_mutation function.
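
As a worked example of the formula in the note, the sketch below evaluates int(mu * Dt / P_mut) for two sets of illustrative values. n_mutations is a hypothetical helper; how the mutation times are spread over the interval is not specified here, so it is left out.

```python
def n_mutations(mu, Dt, P_mut):
    """Number of mutations as given in the note: int(mu * Dt / P_mut)."""
    return int(mu * Dt / P_mut)


# 2.0 * 3.0 / 0.5 = 12.0, so twelve calls to one_mutation would be made.
count = n_mutations(mu=2.0, Dt=3.0, P_mut=0.5)

# int() truncates toward zero: 1.0 * 2.5 / 1.0 = 2.5 gives 2 mutations.
truncated = n_mutations(mu=1.0, Dt=2.5, P_mut=1.0)
```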

Didelot Unsampled

class transmission_models.classes.didelot_unsampled.didelot_unsampled(sampling_params, offspring_params, infection_params, T=None)[source]

Bases: object

Didelot unsampled transmission model.

This class implements the Didelot et al. (2017) framework for transmission tree inference with unsampled hosts. It provides methods for building transmission networks, computing likelihoods, and performing MCMC sampling.

The model incorporates three main components:

  1. Sampling model: Gamma distribution for sampling times

  2. Offspring model: Negative binomial distribution for offspring number

  3. Infection model: Gamma distribution for infection times

Parameters:
  • sampling_params (dict) – Parameters for the sampling model containing:

    • pi (float) – Sampling probability.

    • k_samp (float) – Shape parameter for the gamma distribution.

    • theta_samp (float) – Scale parameter for the gamma distribution.

  • offspring_params (dict) – Parameters for the offspring model containing:

    • r (float) – Rate of infection.

    • p_inf (float) – Probability of infection.

  • infection_params (dict) – Parameters for the infection model containing:

    • k_inf (float) – Shape parameter for the gamma distribution.

    • theta_inf (float) – Scale parameter for the gamma distribution.
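
To make the role of these parameters concrete, the sketch below evaluates the log-density of each component with the standard library. It is an illustration only: the numeric values are arbitrary, gamma_logpdf and nbinom_logpmf are hypothetical helpers, and the negative binomial convention used (p raised to the number of offspring) is an assumption that may differ from the one used by the class.

```python
import math


def gamma_logpdf(x, k, theta):
    """Log-density of a Gamma(shape=k, scale=theta) distribution at x > 0."""
    return (k - 1) * math.log(x) - x / theta - math.lgamma(k) - k * math.log(theta)


def nbinom_logpmf(n, r, p):
    """Log-probability of n offspring, assuming the convention
    P(n) = Gamma(n + r) / (n! * Gamma(r)) * p**n * (1 - p)**r."""
    return (math.lgamma(n + r) - math.lgamma(n + 1) - math.lgamma(r)
            + n * math.log(p) + r * math.log(1 - p))


# Illustrative values only; the keys follow the parameter lists above.
sampling_params = {"pi": 0.5, "k_samp": 2.0, "theta_samp": 1.5}
offspring_params = {"r": 1.0, "p_inf": 0.5}
infection_params = {"k_inf": 2.0, "theta_inf": 1.0}

# Delay of 3 time units between infection and sampling.
log_f_sampling = gamma_logpdf(3.0, sampling_params["k_samp"], sampling_params["theta_samp"])
# A host with two offspring.
log_f_offspring = nbinom_logpmf(2, offspring_params["r"], offspring_params["p_inf"])
# Delay of 1 time unit between the infection of infector and infected.
log_f_infection = gamma_logpdf(1.0, infection_params["k_inf"], infection_params["theta_inf"])
```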

Variables:
  • T (networkx.DiGraph) – The transmission tree.

  • host_dict (dict) – Dictionary mapping host IDs to host objects.

  • log_likelihood (float) – Current log likelihood of the model.

  • genetic_prior (genetic_prior_tree, optional) – Prior for genetic data.

  • same_location_prior (same_location_prior_tree, optional) – Prior for location data.

References

Didelot, X., Gardy, J., & Colijn, C. (2017). Bayesian inference of transmission chains using timing of events, contact and genetic data. PLoS computational biology, 13(4), e1005496.

Core Methods

__init__(sampling_params, offspring_params, infection_params, T=None)[source]

Initialize the Didelot unsampled transmission model.

Parameters:
  • sampling_params (dict) – Parameters for the sampling model containing:

    • pi (float) – Sampling probability.

    • k_samp (float) – Shape parameter for the gamma distribution.

    • theta_samp (float) – Scale parameter for the gamma distribution.

  • offspring_params (dict) – Parameters for the offspring model containing:

    • r (float) – Rate of infection.

    • p_inf (float) – Probability of infection.

  • infection_params (dict) – Parameters for the infection model containing:

    • k_inf (float) – Shape parameter for the gamma distribution.

    • theta_inf (float) – Scale parameter for the gamma distribution.

  • T (networkx.DiGraph, optional) – The transmission tree. If provided, the model will be initialized with this tree. Default is None.

Raises:

KeyError – If any required parameter is missing from the input dictionaries.
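
Since a missing key raises KeyError, it can help to check the dictionaries before building the model. The sketch below is a hypothetical pre-check based only on the keys listed above; missing_keys, check_params and REQUIRED_KEYS are not part of the package, and the constructor call is shown as a comment because it requires the package to be installed.

```python
REQUIRED_KEYS = {
    "sampling_params": ("pi", "k_samp", "theta_samp"),
    "offspring_params": ("r", "p_inf"),
    "infection_params": ("k_inf", "theta_inf"),
}


def missing_keys(sampling_params, offspring_params, infection_params):
    """Return (dict name, key) pairs for every required key that is absent."""
    given = {
        "sampling_params": sampling_params,
        "offspring_params": offspring_params,
        "infection_params": infection_params,
    }
    return [(name, key)
            for name, keys in REQUIRED_KEYS.items()
            for key in keys
            if key not in given[name]]


def check_params(sampling_params, offspring_params, infection_params):
    """Raise KeyError early, with a message naming every missing key."""
    missing = missing_keys(sampling_params, offspring_params, infection_params)
    if missing:
        raise KeyError(f"missing required parameters: {missing}")


# Illustrative values only.
sampling_params = {"pi": 0.5, "k_samp": 2.0, "theta_samp": 1.5}
offspring_params = {"r": 1.0, "p_inf": 0.5}
infection_params = {"k_inf": 2.0, "theta_inf": 1.0}

check_params(sampling_params, offspring_params, infection_params)
# model = didelot_unsampled(sampling_params, offspring_params, infection_params)
```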

add_root(t_sampl, id='0', genetic_data=[], t_inf=0, t_sample=None)[source]

Add the root host to the transmission tree.

Parameters:
  • t_sampl (float) – Sampling time of the root host.

  • id (str, optional) – Identifier for the root host. Default is “0”.

  • genetic_data (list, optional) – Genetic data for the root host. Default is empty list.

  • t_inf (float, optional) – Infection time of the root host. Default is 0.

  • t_sample (float, optional) – Sampling time of the root host. Default is None.

Returns:

The root host object.

Return type:

host

successors(host)[source]

Get the successors (children) of a given host in the transmission tree.

Parameters:

host (host) – The host node whose successors are to be returned.

Returns:

An iterator over the successors of the host.

Return type:

iterator

parent(host)[source]

Get the parent (infector) of a given host in the transmission tree.

Parameters:

host (host) – The host node whose parent is to be returned.

Returns:

The parent host object.

Return type:

host

out_degree(host)[source]

Get the out-degree (number of children) of a host in the transmission tree.

Parameters:

host (host) – The host node whose out-degree is to be returned.

Returns:

The out-degree of the host.

Return type:

int

choose_successors(host, k=1)[source]

Choose k unique successors of a given host.

Parameters:
  • host (host) – Host whose successors will be chosen.

  • k (int, optional) – Number of successors to choose. Default is 1.

Returns:

List of k randomly chosen successors of the host.

Return type:

list
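
The navigation methods above can be illustrated on a toy tree. To stay self-contained, the sketch stores the tree as a plain dictionary instead of a networkx.DiGraph; the tree, the string host labels and the rng argument are assumptions made for the example. On a networkx.DiGraph the corresponding graph methods are successors, predecessors and out_degree.

```python
import random

# Hypothetical toy tree: each key infects the hosts in its list.
children = {"0": ["a", "b", "c"], "a": ["d"], "b": [], "c": [], "d": []}


def successors(host):
    """Iterator over the hosts infected by `host`."""
    return iter(children[host])


def parent(host):
    """Infector of `host`; in a tree each non-root node has exactly one."""
    for infector, infected in children.items():
        if host in infected:
            return infector
    return None  # the root has no parent


def out_degree(host):
    """Number of hosts infected by `host`."""
    return len(children[host])


def choose_successors(host, k=1, rng=random):
    """Pick k distinct successors; sample() draws without replacement."""
    return rng.sample(children[host], k)


picked = choose_successors("0", k=2, rng=random.Random(0))
```

Asking for more successors than the host has makes random.sample raise ValueError, so callers may want to compare k with the out-degree first.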

Tree Structure Methods

get_root_subtrees()[source]

Retrieve the root subtrees of the transmission tree.

This method searches for the first sampled siblings of the root host in the transmission tree and stores them in the roots_subtrees attribute.

Returns:

A list of root subtrees.

Return type:

list

get_unsampled_hosts()[source]

Get the list of unsampled hosts in the transmission tree (excluding the root).

Returns:

List of unsampled host nodes.

Return type:

list

get_candidates_to_chain()[source]

Get the list of candidate hosts for chain moves in the transmission tree.

Returns:

List of candidate host nodes for chain moves.

Return type:

list

get_N_candidates_to_chain(recompute=False)[source]

Get the number of candidate hosts for chain moves, optionally recomputing the list.

Parameters:

recompute (bool, optional) – If True, recompute the list of candidates. Default is False.

Returns:

Number of candidate hosts for chain moves.

Return type:

int

Likelihood Methods

get_sampling_model_likelihood(hosts=None, T=None, update=False)[source]

Compute the likelihood of the sampling model.

Computes the likelihood of the sampling model given a list of hosts. If no list is given, the likelihood of the whole transmission tree is returned.

Parameters:

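hosts (list of host objects, optional) – Hosts for which the likelihood is computed. If None, the whole transmission tree is used.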

Returns:

L – The likelihood of the sampling model given the list of hosts

Return type:

float

get_sampling_model_log_likelihood(hosts=None, T=None, update=False)[source]

Computes the log-likelihood of the sampling model given a list of hosts. If no list is given, the log-likelihood of the whole transmission tree is returned.

Parameters:

hosts (list of host objects, optional) – Hosts for which the log-likelihood is computed. If None, the whole transmission tree is used.

Returns:

L – The log-likelihood of the sampling model given the list of hosts

Return type:

float
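
The relation between the likelihood and the log-likelihood methods can be sketched as follows. The per-host terms are an assumption made for illustration (a sampled host contributes log(pi) plus a gamma log-density of the delay between infection and sampling, an unsampled host contributes log(1 - pi)); the actual terms used by the class may differ. Hosts are represented as hypothetical (t_inf, t_sample) pairs.

```python
import math


def gamma_logpdf(x, k, theta):
    """Log-density of a Gamma(shape=k, scale=theta) distribution at x > 0."""
    return (k - 1) * math.log(x) - x / theta - math.lgamma(k) - k * math.log(theta)


def sampling_log_likelihood(hosts, pi, k_samp, theta_samp):
    """Sum of per-host log terms; `hosts` holds (t_inf, t_sample) pairs."""
    total = 0.0
    for t_inf, t_sample in hosts:
        if t_sample is None:
            # Assumed term for an unsampled host.
            total += math.log(1 - pi)
        else:
            # Assumed term for a sampled host.
            total += math.log(pi) + gamma_logpdf(t_sample - t_inf, k_samp, theta_samp)
    return total


hosts = [(0.0, 1.0), (0.5, None), (1.0, 3.0)]
log_L = sampling_log_likelihood(hosts, pi=0.5, k_samp=1.0, theta_samp=1.0)
# The plain likelihood is the exponential of the log-likelihood.
L = math.exp(log_L)
```

Working in log space turns the product over hosts into a sum, which avoids underflow when the tree contains many hosts.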

get_offspring_model_likelihood(hosts=None, T=None, update=False)[source]

Computes the likelihood of the offspring model given a list of hosts. If no list is given, the likelihood of the whole transmission tree is returned.

Parameters:

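hosts (list of host objects, optional) – Hosts for which the likelihood is computed. If None, the whole transmission tree is used.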

Returns:

L – The likelihood of the offspring model given the list of hosts

Return type:

float

get_offspring_model_log_likelihood(hosts=None, T=None, update=False)[source]

Computes the log-likelihood of the offspring model given a list of hosts. If no list is given, the log-likelihood of the whole transmission tree is returned.

Parameters:

hosts (list of host objects, optional) – Hosts for which the log-likelihood is computed. If None, the whole transmission tree is used.

Returns:

L – The log-likelihood of the offspring model given the list of hosts

Return type:

float

get_infection_model_likelihood(hosts=None, T=None, update=False)[source]

Computes the likelihood of the infection model given a list of hosts. If no list is given, the likelihood of the whole transmission tree is returned.

Parameters:
  • hosts (list of host objects, optional) – Hosts for which the likelihood is computed. If None, the whole transmission tree is used.

  • T (DiGraph object) – Transmission tree on which the likelihood of the hosts is computed. If None, the tree of the model is used.

  • update (bool) – If True, the likelihood of the infection model is updated in the model object.

Returns:

L – The likelihood of the infection model given the list of hosts

Return type:

float

get_infection_model_log_likelihood(hosts=None, T=None, update=False)[source]

Computes the log-likelihood of the infection model given a list of hosts. If no list is given, the log-likelihood of the whole transmission tree is returned.

Parameters:
  • hosts (list of host objects, optional) – Hosts for which the log-likelihood is computed. If None, the whole transmission tree is used.

  • T (DiGraph object) – Transmission tree on which the log-likelihood of the hosts is computed. If None, the tree of the model is used.

  • update (bool) – If True, the log-likelihood of the infection model is updated in the model object.

Returns:

L – The log-likelihood of the infection model given the list of hosts

Return type:

float

log_likelihood_host(host, T=None)[source]

Computes the log-likelihood of a host given the transmission tree.

Parameters:
  • host (host object) – The host whose log-likelihood is computed.

  • T (DiGraph object) – Transmission tree. If None, the tree of the model is used.

Returns:

log_likelihood – The log likelihood of the host in the transmission network

Return type:

float

log_likelihood_hosts_list(hosts, T)[source]

log_likelihood_transmission_tree(T)[source]

get_log_likelihood_transmission()[source]

Delta Methods (for MCMC)

Delta_log_sampling(hosts, T_end, T_ini=None)[source]

Compute the change in log-likelihood for the sampling model.

Parameters:
  • hosts (list) – List of host objects.

  • T_end (float) – End time.

  • T_ini (float, optional) – Initial time. Default is None.

Returns:

Change in log-likelihood for the sampling model.

Return type:

float

Notes

The function operates as follows:

  1. Computes the log-likelihood for the sampling model at T_end.

  2. If T_ini is provided, subtracts the log-likelihood at T_ini.

  3. Returns the difference.
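
The three steps above follow a pattern shared by all Delta methods, which can be sketched generically. In this sketch delta_log and toy_log_likelihood are hypothetical: the log-likelihood is passed in as a function of (hosts, T), and the toy function exists only to make the arithmetic checkable.

```python
def delta_log(log_likelihood, hosts, T_end, T_ini=None):
    """Log-likelihood at T_end minus, if given, the log-likelihood at T_ini."""
    # Step 1: evaluate at T_end.
    delta = log_likelihood(hosts, T_end)
    # Step 2: subtract the value at T_ini when it is provided.
    if T_ini is not None:
        delta -= log_likelihood(hosts, T_ini)
    # Step 3: return the difference.
    return delta


def toy_log_likelihood(hosts, T):
    # Stand-in for a model term; not a real epidemiological likelihood.
    return -0.5 * T * len(hosts)


change = delta_log(toy_log_likelihood, ["a", "b"], T_end=4.0, T_ini=1.0)
no_reference = delta_log(toy_log_likelihood, ["a", "b"], T_end=4.0)
```

In a Metropolis-Hastings step only this difference is needed, so evaluating it on the affected hosts avoids recomputing the likelihood of the whole tree.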

Delta_log_offspring(hosts, T_end, T_ini=None)[source]

Compute the change in log-likelihood for the offspring model.

Parameters:
  • hosts (list) – List of host objects.

  • T_end (float) – End time.

  • T_ini (float, optional) – Initial time. Default is None.

Returns:

Change in log-likelihood for the offspring model.

Return type:

float

Notes

The function operates as follows:

  1. Computes the log-likelihood for the offspring model at T_end.

  2. If T_ini is provided, subtracts the log-likelihood at T_ini.

  3. Returns the difference.

Delta_log_infection(hosts, T_end, T_ini=None)[source]

Compute the change in log-likelihood for the infection model.

Parameters:
  • hosts (list) – List of host objects.

  • T_end (float) – End time.

  • T_ini (float, optional) – Initial time. Default is None.

Returns:

Change in log-likelihood for the infection model.

Return type:

float

Notes

The function operates as follows:

  1. Computes the log-likelihood for the infection model at T_end.

  2. If T_ini is provided, subtracts the log-likelihood at T_ini.

  3. Returns the difference.

Delta_log_likelihood_host(hosts, T_end, T_ini=None)[source]

Compute the change in log-likelihood for a host.

Parameters:
  • hosts (list) – List of host objects.

  • T_end (float) – End time.

  • T_ini (float, optional) – Initial time. Default is None.

Returns:

Change in log-likelihood for the host.

Return type:

float

Notes

The function operates as follows:

  1. Computes the log-likelihood for the host at T_end.

  2. If T_ini is provided, subtracts the log-likelihood at T_ini.

  3. Returns the difference.

MCMC Step Methods

infection_time_from_sampling_step(selected_host=None, metHast=True, verbose=False)[source]

Propose and possibly accept a new infection time for a sampled host using the Metropolis-Hastings algorithm.

This method samples a new infection time for a selected host (or a random sampled host if not provided), computes the acceptance probability, and updates the host’s infection time if the proposal is accepted.

Parameters:
  • selected_host (host, optional) – The host whose infection time will be changed. If None, a random sampled host is selected.

  • metHast (bool, optional) – If True, use the Metropolis-Hastings algorithm to accept or reject the proposal. Default is True.

  • verbose (bool, optional) – If True, print detailed information about the proposal. Default is False.

Returns:

  • t_inf_new (float) – The proposed new infection time.

  • gg (float) – Proposal ratio for the Metropolis-Hastings step.

  • pp (float) – Likelihood ratio for the Metropolis-Hastings step.

  • P (float) – Acceptance probability for the Metropolis-Hastings step.

  • selected_host (host) – The host whose infection time was proposed to change.
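
How the returned quantities relate can be sketched with the standard Metropolis-Hastings rule, in which the acceptance probability is the likelihood ratio times the proposal ratio, capped at 1. That P is computed exactly this way inside the method is an assumption; metropolis_hastings_accept and rng are hypothetical names.

```python
import random


def metropolis_hastings_accept(pp, gg, rng=random):
    """Return (P, accepted) for likelihood ratio pp and proposal ratio gg."""
    # Standard acceptance probability: min(1, pp * gg).
    P = min(1.0, pp * gg)
    # Accept when a uniform draw in [0, 1) falls below P.
    accepted = rng.random() < P
    return P, accepted


# A proposal that doubles the likelihood with a symmetric proposal
# (gg = 1) is always accepted.
P_up, accepted_up = metropolis_hastings_accept(pp=2.0, gg=1.0)

# A proposal with zero likelihood is never accepted.
P_zero, accepted_zero = metropolis_hastings_accept(pp=0.0, gg=1.0)
```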

infection_time_from_infection_model_step(selected_host=None, metHast=True, Dt_new=None, verbose=False)[source]

Propose a new infection time for a host and accept or reject the change using the Metropolis-Hastings algorithm.

Parameters:
  • selected_host (host object, default=None) – Host whose infection time will be changed. If None, a host is randomly selected.

  • metHast (bool, default=True) – If True, the Metropolis-Hastings algorithm is used to accept or reject the change.

  • Dt_new (float, default=None) – New infection time for the host. If None, a new time is sampled.

  • verbose (bool, default=False) – If True, prints the results of the step.

add_unsampled_with_times(selected_host=None, P_add=0.5, P_rewiring=0.5, P_off=0.5, verbose=False, only_geometrical=False, detailed_probs=False)[source]

Propose the addition of an unsampled host to the transmission tree and compute the probability of the proposal.

Parameters:
  • selected_host (host object) – Host to which the unsampled host will be added. If None, a host is randomly selected.

  • P_add (float) – Probability of proposing to add a new host to the transmission tree.

  • P_rewiring (float) – Probability of rewiring the new host to another sibling host.

  • P_off (float) – Probability of rewiring the new host to be a leaf.

  • verbose (bool) – If True, prints the results of the step.

  • only_geometrical (bool) – If True, only the proposal of the new topological structure is considered.

  • detailed_probs (bool) – If True, the method returns both proposal probabilities, for adding and for removing a host.

Returns:

  • T_new (DiGraph object) – New transmission tree with the proposed changes.

  • gg (float) – Ratio of the probabilities of the proposals.

  • g_go (float) – Probability of the proposal of adding a host.

  • g_ret (float) – Probability of the proposal of removing a host.

  • prob_time (float) – Probability of the time of infection of the new host.

  • unsampled (host object) – Unsampled host to be added to the transmission tree.

  • added (bool) – If True, the host was added to the transmission tree.

remove_unsampled_with_times(selected_host=None, P_add=0.5, P_rewiring=0.5, P_off=0.5, only_geometrical=False, detailed_probs=False, verbose=False)[source]

Propose the removal of an unsampled host from the transmission tree and compute the probability of the proposal. If no unsampled hosts are available, the addition of a new host is proposed instead.

Parameters:
  • selected_host (host object) – Unsampled host to be removed from the transmission tree. If None, a host is randomly selected.

  • P_add (float) – Probability of proposing to add a new host to the transmission tree.

  • P_rewiring (float) – Probability of rewiring the new host to another sibling host.

  • P_off (float) – Probability of rewiring the new host to be a leaf.

  • verbose (bool) – If True, prints the results of the step.

  • only_geometrical (bool) – If True, only the proposal of the new topological structure is considered.

  • detailed_probs (bool) – If True, the method returns both proposal probabilities, for adding and for removing a host.

Returns:

  • T_new (DiGraph object) – New transmission tree with the proposed changes.

  • gg (float) – Ratio of the probabilities of the proposals.

  • g_go (float) – Probability of the proposal of adding a host.

  • g_ret (float) – Probability of the proposal of removing a host.

  • prob_time (float) – Probability of proposing the time of the selected_host.

  • added (bool) – If True, the host was added to the transmission tree. If False, the host was removed.

add_remove_step(P_add=0.5, P_rewiring=0.5, P_off=0.5, metHast=True, verbose=False)[source]

Propose the addition or removal of an unsampled host in the transmission tree and compute the probability of the proposal.

Parameters:
  • P_add (float) – Probability of proposing the addition of an unsampled host. Otherwise, an unsampled host is proposed for removal.

  • P_rewiring (float) – Probability of rewiring the new host to another sibling host.

  • P_off (float) – Probability of rewiring the new host to be a leaf.

  • metHast (bool) – If True, the Metropolis-Hastings algorithm is used to accept or reject the change.

  • verbose (bool) – If True, prints the results of the step.

MCMC_step(N_steps, verbose=False)[source]

Prior Methods

add_genetic_prior(mu_gen, gen_dist)[source]

Adds a genetic prior to the model. The prior computes the likelihood that two sampled hosts are related given the genetic distance between their viruses. Two sampled hosts are considered related if the only hosts on the path connecting them are unsampled hosts.

Parameters:
  • mu_gen (float) – Mutation rate.

  • gen_dist (np.array) – Genetic distance matrix of the viruses of the hosts. The matrix indices must match the indices of the hosts.

add_same_location_prior(P_NM, tau, loc_dist)[source]

Adds a same-location prior to the model, that is, a prior on the transmission tree based on the location data of the hosts.

Parameters:
  • log_K (float) – Log probability of two hosts not being in the same location

  • gen_dist (np.array) – Genetic distance matrix of the virus of the hosts. The index has to be identical to the index of the hosts.

compute_Delta_loc_prior(T_new)[source]

Compute the change in the location prior log-likelihood for a new tree.

Parameters:

T_new (networkx.DiGraph) – The new transmission tree.

Returns:

(Delta log prior, new log prior, old log prior, old correction log-likelihood)

Return type:

tuple

Utility Methods

create_transmision_phylogeny_nets(N, mu, P_mut)[source]

Parameters:
  • N – Number of hosts.

  • mu – Mutation rate.

  • P_mut – Probability of mutation.

get_newick(lengths=True)[source]

save_json(filename)[source]

Save the transmission tree to a JSON file.

Parameters:

filename (str) – Path to the output JSON file.

show_log_likelihoods(hosts=None, T=None, verbose=False)[source]

Print and return the log-likelihoods for the sampling, offspring, and infection models.

Parameters:
  • hosts (list, optional) – List of host objects to compute log-likelihoods for. If None, computes for all hosts in T.

  • T (networkx.DiGraph, optional) – Transmission tree. If None, uses self.T.

  • verbose (bool, optional) – If True, prints the log-likelihoods. Default is False.

Returns:

(LL_sampling, LL_offspring, LL_infection): Log-likelihoods for the sampling, offspring, and infection models.

Return type:

tuple

property T

set_T(T)[source]

samp_t_inf_between(h1, h2)[source]

Sample a time of infection between two hosts.

Uses a rejection sampling method to sample the time of infection of the infected host using the chain model from Didelot et al. 2017.

Parameters:
  • h1 (host) – Infector host.

  • h2 (host) – Infected host.

Returns:

Time of infection of the host infected by h1 and the infector of h2.

Return type:

float

Notes

This method implements the rejection sampling algorithm described in Didelot et al. (2017) for sampling infection times in transmission chains.
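
The idea of rejection sampling for an intermediate infection time can be sketched as follows. The target density used here is an assumption made for illustration: the intermediate host is infected at time t between t1 (infection of h1) and t2 (infection of h2), and both delays, t - t1 and t2 - t, are gamma distributed with the same shape and scale. The function names, the rng argument and max_tries are hypothetical, and the exact density used by the method may differ.

```python
import math
import random


def gamma_pdf(x, k, theta):
    """Density of a Gamma(shape=k, scale=theta) distribution at x > 0."""
    return x ** (k - 1) * math.exp(-x / theta) / (math.gamma(k) * theta ** k)


def sample_t_inf_between(t1, t2, k, theta, rng=random, max_tries=100_000):
    """Draw t in (t1, t2) with density proportional to
    gamma_pdf(t - t1) * gamma_pdf(t2 - t). Requires k >= 1 so that the
    gamma density is bounded."""
    # Upper bound of the gamma density, reached at its mode.
    pdf_max = gamma_pdf((k - 1) * theta, k, theta) if k > 1 else 1.0 / theta
    for _ in range(max_tries):
        # Proposal: first delay drawn from the gamma distribution itself.
        t = t1 + rng.gammavariate(k, theta)
        if t >= t2:
            continue  # the intermediate host must be infected before h2
        # Accept with probability gamma_pdf(t2 - t) / pdf_max.
        if rng.random() * pdf_max < gamma_pdf(t2 - t, k, theta):
            return t
    raise RuntimeError("no sample accepted; the interval may be too unlikely")


t_new = sample_t_inf_between(0.0, 3.0, k=2.0, theta=1.0, rng=random.Random(0))
```

Proposing from the first gamma factor and accepting in proportion to the second yields samples from their normalized product without ever computing the normalizing constant.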

add_root(t_sampl, id='0', genetic_data=[], t_inf=0, t_sample=None)[source]

Add the root host to the transmission tree.

Parameters:
  • t_sampl (float) – Sampling time of the root host.

  • id (str, optional) – Identifier for the root host. Default is “0”.

  • genetic_data (list, optional) – Genetic data for the root host. Default is empty list.

  • t_inf (float, optional) – Infection time of the root host. Default is 0.

  • t_sample (float, optional) – Sampling time of the root host. Default is None.

Returns:

The root host object.

Return type:

host

successors(host)[source]

Get the successors (children) of a given host in the transmission tree.

Parameters:

host (host) – The host node whose successors are to be returned.

Returns:

An iterator over the successors of the host.

Return type:

iterator

parent(host)[source]

Get the parent (infector) of a given host in the transmission tree.

Parameters:

host (host) – The host node whose parent is to be returned.

Returns:

The parent host object.

Return type:

host

out_degree(host)[source]

Get the out-degree (number of children) of a host in the transmission tree.

Parameters:

host (host) – The host node whose out-degree is to be returned.

Returns:

The out-degree of the host.

Return type:

int

choose_successors(host, k=1)[source]

Choose k unique successors of a given host.

Parameters:
  • host (host) – Host whose successors will be chosen.

  • k (int, optional) – Number of successors to choose. Default is 1.

Returns:

List of k randomly chosen successors of the host.

Return type:

list

compute_Delta_loc_prior(T_new)[source]

Compute the change in the location prior log-likelihood for a new tree.

Parameters:

T_new (networkx.DiGraph) – The new transmission tree.

Returns:

(Delta log prior, new log prior, old log prior, old correction log-likelihood)

Return type:

tuple

get_candidates_to_chain()[source]

Get the list of candidate hosts for chain moves in the transmission tree.

Returns:

List of candidate host nodes for chain moves.

Return type:

list

get_N_candidates_to_chain(recompute=False)[source]

Get the number of candidate hosts for chain moves, optionally recomputing the list.

Parameters:

recompute (bool, optional) – If True, recompute the list of candidates. Default is False.

Returns:

Number of candidate hosts for chain moves.

Return type:

int

get_root_subtrees()[source]

Retrieve the root subtrees of the transmission tree.

This method searches for the first sampled siblings of the root host in the transmission tree and stores them in the roots_subtrees attribute.

Returns:

A list of root subtrees.

Return type:

list

get_unsampled_hosts()[source]

Get the list of unsampled hosts in the transmission tree (excluding the root).

Returns:

List of unsampled host nodes.

Return type:

list

get_sampling_model_likelihood(hosts=None, T=None, update=False)[source]

Compute the likelihood of the sampling model.

Computes the likelihood of the sampling model given a list of hosts. If no list is given, the likelihood of the whole transmission tree is returned.

Parameters:

hosts (list of host objects) –

Returns:

L – The likelihood of the sampling model given the list of hosts

Return type:

float

get_sampling_model_log_likelihood(hosts=None, T=None, update=False)[source]

Computes the likelihood of the sampling model given a list of hosts. If no list is given, the likelihood of the whole transmission tree is returned.

Parameters:

hosts (list of host objects) –

Returns:

L – The likelihood of the sampling model given the list of hosts

Return type:

float

Delta_log_sampling(hosts, T_end, T_ini=None)[source]

Compute the change in log-likelihood for the sampling model.

Parameters:
  • hosts (list) – List of host objects.

  • T_end (float) – End time.

  • T_ini (float, optional) – Initial time. Default is None.

Returns:

Change in log-likelihood for the sampling model.

Return type:

float

Notes

The function operates as follows:

  1. Computes the log-likelihood for the sampling model at T_end.

  2. If T_ini is provided, subtracts the log-likelihood at T_ini.

  3. Returns the difference.

get_offspring_model_likelihood(hosts=None, T=None, update=False)[source]

Computes the likelihood of the offspring model given a list of hosts. If no list is given, the likelihood of the whole transmission tree is returned.

Parameters:

hosts (list of host objects) –

Returns:

L – The likelihood of the offspring model given the list of hosts

Return type:

float

get_offspring_model_log_likelihood(hosts=None, T=None, update=False)[source]

Computes the likelihood of the offspring model given a list of hosts. If no list is given, the likelihood of the whole transmission tree is returned.

Parameters:

hosts (list of host objects) –

Returns:

L – The likelihood of the offspring model given the list of hosts

Return type:

float

Delta_log_offspring(hosts, T_end, T_ini=None)[source]

Compute the change in log-likelihood for the offspring model.

Parameters:
  • hosts (list) – List of host objects.

  • T_end (float) – End time.

  • T_ini (float, optional) – Initial time. Default is None.

Returns:

Change in log-likelihood for the offspring model.

Return type:

float

Notes

The function operates as follows:

  1. Computes the log-likelihood for the offspring model at T_end.

  2. If T_ini is provided, subtracts the log-likelihood at T_ini.

  3. Returns the difference.

get_infection_model_likelihood(hosts=None, T=None, update=False)[source]

Computes the likelihood of the infection model given a list of hosts. If no list is given, the likelihood of the whole transmission tree is returned.

Parameters:
  • hosts (list of host objects) – Hosts for which the likelihood is computed. If None, all hosts in the transmission tree are used.

  • T (DiGraph object) – Transmission tree on which the likelihood of the hosts is computed. If None, the tree of the model is used.

  • update (bool) – If True, the likelihood of the infection model is updated in the model object.

Returns:

L – The likelihood of the infection model given the list of hosts

Return type:

float

get_infection_model_log_likelihood(hosts=None, T=None, update=False)[source]

Computes the log-likelihood of the infection model given a list of hosts. If no list is given, the log-likelihood of the whole transmission tree is returned.

Parameters:
  • hosts (list of host objects) – Hosts for which the log-likelihood is computed. If None, all hosts in the transmission tree are used.

  • T (DiGraph object) – Transmission tree on which the log-likelihood of the hosts is computed. If None, the tree of the model is used.

  • update (bool) – If True, the log-likelihood of the infection model is updated in the model object.

Returns:

L – The log-likelihood of the infection model given the list of hosts

Return type:

float

Delta_log_infection(hosts, T_end, T_ini=None)[source]

Compute the change in log-likelihood for the infection model.

Parameters:
  • hosts (list) – List of host objects.

  • T_end (float) – End time.

  • T_ini (float, optional) – Initial time. Default is None.

Returns:

Change in log-likelihood for the infection model.

Return type:

float

Notes

The function operates as follows:

  1. Computes the log-likelihood for the infection model at T_end.

  2. If T_ini is provided, subtracts the log-likelihood at T_ini.

  3. Returns the difference.

log_likelihood_host(host, T=None)[source]

Computes the log-likelihood of a host given the transmission tree.

Parameters:
  • host (host object) – The host whose log-likelihood is computed.

  • T (DiGraph object, optional) – The transmission tree. Default is None.

Returns:

log_likelihood – The log likelihood of the host in the transmission network

Return type:

float

Delta_log_likelihood_host(hosts, T_end, T_ini=None)[source]

Compute the change in log-likelihood for a host.

Parameters:
  • hosts (list) – List of host objects.

  • T_end (float) – End time.

  • T_ini (float, optional) – Initial time. Default is None.

Returns:

Change in log-likelihood for the host.

Return type:

float

Notes

The function operates as follows:

  1. Computes the log-likelihood for the host at T_end.

  2. If T_ini is provided, subtracts the log-likelihood at T_ini.

  3. Returns the difference.

log_likelihood_hosts_list(hosts, T)[source]
log_likelihood_transmission_tree(T)[source]
log_posterior_transmission_tree()[source]

Compute the log-posterior of the current transmission tree.

This method calculates the log-posterior probability of the current transmission tree by summing the log-likelihood of the tree and any additional prior log-probabilities, such as genetic and location priors, if they are defined.

Returns:

The computed log-posterior of the current transmission tree.

Return type:

float

Notes

The log-posterior is computed as:

log_posterior = log_likelihood + genetic_log_prior (if defined) + same_location_log_prior (if defined)

The method uses the following attributes:
  • self.log_likelihood: Log-likelihood of the transmission tree.

  • self.genetic_log_prior: Log-prior from the genetic model (if defined).

  • self.same_location_log_prior: Log-prior from the location model (if defined).
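The sum above can be sketched in a few lines. The function below is a hypothetical illustration, not the package's implementation; it assumes that an undefined prior is represented by `None`, which the documentation does not state.

```python
def log_posterior(log_likelihood, genetic_log_prior=None,
                  same_location_log_prior=None):
    """Add each optional log-prior to the log-likelihood when it is defined.

    Representing an undefined prior as None is an assumption of this sketch.
    """
    total = log_likelihood
    if genetic_log_prior is not None:
        total += genetic_log_prior
    if same_location_log_prior is not None:
        total += same_location_log_prior
    return total


lp_no_priors = log_posterior(-120.5)
lp_with_priors = log_posterior(-120.5, genetic_log_prior=-30.25,
                               same_location_log_prior=-4.25)
print(lp_no_priors, lp_with_priors)
```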

get_log_posterior_transmission_tree(T)[source]

Compute and update the log-posterior of the transmission tree.

This method calculates the log-posterior probability of the given transmission tree T by combining the log-likelihood of the tree with any additional prior log-probabilities, such as genetic and location priors, if they are defined. The computed log-posterior and any relevant prior log-likelihoods are stored as attributes of the object.

Parameters:

T (networkx.DiGraph) – The transmission tree for which to compute the log-posterior.

Returns:

The computed log-posterior of the transmission tree.

Return type:

float

Notes

The log-posterior is computed as:

log_posterior = log_likelihood + genetic_log_prior (if defined) + same_location_log_prior (if defined)

The method also updates the following attributes:
  • self.log_posterior

  • self.genetic_log_prior (if applicable)

  • self.same_location_log_prior (if applicable)

show_log_likelihoods(hosts=None, T=None, verbose=False)[source]

Print and return the log-likelihoods for the sampling, offspring, and infection models.

Parameters:
  • hosts (list, optional) – List of host objects to compute log-likelihoods for. If None, computes for all hosts in T.

  • T (networkx.DiGraph, optional) – Transmission tree. If None, uses self.T.

  • verbose (bool, optional) – If True, prints the log-likelihoods. Default is False.

Returns:

(LL_sampling, LL_offspring, LL_infection): Log-likelihoods for the sampling, offspring, and infection models.

Return type:

tuple

log_likelihood_transmission_tree_old(T)[source]

Compute the log-likelihood of the entire transmission tree using the old method.

Parameters:

T (networkx.DiGraph) – Transmission tree to compute the log-likelihood for.

Returns:

The log-likelihood of the transmission tree.

Return type:

float

get_log_likelihood_transmission()[source]
add_genetic_prior(mu_gen, gen_dist)[source]

Adds a genetic prior to the model. The prior computes the likelihood that two sampled hosts are related, given the genetic distance between their viruses. Two sampled hosts are considered related if the only hosts on the path connecting them are unsampled hosts.

Parameters:
  • mu_gen (float) – Mutation rate

  • gen_dist (np.array) – Genetic distance matrix of the virus of the hosts. The index has to be identical to the index of the hosts.

add_same_location_prior(P_NM, tau, loc_dist)[source]

Adds a same-location prior to the model. As described for same_location_prior_tree, the prior models the probability that a host stays in the same location over a characteristic time tau.

Parameters:
  • P_NM (float) – Parameter of the location prior; it enters the probability of staying in place as exp(-t*P_NM/tau).

  • tau (float) – Characteristic time of the location prior.

  • loc_dist (np.array) – Location distance matrix of the hosts. The index has to be identical to the index of the hosts.

create_transmision_phylogeny_nets(N, mu, P_mut)[source]

Parameters:
  • N – Number of hosts.

  • mu – Mutation rate.

  • P_mut – Probability of mutation.

get_newick(lengths=True)[source]
save_json(filename)[source]

Save the transmission tree to a JSON file.

Parameters:

filename (str) – Path to the output JSON file.

classmethod json_to_tree(filename, sampling_params=None, offspring_params=None, infection_params=None)[source]

Load a transmission model from a JSON file and reconstruct the model object.

Parameters:
  • filename (str) – Path to the JSON file.

  • sampling_params (dict, optional) – Sampling parameters to override those in the file. Default is None.

  • offspring_params (dict, optional) – Offspring parameters to override those in the file. Default is None.

  • infection_params (dict, optional) – Infection parameters to override those in the file. Default is None.

Returns:

The reconstructed transmission model.

Return type:

didelot_unsampled

infection_time_from_sampling_step(selected_host=None, metHast=True, verbose=False)[source]

Propose and possibly accept a new infection time for a sampled host using the Metropolis-Hastings algorithm.

This method samples a new infection time for a selected host (or a random sampled host if not provided), computes the acceptance probability, and updates the host’s infection time if the proposal is accepted.

Parameters:
  • selected_host (host, optional) – The host whose infection time will be changed. If None, a random sampled host is selected.

  • metHast (bool, optional) – If True, use the Metropolis-Hastings algorithm to accept or reject the proposal. Default is True.

  • verbose (bool, optional) – If True, print detailed information about the proposal. Default is False.

Returns:

  • t_inf_new (float) – The proposed new infection time.

  • gg (float) – Proposal ratio for the Metropolis-Hastings step.

  • pp (float) – Likelihood ratio for the Metropolis-Hastings step.

  • P (float) – Acceptance probability for the Metropolis-Hastings step.

  • selected_host (host) – The host whose infection time was proposed to change.
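The returned quantities gg, pp, and P are related by the usual Metropolis-Hastings acceptance rule. The sketch below assumes the standard form P = min(1, pp * gg); the function name and the `rng` argument are hypothetical, and the package may combine the ratios differently (for example in log space).

```python
import random


def metropolis_hastings_accept(pp, gg, rng=random):
    """Return (P, accepted) for a proposal with likelihood ratio pp
    and proposal ratio gg, using P = min(1, pp * gg)."""
    P = min(1.0, pp * gg)
    accepted = rng.random() < P  # accept with probability P
    return P, accepted


rng = random.Random(0)  # seeded generator so the sketch is reproducible
P_up, accepted_up = metropolis_hastings_accept(pp=2.0, gg=1.0, rng=rng)
P_down, accepted_down = metropolis_hastings_accept(pp=0.5, gg=0.5, rng=rng)
print(P_up, accepted_up, P_down)
```

A proposal that increases the likelihood (with a symmetric proposal ratio) is always accepted, since `rng.random()` returns values strictly below 1.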

infection_time_from_infection_model_step(selected_host=None, metHast=True, Dt_new=None, verbose=False)[source]

Propose a new infection time for a host and accept or reject the change using the Metropolis-Hastings algorithm.

Parameters:
  • selected_host (host object, default=None) – Host whose infection time will be changed. If None, a host is randomly selected.

  • metHast (bool, default=True) – If True, the Metropolis Hastings algorithm is used to accept or reject the change.

  • Dt_new (float, default=None) – New infection time for the host. If None, a new time is sampled.

  • verbose (bool, default=False) – If True, prints the results of the step.

add_unsampled_with_times(selected_host=None, P_add=0.5, P_rewiring=0.5, P_off=0.5, verbose=False, only_geometrical=False, detailed_probs=False)[source]

Method to propose the addition of an unsampled host to the transmission tree and get the probability of the proposal.

Parameters:

selected_host: host object

Host to which the unsampled host will be added. If None, a host is randomly selected.

P_add: float

Probability of proposing to add a new host to the transmission tree.

P_rewiring: float

Probability of rewiring the new host to another sibling host.

P_off: float

Probability to rewire the new host to be a leaf.

verbose: bool

If True, prints the results of the step.

only_geometrical: bool

If True, only the proposal of the new topological structure will be considered.

detailed_probs: bool

If True, the method will return both probabilities of the proposals, of adding and removing a host.

Returns:

T_new: DiGraph object

New transmission tree with the proposed changes.

gg: float

Ratio of the probabilities of the proposals.

g_go: float

Probability of the proposal of adding a host.

g_ret: float

Probability of the proposal of removing a host.

prob_time: float

Probability of the time of infection of the new host.

unsampled: host object

Unsampled host to be added to the transmission tree.

added: bool

If True, the host was added to the transmission tree.

remove_unsampled_with_times(selected_host=None, P_add=0.5, P_rewiring=0.5, P_off=0.5, only_geometrical=False, detailed_probs=False, verbose=False)[source]

Method to propose the removal of an unsampled host from the transmission tree and get the probability of the proposal. If no unsampled hosts are available, the addition of a new host to the transmission tree is proposed instead.

Parameters:

selected_host: host object

Unsampled host to be removed from the transmission tree. If None, a host is randomly selected.

P_add: float

Probability of proposing to add a new host to the transmission tree.

P_rewiring: float

Probability of rewiring the new host to another sibling host.

P_off: float

Probability to rewire the new host to be a leaf.

verbose: bool

If True, prints the results of the step.

only_geometrical: bool

If True, only the proposal of the new topological structure will be considered.

detailed_probs: bool

If True, the method will return both probabilities of the proposals, of adding and removing a host.

Returns:

T_new: DiGraph object

New transmission tree with the proposed changes.

gg: float

Ratio of the probabilities of the proposals.

g_go: float

Probability of the proposal of adding a host.

g_ret: float

Probability of the proposal of removing a host.

prob_time: float

Probability of proposing the time of the selected_host.

added: bool

If True, a host was added to the transmission tree; otherwise, the selected host was removed.

add_remove_step(P_add=0.5, P_rewiring=0.5, P_off=0.5, metHast=True, verbose=False)[source]

Method to propose the addition or removal of an unsampled host to the transmission tree and get the probability of the proposal.

Parameters:

P_add: float

Probability of proposing the addition of an unsampled host; otherwise, the removal of an unsampled host is proposed.

P_rewiring: float

Probability of rewiring the new host to another sibling host.

P_off: float

Probability to rewire the new host to be a leaf.

metHast: bool

If True, the Metropolis Hastings algorithm is used to accept or reject the change.

verbose: bool

If True, prints the results of the step.

Returns:

MCMC_step(N_steps, verbose=False)[source]

MCMC

class transmission_models.classes.mcmc.mcmc.MCMC(model, P_rewire=0.3333333333333333, P_add_remove=0.3333333333333333, P_t_shift=0.3333333333333333, P_add=0.5, P_rewire_add=0.5, P_offspring_add=0.5, P_to_offspring=0.5)[source]

Bases: object

Markov Chain Monte Carlo sampler for transmission tree inference.

This class implements MCMC sampling algorithms for transmission network inference using various proposal mechanisms.

Parameters:
  • model (didelot_unsampled) – The transmission tree model to sample from.

  • P_rewire (float, optional) – The probability of rewiring a transmission tree. Default is 1/3.

  • P_add_remove (float, optional) – The probability of adding or removing an unsampled host in the transmission tree. Default is 1/3.

  • P_t_shift (float, optional) – The probability of shifting the infection time of the host in the transmission tree. Default is 1/3.

  • P_add (float, optional) – The probability of adding a new host to the transmission tree once an add/remove move has been proposed. Default is 0.5.

  • P_rewire_add (float, optional) – The probability of rewiring the new unsampled host once an addition has been proposed. Default is 0.5.

  • P_offspring_add (float, optional) – The probability that the new unsampled host is an offspring once an addition and a rewiring have been proposed. Default is 0.5.

  • P_to_offspring (float, optional) – The probability of moving to offspring model during rewiring. Default is 0.5.

Variables:
  • model (didelot_unsampled) – The transmission model being sampled.

  • P_rewire (float) – Probability of rewiring moves.

  • P_add_remove (float) – Probability of add/remove moves.

  • P_t_shift (float) – Probability of time shift moves.

  • P_add (float) – Probability of adding vs removing hosts.

  • P_rewire_add (float) – Probability of rewiring added hosts.

  • P_offspring_add (float) – Probability of offspring vs chain model for added hosts.

  • P_to_offspring (float) – Probability of moving to offspring model.

__init__(model, P_rewire=0.3333333333333333, P_add_remove=0.3333333333333333, P_t_shift=0.3333333333333333, P_add=0.5, P_rewire_add=0.5, P_offspring_add=0.5, P_to_offspring=0.5)[source]

Initialize the MCMC sampler.

Parameters:
  • model (didelot_unsampled) – The transmission tree model to sample from.

  • P_rewire (float, optional) – The probability of rewiring a transmission tree. Default is 1/3.

  • P_add_remove (float, optional) – The probability of adding or removing an unsampled host in the transmission tree. Default is 1/3.

  • P_t_shift (float, optional) – The probability of shifting the infection time of the host in the transmission tree. Default is 1/3.

  • P_add (float, optional) – The probability of adding a new host to the transmission tree once an add/remove move has been proposed. Default is 0.5.

  • P_rewire_add (float, optional) – The probability of rewiring the new unsampled host once an addition has been proposed. Default is 0.5.

  • P_offspring_add (float, optional) – The probability that the new unsampled host is an offspring once an addition and a rewiring have been proposed. Default is 0.5.

  • P_to_offspring (float, optional) – The probability of moving to offspring model during rewiring. Default is 0.5.

MCMC_iteration(verbose=False)[source]

Perform an MCMC iteration on the transmission tree model.

Parameters:

verbose (bool, optional) – Whether to print the progress of the MCMC iteration. Default is False.

Returns:

A tuple containing:

  • move (str) – The type of move proposed (‘rewire’, ‘add_remove’, or ‘time_shift’).

  • gg (float) – The ratio of proposal probabilities.

  • pp (float) – The ratio of posterior probabilities.

  • P (float) – The acceptance probability.

  • accepted (bool) – Whether the move was accepted.

  • DL (float) – The difference in log likelihood.

Return type:

tuple

Notes

The function operates as follows:

  1. Selects a move type at random.

  2. Performs the move and computes acceptance probability.

  3. Returns move details and acceptance status.
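Step 1 can be illustrated with a weighted random choice over the three move types. This is a sketch under the assumption that P_rewire, P_add_remove, and P_t_shift act as selection weights; `choose_move` is a hypothetical helper, not a method of the MCMC class.

```python
import random

# Move labels as listed in the return value documented above.
MOVES = ("rewire", "add_remove", "time_shift")


def choose_move(P_rewire=1/3, P_add_remove=1/3, P_t_shift=1/3, rng=random):
    """Pick one move type, weighting each by its proposal probability."""
    weights = (P_rewire, P_add_remove, P_t_shift)
    return rng.choices(MOVES, weights=weights, k=1)[0]


rng = random.Random(42)  # seeded generator so the sketch is reproducible
counts = {move: 0 for move in MOVES}
for _ in range(3000):
    counts[choose_move(rng=rng)] += 1
print(counts)  # each move is drawn roughly one third of the time
```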

Priors

class transmission_models.classes.genetic_prior.genetic_prior_tree(model, mu, distance_matrix)[source]

Bases: object

__init__(model, mu, distance_matrix)[source]

Initialize the genetic prior tree object.

Parameters:
  • model (object) – The transmission model containing the tree structure.

  • mu (float) – The mutation rate parameter for the Poisson distribution.

  • distance_matrix (numpy.ndarray) – Matrix containing pairwise genetic distances between hosts.

Notes

This initializes the genetic prior calculator with:
  • A Poisson distribution with rate mu for modeling genetic distances

  • A distance matrix for pairwise host comparisons

  • A reference to the transmission model

static search_firsts_sampled_siblings(host, T, distance_matrix)[source]

Find all sampled siblings of a host in the transmission tree.

Parameters:
  • host (object) – The host for which to find sampled siblings.

  • T (networkx.DiGraph) – The transmission tree.

  • distance_matrix (numpy.ndarray) – Matrix containing pairwise genetic distances between hosts.

Returns:

List of sampled sibling hosts that have genetic distance data.

Return type:

list

Notes

This method recursively searches through the tree to find all sampled hosts that are descendants of the given host and have valid genetic distance data (non-NaN values in the distance matrix).

static search_first_sampled_parent(host, T, root)[source]

Find the first sampled ancestor of a host in the transmission tree.

Parameters:
  • host (object) – The host for which to find the first sampled parent.

  • T (networkx.DiGraph) – The transmission tree.

  • root (object) – The root host of the transmission tree.

Returns:

The first sampled parent host, or None if no sampled parent is found.

Return type:

object or None

Notes

This method traverses up the tree from the given host until it finds the first sampled ancestor, or reaches the root without finding one.

static get_mut_time_dist(hp, hs)[source]

Calculate the mutation time distance between two hosts.

Parameters:
  • hp (object) – The parent host.

  • hs (object) – The sibling host.

Returns:

The mutation time distance: (hs.t_sample + hp.t_sample - 2 * hp.t_inf).

Return type:

float

Notes

This calculates the time available for mutations to accumulate between the sampling times of two hosts, accounting for their common infection time.
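A worked instance of the formula in the Returns entry is given below. `ToyHost` is a hypothetical stand-in that carries only the two attributes the formula reads; the numeric values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ToyHost:
    # Minimal stand-in for a host: only the attributes used by the formula.
    t_inf: float
    t_sample: float


def mut_time_dist(hp, hs):
    """Mutation time distance: hs.t_sample + hp.t_sample - 2 * hp.t_inf."""
    return hs.t_sample + hp.t_sample - 2 * hp.t_inf


hp = ToyHost(t_inf=2.0, t_sample=10.0)  # parent host
hs = ToyHost(t_inf=5.0, t_sample=14.0)  # sibling host
dist = mut_time_dist(hp, hs)
print(dist)  # (14 - 2) + (10 - 2) = 20.0
```

Written as (hs.t_sample - hp.t_inf) + (hp.t_sample - hp.t_inf), the distance is the sum of the two times elapsed from the parent's infection to each sampling event.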

get_closest_sampling_siblings(T=None, verbose=False)[source]

Calculate log-likelihood correction for closest sampling siblings.

Parameters:
  • T (networkx.DiGraph, optional) – The transmission tree. If None, uses self.model.T.

  • verbose (bool, optional) – If True, print detailed information during calculation.

Returns:

The log-likelihood correction value.

Return type:

float

Notes

This method calculates correction terms for the genetic prior by finding the closest sampled siblings for each host and computing the log-likelihood of their genetic distances based on the time difference between sampling events.

prior_host(host, T, parent_dist=False)[source]

Calculate the log prior for a specific host in the transmission tree.

Parameters:
  • host (object) – The host for which to calculate the log prior.

  • T (networkx.DiGraph) – The transmission tree.

  • parent_dist (bool, optional) – If True, include parent distance in the calculation. Default is False.

Returns:

The log prior value for the host.

Return type:

float

Notes

This method calculates the log prior by considering:

  1. Direct connections to sampled hosts

  2. Connections to sampled siblings through unsampled intermediate hosts

  3. Parent distance (if parent_dist=True)

The calculation uses Poisson distributions based on the mutation rate and time differences between sampling events.

prior_pair(h1, h2)[source]

Calculate the log prior for a pair of hosts.

Parameters:
  • h1 (object) – First host in the pair.

  • h2 (object) – Second host in the pair.

Returns:

The log prior value for the pair, or 0 if either host is not sampled.

Return type:

float

Notes

This method calculates the log prior for the genetic distance between two hosts based on their sampling time difference and the Poisson distribution with rate mu * Dt.
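The Poisson log-probability with rate mu * Dt can be written with the standard library alone. The sketch below assumes the genetic distance is a non-negative count of differences and that mu * Dt is strictly positive; `poisson_log_pmf` and `log_prior_pair` are hypothetical names, not the package's functions.

```python
import math


def poisson_log_pmf(k, rate):
    """Log of the Poisson pmf: k*log(rate) - rate - log(k!). Requires rate > 0."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def log_prior_pair(distance, mu, Dt):
    """Log prior of observing `distance` mutations over a time Dt at rate mu."""
    return poisson_log_pmf(distance, mu * Dt)


lp = log_prior_pair(distance=3, mu=0.5, Dt=4.0)  # Poisson rate is 2.0
print(lp)
```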

log_prior_host_list(host_list, T=None)[source]

Calculate the total log prior for a list of hosts.

Parameters:
  • host_list (list) – List of hosts for which to calculate the log prior.

  • T (networkx.DiGraph, optional) – The transmission tree. If None, uses self.model.T.

Returns:

The sum of log priors for all hosts in the list.

Return type:

float

Notes

This method iterates through the host list and sums the log priors for each individual host using the log_prior_host method.

log_prior_host(host, T=None)[source]

Compute the log prior for a host.

Parameters:
  • host (object) – The host for which to compute the log prior.

  • T (object, optional) – Transmission tree. Default is None.

Returns:

The log prior value for the host.

Return type:

float

Notes

The function operates as follows:

  1. Computes the log prior for the host based on the transmission tree.

  2. Returns the log prior value.

log_prior_T(T, update_up=True, verbose=False)[source]

Calculate the total log prior for an entire transmission tree.

Parameters:
  • T (networkx.DiGraph) – The transmission tree.

  • update_up (bool, optional) – If True, include correction terms for closest sampling siblings. Default is True.

  • verbose (bool, optional) – If True, print detailed information during calculation.

Returns:

The total log prior value for the transmission tree.

Return type:

float

Notes

This method calculates the complete log prior for a transmission tree by:

  1. Iterating through all hosts and their connections

  2. Computing log-likelihoods for direct connections to sampled hosts

  3. Computing log-likelihoods for connections to sampled siblings through unsampled hosts

  4. Adding correction terms for closest sampling siblings (if update_up=True)

The calculation uses Poisson distributions based on mutation rates and time differences.

Delta_log_prior(host, T_end, T_ini)[source]

Calculate the difference in log prior between two transmission tree states.

Parameters:
  • host (object) – The host for which to calculate the log prior difference.

  • T_end (networkx.DiGraph) – The final transmission tree state.

  • T_ini (networkx.DiGraph) – The initial transmission tree state.

Returns:

The difference in log prior: log_prior(T_end) - log_prior(T_ini).

Return type:

float

Notes

This method calculates how the log prior changes when a transmission tree transitions from state T_ini to T_end. It considers:

  1. Changes in parent relationships

  2. Changes in sibling relationships

The calculation is useful for MCMC acceptance ratios where only the difference in log prior is needed, not the absolute values.

transmission_models.classes.genetic_prior.get_roots_data_subtrees(host, T, dist_matrix)[source]

Get all sampled hosts with genetic data in subtrees rooted at a given host.

Parameters:
  • host (object) – The root host of the subtrees to search.

  • T (networkx.DiGraph) – The transmission tree.

  • dist_matrix (numpy.ndarray) – Matrix containing pairwise genetic distances between hosts.

Returns:

List of sampled hosts that have valid genetic distance data.

Return type:

list

Notes

This function recursively searches through all subtrees rooted at the given host and collects all sampled hosts that have non-NaN values in the distance matrix (indicating they have genetic sequence data).

class transmission_models.classes.location_prior.location_distance_prior_tree(model, mu, distance_matrix)[source]

Bases: object

__init__(model, mu, distance_matrix)[source]
static search_firsts_sampled_siblings(host, T)[source]
static search_first_sampleed_parent(host, T, root)[source]
static get_mut_time_dist(hp, hs)[source]
get_closest_sampling_siblings(T=None)[source]
prior_host(host, T, parent_dist=False)[source]
log_prior_T(T, update_up=True, verbose=False)[source]
class transmission_models.classes.location_prior.same_location_prior_tree(model, P_NM, tau, distance_matrix)[source]

Bases: object

Class to compute the prior on the locations of the hosts in the tree. The prior models the probability that a host stays where it lives over a characteristic time tau: the host stays in place for a time t with probability exp(-t*P_NM/tau), where P_NM is described as the probability that the host does not move in a time tau.
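The expression exp(-t*P_NM/tau) can be evaluated directly. The sketch below is a literal transcription of that formula with illustrative values; the function names are hypothetical and this is not the class's own code.

```python
import math


def log_prob_same_location(t, P_NM, tau):
    """Log of exp(-t * P_NM / tau), as written in the class description."""
    return -t * P_NM / tau


def prob_same_location(t, P_NM, tau):
    return math.exp(log_prob_same_location(t, P_NM, tau))


p0 = prob_same_location(t=0.0, P_NM=0.2, tau=10.0)   # no elapsed time
p1 = prob_same_location(t=10.0, P_NM=0.2, tau=10.0)  # one characteristic time
print(p0, p1)
```

Working with the logarithm avoids underflow when the elapsed time is large compared with tau.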

__init__(model, P_NM, tau, distance_matrix)[source]
static get_roots_data_subtrees(host, T, dist_matrix)[source]
static search_firsts_sampled_siblings(host, T, distance_matrix)[source]
static get_mut_time_dist(hp, hs)[source]
get_closest_sampling_siblings(T=None)[source]
prior_host(host, T, parent_dist=False)[source]
log_prior_T(T, update_up=True, verbose=False)[source]
transmission_models.classes.location_prior.search_first_sampled_parent(host, T, root, distance_matrix)[source]

Functions

Core Functions

Transmission Models Package.

This package provides tools for modeling viral transmission networks using phylogenetic and epidemiological data. It implements the Didelot et al. (2017) framework for transmission tree inference with unsampled hosts using MCMC sampling.

Main modules:
  • classes: Core classes including the Didelot unsampled model, host class, and priors

  • utils: Utility functions for tree manipulation and visualization

The package supports:
  • Bayesian inference using MCMC sampling

  • Integration of genetic sequence data

  • Location-based transmission modeling

  • Visualization of transmission networks

References

Didelot, X., Gardy, J., & Colijn, C. (2017). Bayesian inference of transmission chains using timing of events, contact and genetic data. PLoS computational biology, 13(4), e1005496.

class transmission_models.host(id, index, genetic_data=[], t_inf=0, t_sample=None)[source]

Bases: object

Represents a host that has been infected with a virus.

A host object contains information about an infected individual, including their genetic data, infection time, sampling time, and other attributes.

Variables:
  • index (int) – The index of the host.

  • sampled (bool) – Indicates whether the host has been sampled or not.

  • genetic_data (list) – The genetic data of the host.

  • dict_attributes (dict) – A dictionary to store additional attributes.

  • t_inf (int) – Time of infection.

  • t_sample (int, optional) – The time the host was sampled.

  • id (str) – The identifier of the host.

t_inf : property

Getter and setter for the time of infection attribute.

get_genetic_str() : str

Returns the genetic data as a string.

__str__() : str

Returns a string with the id of the host.

__int__() : int

Returns the index of the host.

Examples

>>> h = host('host1', 1, ['A', 'T', 'C', 'G'], 10, t_sample=15)
>>> print(h.t_inf)
10
>>> h.t_inf = 20
>>> print(h.t_inf)
20
>>> print(h.get_genetic_str())
ATCG
>>> print(h)
host1

Notes

The class name host is lowercase, so it does not follow the PascalCase convention that PEP 8 recommends for Python class names.

__init__(id, index, genetic_data=[], t_inf=0, t_sample=None)[source]

Initialize a new instance of the host class.

Parameters:
  • id (str) – The id of the host.

  • index (int) – The index of the host.

  • genetic_data (list, optional) – The genetic data of the host. Defaults to an empty list.

  • t_inf (int, optional) – Time of infection. Defaults to 0.

  • t_sample (int, optional) – The time the host was sampled. Defaults to None.

property t_inf

Getter for the time of infection attribute.

Returns:

The time of infection.

Return type:

int

get_genetic_str()[source]

Return the genetic data of the host as a string.

Returns:

The genetic data as a string.

Return type:

str

__str__()[source]

Return a string with the id of the host.

Returns:

The id of the host.

Return type:

str

__int__()[source]

Return the index of the host.

Returns:

The index of the host.

Return type:

int

class transmission_models.didelot_unsampled(sampling_params, offspring_params, infection_params, T=None)[source]

Bases: object

Didelot unsampled transmission model.

This class implements the Didelot et al. (2017) framework for transmission tree inference with unsampled hosts. It provides methods for building transmission networks, computing likelihoods, and performing MCMC sampling.

The model incorporates three main components:

  1. Sampling model: Gamma distribution for sampling times

  2. Offspring model: Negative binomial distribution for offspring number

  3. Infection model: Gamma distribution for infection times

Parameters:
  • sampling_params (dict) – Parameters for the sampling model containing:

      - pi : float, sampling probability
      - k_samp : float, shape parameter for gamma distribution
      - theta_samp : float, scale parameter for gamma distribution

  • offspring_params (dict) – Parameters for the offspring model containing:

      - r : float, rate of infection
      - p_inf : float, probability of infection

  • infection_params (dict) – Parameters for the infection model containing:

      - k_inf : float, shape parameter for gamma distribution
      - theta_inf : float, scale parameter for gamma distribution

Variables:
  • T (networkx.DiGraph) – The transmission tree.

  • host_dict (dict) – Dictionary mapping host IDs to host objects.

  • log_likelihood (float) – Current log likelihood of the model.

  • genetic_prior (genetic_prior_tree, optional) – Prior for genetic data.

  • same_location_prior (same_location_prior_tree, optional) – Prior for location data.

References

Didelot, X., Gardy, J., & Colijn, C. (2017). Bayesian inference of transmission chains using timing of events, contact and genetic data. PLoS computational biology, 13(4), e1005496.

__init__(sampling_params, offspring_params, infection_params, T=None)[source]

Initialize the Didelot unsampled transmission model.

Parameters:
  • sampling_params (dict) – Parameters for the sampling model containing:

      - pi : float, sampling probability
      - k_samp : float, shape parameter for gamma distribution
      - theta_samp : float, scale parameter for gamma distribution

  • offspring_params (dict) – Parameters for the offspring model containing:

      - r : float, rate of infection
      - p_inf : float, probability of infection

  • infection_params (dict) – Parameters for the infection model containing:

      - k_inf : float, shape parameter for gamma distribution
      - theta_inf : float, scale parameter for gamma distribution

  • T (networkx.DiGraph, optional) – The transmission tree. If provided, the model will be initialized with this tree. Default is None.

Raises:

KeyError – If any required parameter is missing from the input dictionaries.
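The three dictionaries can be assembled as shown below. The key names come from the parameter list above; the numeric values are purely illustrative, and `check_params` is a hypothetical helper that only mimics the documented KeyError behaviour so the sketch runs without the package installed.

```python
# Required keys, taken from the parameter descriptions above.
REQUIRED_KEYS = {
    "sampling_params": ("pi", "k_samp", "theta_samp"),
    "offspring_params": ("r", "p_inf"),
    "infection_params": ("k_inf", "theta_inf"),
}


def check_params(**param_dicts):
    """Hypothetical helper: raise KeyError if a required key is missing."""
    for name, keys in REQUIRED_KEYS.items():
        for key in keys:
            if key not in param_dicts[name]:
                raise KeyError(f"{name} is missing required key {key!r}")


# Illustrative values only; they are not recommended defaults.
sampling_params = {"pi": 0.5, "k_samp": 2.0, "theta_samp": 1.5}
offspring_params = {"r": 2.0, "p_inf": 0.5}
infection_params = {"k_inf": 2.0, "theta_inf": 1.0}

check_params(sampling_params=sampling_params,
             offspring_params=offspring_params,
             infection_params=infection_params)

# With the package installed, the model would then be built as documented:
# model = didelot_unsampled(sampling_params, offspring_params, infection_params)
```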

property T

set_T(T)[source]

samp_t_inf_between(h1, h2)[source]

Sample a time of infection between two hosts.

Uses a rejection sampling method to sample the time of infection of the infected host using the chain model from Didelot et al. 2017.

Parameters:
  • h1 (host) – Infector host.

  • h2 (host) – Infected host.

Returns:

Time of infection of the intermediate host, that is, the host that is infected by h1 and is the infector of h2.

Return type:

float

Notes

This method implements the rejection sampling algorithm described in Didelot et al. (2017) for sampling infection times in transmission chains.
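To illustrate the general technique, here is a minimal rejection-sampling sketch for an intermediate infection time. It assumes both links of the chain (h1 to the intermediate host, and the intermediate host to h2) have gamma-distributed intervals with the same shape k_inf >= 1 and the same scale. The function name, the uniform proposal and the target density are assumptions made for this example, not the library's implementation.

```python
import random


def sample_time_between(t1, t2, k_inf, rng=random):
    """Rejection-sample a time t between t1 and t2 with density proportional
    to gamma_pdf(t - t1) * gamma_pdf(t2 - t), both with shape k_inf and a
    common scale.

    With a common scale the exponential factors cancel, so the target is
    proportional to (x * (Dt - x)) ** (k_inf - 1) with x = t - t1.
    Assumes k_inf >= 1 so that the target is bounded on the interval.
    """
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    if k_inf < 1:
        raise ValueError("this sketch assumes k_inf >= 1")
    dt = t2 - t1
    while True:
        x = rng.uniform(0.0, dt)  # uniform proposal on the interval
        # Target divided by its maximum, which is attained at x = dt / 2.
        accept_prob = (4.0 * x * (dt - x) / dt**2) ** (k_inf - 1)
        if rng.random() < accept_prob:
            return t1 + x
```

A uniform proposal keeps the sketch short; it becomes inefficient for large k_inf, where the target concentrates around the midpoint.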

add_root(t_sampl, id='0', genetic_data=[], t_inf=0, t_sample=None)[source]

Add the root host to the transmission tree.

Parameters:
  • t_sampl (float) – Sampling time of the root host.

  • id (str, optional) – Identifier for the root host. Default is “0”.

  • genetic_data (list, optional) – Genetic data for the root host. Default is empty list.

  • t_inf (float, optional) – Infection time of the root host. Default is 0.

  • t_sample (float, optional) – Sampling time of the root host. Default is None.

Returns:

The root host object.

Return type:

host

successors(host)[source]

Get the successors (children) of a given host in the transmission tree.

Parameters:

host (host) – The host node whose successors are to be returned.

Returns:

An iterator over the successors of the host.

Return type:

iterator

parent(host)[source]

Get the parent (infector) of a given host in the transmission tree.

Parameters:

host (host) – The host node whose parent is to be returned.

Returns:

The parent host object.

Return type:

host

out_degree(host)[source]

Get the out-degree (number of children) of a host in the transmission tree.

Parameters:

host (host) – The host node whose out-degree is to be returned.

Returns:

The out-degree of the host.

Return type:

int

choose_successors(host, k=1)[source]

Choose k unique successors of a given host.

Parameters:
  • host (host) – Host whose successors will be chosen.

  • k (int, optional) – Number of successors to choose. Default is 1.

Returns:

List of k randomly chosen successors of the host.

Return type:

list
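A small sketch of drawing k unique successors, using a plain dict as a stand-in for the transmission tree (in the model, the networkx.DiGraph plays that role). The helper name and the error raised when fewer than k successors exist are assumptions; the library's behaviour in that case is not documented here.

```python
import random


def choose_k_successors(children, host, k=1, rng=random):
    """Pick k distinct children of `host` from an adjacency mapping.

    `children` is a plain dict standing in for the transmission tree;
    it maps each host to the list of hosts it infected.
    """
    candidates = list(children.get(host, []))
    if k > len(candidates):
        raise ValueError("host has fewer than k successors")
    # random.sample draws without replacement, so the result has no repeats.
    return rng.sample(candidates, k)
```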

compute_Delta_loc_prior(T_new)[source]

Compute the change in the location prior log-likelihood for a new tree.

Parameters:

T_new (networkx.DiGraph) – The new transmission tree.

Returns:

(Delta log prior, new log prior, old log prior, old correction log-likelihood)

Return type:

tuple

get_candidates_to_chain()[source]

Get the list of candidate hosts for chain moves in the transmission tree.

Returns:

List of candidate host nodes for chain moves.

Return type:

list

get_N_candidates_to_chain(recompute=False)[source]

Get the number of candidate hosts for chain moves, optionally recomputing the list.

Parameters:

recompute (bool, optional) – If True, recompute the list of candidates. Default is False.

Returns:

Number of candidate hosts for chain moves.

Return type:

int

get_root_subtrees()[source]

Retrieve the root subtrees of the transmission tree.

This method searches for the first sampled siblings of the root host in the transmission tree and stores them in the roots_subtrees attribute.

Returns:

A list of root subtrees.

Return type:

list

get_unsampled_hosts()[source]

Get the list of unsampled hosts in the transmission tree (excluding the root).

Returns:

List of unsampled host nodes.

Return type:

list

get_sampling_model_likelihood(hosts=None, T=None, update=False)[source]

Compute the likelihood of the sampling model.

Computes the likelihood of the sampling model given a list of hosts. If no list is given, the likelihood of the whole transmission tree is returned.

Parameters:

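hosts (list of host objects, optional) – Hosts for which the likelihood is computed. If None, the whole transmission tree is used.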

Returns:

L – The likelihood of the sampling model given the list of hosts

Return type:

float

get_sampling_model_log_likelihood(hosts=None, T=None, update=False)[source]

Computes the log-likelihood of the sampling model given a list of hosts. If no list is given, the log-likelihood of the whole transmission tree is returned.

Parameters:

hosts (list of host objects, optional) – Hosts for which the log-likelihood is computed. If None, the whole transmission tree is used.

Returns:

L – The log-likelihood of the sampling model given the list of hosts

Return type:

float

Delta_log_sampling(hosts, T_end, T_ini=None)[source]

Compute the change in log-likelihood for the sampling model.

Parameters:
  • hosts (list) – List of host objects.

  • T_end (float) – End time.

  • T_ini (float, optional) – Initial time. Default is None.

Returns:

Change in log-likelihood for the sampling model.

Return type:

float

Notes

The function operates as follows:

  1. Computes the log-likelihood for the sampling model at T_end.

  2. If T_ini is provided, subtracts the log-likelihood at T_ini.

  3. Returns the difference.
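The three steps above can be sketched generically as follows. Here `log_lik` is a hypothetical callable standing in for whichever log-likelihood is being differenced; its signature is an assumption made for this example.

```python
def delta_log(log_lik, hosts, T_end, T_ini=None):
    """Return log_lik(hosts, T_end), minus log_lik(hosts, T_ini) if given."""
    delta = log_lik(hosts, T_end)
    if T_ini is not None:
        delta -= log_lik(hosts, T_ini)
    return delta
```

The same pattern applies to the offspring, infection and per-host variants documented below.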

get_offspring_model_likelihood(hosts=None, T=None, update=False)[source]

Computes the likelihood of the offspring model given a list of hosts. If no list is given, the likelihood of the whole transmission tree is returned.

Parameters:

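hosts (list of host objects, optional) – Hosts for which the likelihood is computed. If None, the whole transmission tree is used.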

Returns:

L – The likelihood of the offspring model given the list of hosts

Return type:

float

get_offspring_model_log_likelihood(hosts=None, T=None, update=False)[source]

Computes the log-likelihood of the offspring model given a list of hosts. If no list is given, the log-likelihood of the whole transmission tree is returned.

Parameters:

hosts (list of host objects, optional) – Hosts for which the log-likelihood is computed. If None, the whole transmission tree is used.

Returns:

L – The log-likelihood of the offspring model given the list of hosts

Return type:

float

Delta_log_offspring(hosts, T_end, T_ini=None)[source]

Compute the change in log-likelihood for the offspring model.

Parameters:
  • hosts (list) – List of host objects.

  • T_end (float) – End time.

  • T_ini (float, optional) – Initial time. Default is None.

Returns:

Change in log-likelihood for the offspring model.

Return type:

float

Notes

The function operates as follows:

  1. Computes the log-likelihood for the offspring model at T_end.

  2. If T_ini is provided, subtracts the log-likelihood at T_ini.

  3. Returns the difference.

get_infection_model_likelihood(hosts=None, T=None, update=False)[source]

Computes the likelihood of the infection model given a list of hosts. If no list is given, the likelihood of the whole transmission tree is returned.

Parameters:
  • hosts (list of host objects) – Hosts for which the likelihood is computed. If None, the whole transmission tree is used.

  • T (DiGraph object) – Transmission tree on which the likelihood of the hosts is computed. If None, the tree of the model is used.

  • update (bool) – If True, the likelihood of the infection model is updated in the model object.

Returns:

L – The likelihood of the infection model given the list of hosts

Return type:

float

get_infection_model_log_likelihood(hosts=None, T=None, update=False)[source]

Computes the log-likelihood of the infection model given a list of hosts. If no list is given, the log-likelihood of the whole transmission tree is returned.

Parameters:
  • hosts (list of host objects) – Hosts for which the log-likelihood is computed. If None, the whole transmission tree is used.

  • T (DiGraph object) – Transmission tree on which the log-likelihood of the hosts is computed. If None, the tree of the model is used.

  • update (bool) – If True, the log-likelihood of the infection model is updated in the model object.

Returns:

L – The log-likelihood of the infection model given the list of hosts

Return type:

float

Delta_log_infection(hosts, T_end, T_ini=None)[source]

Compute the change in log-likelihood for the infection model.

Parameters:
  • hosts (list) – List of host objects.

  • T_end (float) – End time.

  • T_ini (float, optional) – Initial time. Default is None.

Returns:

Change in log-likelihood for the infection model.

Return type:

float

Notes

The function operates as follows:

  1. Computes the log-likelihood for the infection model at T_end.

  2. If T_ini is provided, subtracts the log-likelihood at T_ini.

  3. Returns the difference.

log_likelihood_host(host, T=None)[source]

Computes the log-likelihood of a host given the transmission tree.

Parameters:
  • host (host object) – The host whose log-likelihood is computed.

  • T (DiGraph object, optional) – The transmission tree. Default is None.

Returns:

log_likelihood – The log likelihood of the host in the transmission network

Return type:

float

Delta_log_likelihood_host(hosts, T_end, T_ini=None)[source]

Compute the change in log-likelihood for a host.

Parameters:
  • hosts (list) – List of host objects.

  • T_end (float) – End time.

  • T_ini (float, optional) – Initial time. Default is None.

Returns:

Change in log-likelihood for the host.

Return type:

float

Notes

The function operates as follows:

  1. Computes the log-likelihood for the host at T_end.

  2. If T_ini is provided, subtracts the log-likelihood at T_ini.

  3. Returns the difference.

log_likelihood_hosts_list(hosts, T)[source]

log_likelihood_transmission_tree(T)[source]

log_posterior_transmission_tree()[source]

Compute the log-posterior of the current transmission tree.

This method calculates the log-posterior probability of the current transmission tree by summing the log-likelihood of the tree and any additional prior log-probabilities, such as genetic and location priors, if they are defined.

Returns:

The computed log-posterior of the current transmission tree.

Return type:

float

Notes

The log-posterior is computed as:

log_posterior = log_likelihood + genetic_log_prior (if defined) + same_location_log_prior (if defined)

The method uses the following attributes:
  • self.log_likelihood: Log-likelihood of the transmission tree.

  • self.genetic_log_prior: Log-prior from the genetic model (if defined).

  • self.same_location_log_prior: Log-prior from the location model (if defined).
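A minimal sketch of the sum described above. Representing an undefined prior as None is an assumption made for this example; the function is illustrative and not the method's actual body.

```python
def log_posterior(log_likelihood, genetic_log_prior=None,
                  same_location_log_prior=None):
    """Sum the log-likelihood with whichever log-priors are defined."""
    total = log_likelihood
    if genetic_log_prior is not None:
        total += genetic_log_prior
    if same_location_log_prior is not None:
        total += same_location_log_prior
    return total
```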

get_log_posterior_transmission_tree(T)[source]

Compute and update the log-posterior of the transmission tree.

This method calculates the log-posterior probability of the given transmission tree T by combining the log-likelihood of the tree with any additional prior log-probabilities, such as genetic and location priors, if they are defined. The computed log-posterior and any relevant prior log-likelihoods are stored as attributes of the object.

Parameters:

T (networkx.DiGraph) – The transmission tree for which to compute the log-posterior.

Returns:

The computed log-posterior of the transmission tree.

Return type:

float

Notes

The log-posterior is computed as:

log_posterior = log_likelihood + genetic_log_prior (if defined) + same_location_log_prior (if defined)

The method also updates the following attributes:
  • self.log_posterior

  • self.genetic_log_prior (if applicable)

  • self.same_location_log_prior (if applicable)

show_log_likelihoods(hosts=None, T=None, verbose=False)[source]

Print and return the log-likelihoods for the sampling, offspring, and infection models.

Parameters:
  • hosts (list, optional) – List of host objects to compute log-likelihoods for. If None, computes for all hosts in T.

  • T (networkx.DiGraph, optional) – Transmission tree. If None, uses self.T.

  • verbose (bool, optional) – If True, prints the log-likelihoods. Default is False.

Returns:

(LL_sampling, LL_offspring, LL_infection): Log-likelihoods for the sampling, offspring, and infection models.

Return type:

tuple

log_likelihood_transmission_tree_old(T)[source]

Compute the log-likelihood of the entire transmission tree using the old method.

Parameters:

T (networkx.DiGraph) – Transmission tree to compute the log-likelihood for.

Returns:

The log-likelihood of the transmission tree.

Return type:

float

get_log_likelihood_transmission()[source]

add_genetic_prior(mu_gen, gen_dist)[source]

Adds a genetic prior to the model. The prior computes the likelihood that two sampled hosts are related given the genetic distance between the viruses of the hosts. Two hosts are considered related if the only hosts on the path connecting them are unsampled hosts.

Parameters:
  • mu_gen (float) – Mutation rate.

  • gen_dist (np.array) – Genetic distance matrix of the viruses of the hosts. Its index must be identical to the index of the hosts.

add_same_location_prior(P_NM, tau, loc_dist)[source]

Adds a same-location prior to the model. The prior computes the probability that a host stays where it lives over a characteristic time tau (see same_location_prior_tree).

Parameters:
  • P_NM (float) – Probability that a host does not move within the characteristic time tau.

  • tau (float) – Characteristic time of the location model.

  • loc_dist (np.array) – Location distance matrix of the hosts. Its index must be identical to the index of the hosts.

create_transmision_phylogeny_nets(N, mu, P_mut)[source]

Parameters:
  • N – Number of hosts.

  • mu – Mutation rate.

  • P_mut – Probability of mutation.

get_newick(lengths=True)[source]

save_json(filename)[source]

Save the transmission tree to a JSON file.

Parameters:

filename (str) – Path to the output JSON file.

classmethod json_to_tree(filename, sampling_params=None, offspring_params=None, infection_params=None)[source]

Load a transmission model from a JSON file and reconstruct the model object.

Parameters:
  • filename (str) – Path to the JSON file.

  • sampling_params (dict, optional) – Sampling parameters to override those in the file. Default is None.

  • offspring_params (dict, optional) – Offspring parameters to override those in the file. Default is None.

  • infection_params (dict, optional) – Infection parameters to override those in the file. Default is None.

Returns:

The reconstructed transmission model.

Return type:

didelot_unsampled

infection_time_from_sampling_step(selected_host=None, metHast=True, verbose=False)[source]

Propose and possibly accept a new infection time for a sampled host using the Metropolis-Hastings algorithm.

This method samples a new infection time for a selected host (or a random sampled host if not provided), computes the acceptance probability, and updates the host’s infection time if the proposal is accepted.

Parameters:
  • selected_host (host, optional) – The host whose infection time will be changed. If None, a random sampled host is selected.

  • metHast (bool, optional) – If True, use the Metropolis-Hastings algorithm to accept or reject the proposal. Default is True.

  • verbose (bool, optional) – If True, print detailed information about the proposal. Default is False.

Returns:

  • t_inf_new (float) – The proposed new infection time.

  • gg (float) – Proposal ratio for the Metropolis-Hastings step.

  • pp (float) – Likelihood ratio for the Metropolis-Hastings step.

  • P (float) – Acceptance probability for the Metropolis-Hastings step.

  • selected_host (host) – The host whose infection time was proposed to change.
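The acceptance rule of a Metropolis-Hastings step can be sketched as follows, using the names gg, pp and P from the return values above. The sketch assumes gg is oriented so that the acceptance probability is min(1, pp * gg); the helper name is hypothetical and the library may organise the computation differently (for instance in log space).

```python
import random


def metropolis_hastings_accept(pp, gg, rng=random):
    """Return (P, accepted) for likelihood ratio pp and proposal ratio gg."""
    P = min(1.0, pp * gg)
    # rng.random() is uniform on [0, 1), so the proposal is accepted
    # with probability P.
    accepted = rng.random() < P
    return P, accepted
```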

infection_time_from_infection_model_step(selected_host=None, metHast=True, Dt_new=None, verbose=False)[source]

Proposes a change to the infection time of a host and accepts or rejects it using the Metropolis-Hastings algorithm.

Parameters:
  • selected_host (host object, default=None) – Host whose infection time will be changed. If None, a host is randomly selected.

  • metHast (bool, default=True) – If True, the Metropolis Hastings algorithm is used to accept or reject the change.

  • Dt_new (float, default=None) – New infection time for the host. If None, a new time is sampled.

  • verbose (bool, default=False) – If True, prints the results of the step.

add_unsampled_with_times(selected_host=None, P_add=0.5, P_rewiring=0.5, P_off=0.5, verbose=False, only_geometrical=False, detailed_probs=False)[source]

Method to propose the addition of an unsampled host to the transmission tree and get the probability of the proposal.

Parameters:
  • selected_host (host object) – Host to which the unsampled host will be added. If None, a host is randomly selected.

  • P_add (float) – Probability of proposing to add a new host to the transmission tree.

  • P_rewiring (float) – Probability of rewiring the new host to another sibling host.

  • P_off (float) – Probability to rewire the new host to be a leaf.

  • verbose (bool) – If True, prints the results of the step.

  • only_geometrical (bool) – If True, only the proposal of the new topological structure will be considered.

  • detailed_probs (bool) – If True, the method will return both probabilities of the proposals, of adding and removing a host.

Returns:

  • T_new (DiGraph object) – New transmission tree with the proposed changes.

  • gg (float) – Ratio of the probabilities of the proposals.

  • g_go (float) – Probability of the proposal of adding a host.

  • g_ret (float) – Probability of the proposal of removing a host.

  • prob_time (float) – Probability of the time of infection of the new host.

  • unsampled (host object) – Unsampled host to be added to the transmission tree.

  • added (bool) – If True, the host was added to the transmission tree.

remove_unsampled_with_times(selected_host=None, P_add=0.5, P_rewiring=0.5, P_off=0.5, only_geometrical=False, detailed_probs=False, verbose=False)[source]

Method to propose the removal of an unsampled host from the transmission tree and get the probability of the proposal. If no unsampled hosts are available, the addition of a new host to the transmission tree is proposed instead.

Parameters:
  • selected_host (host object) – Unsampled host to be removed from the transmission tree. If None, a host is randomly selected.

  • P_add (float) – Probability of proposing to add a new host to the transmission tree.

  • P_rewiring (float) – Probability of rewiring the new host to another sibling host.

  • P_off (float) – Probability to rewire the new host to be a leaf.

  • verbose (bool) – If True, prints the results of the step.

  • only_geometrical (bool) – If True, only the proposal of the new topological structure will be considered.

  • detailed_probs (bool) – If True, the method will return both probabilities of the proposals, of adding and removing a host.

Returns:

  • T_new (DiGraph object) – New transmission tree with the proposed changes.

  • gg (float) – Ratio of the probabilities of the proposals.

  • g_go (float) – Probability of the proposal of adding a host.

  • g_ret (float) – Probability of the proposal of removing a host.

  • prob_time (float) – Probability of proposing the time of the selected_host.

  • added (bool) – If True, the host was added to the transmission tree. Otherwise, the host was removed.

add_remove_step(P_add=0.5, P_rewiring=0.5, P_off=0.5, metHast=True, verbose=False)[source]

Method to propose the addition or removal of an unsampled host to the transmission tree and get the probability of the proposal.

Parameters:
  • P_add (float) – Probability of proposing the addition of an unsampled host. Otherwise, the removal of an unsampled host is proposed.

  • P_rewiring (float) – Probability of rewiring the new host to another sibling host.

  • P_off (float) – Probability to rewire the new host to be a leaf.

  • metHast (bool) – If True, the Metropolis-Hastings algorithm is used to accept or reject the change.

  • verbose (bool) – If True, prints the results of the step.

Returns:

MCMC_step(N_steps, verbose=False)[source]

class transmission_models.genetic_prior_tree(model, mu, distance_matrix)[source]

Bases: object

__init__(model, mu, distance_matrix)[source]

Initialize the genetic prior tree object.

Parameters:
  • model (object) – The transmission model containing the tree structure.

  • mu (float) – The mutation rate parameter for the Poisson distribution.

  • distance_matrix (numpy.ndarray) – Matrix containing pairwise genetic distances between hosts.

Notes

This initializes the genetic prior calculator with:

  • A Poisson distribution with rate mu for modeling genetic distances

  • A distance matrix for pairwise host comparisons

  • A reference to the transmission model

static search_firsts_sampled_siblings(host, T, distance_matrix)[source]

Find all sampled siblings of a host in the transmission tree.

Parameters:
  • host (object) – The host for which to find sampled siblings.

  • T (networkx.DiGraph) – The transmission tree.

  • distance_matrix (numpy.ndarray) – Matrix containing pairwise genetic distances between hosts.

Returns:

List of sampled sibling hosts that have genetic distance data.

Return type:

list

Notes

This method recursively searches through the tree to find all sampled hosts that are descendants of the given host and have valid genetic distance data (non-NaN values in the distance matrix).

static search_first_sampled_parent(host, T, root)[source]

Find the first sampled ancestor of a host in the transmission tree.

Parameters:
  • host (object) – The host for which to find the first sampled parent.

  • T (networkx.DiGraph) – The transmission tree.

  • root (object) – The root host of the transmission tree.

Returns:

The first sampled parent host, or None if no sampled parent is found.

Return type:

object or None

Notes

This method traverses up the tree from the given host until it finds the first sampled ancestor, or reaches the root without finding one.
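The upward traversal described above can be sketched with plain Python containers. The `parent_of` dict and `sampled` set are stand-ins introduced for this example; in the library the tree is a networkx.DiGraph and the sampled flag lives on the host object.

```python
def first_sampled_ancestor(host, parent_of, sampled, root):
    """Walk up from `host` and return the first sampled ancestor, or None.

    parent_of : dict mapping each host to its infector (stand-in for the tree)
    sampled   : set of hosts that have been sampled
    """
    current = host
    while current != root:
        current = parent_of[current]
        if current in sampled:
            return current
    # Reached the root without finding a sampled ancestor.
    return None
```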

static get_mut_time_dist(hp, hs)[source]

Calculate the mutation time distance between two hosts.

Parameters:
  • hp (object) – The parent host.

  • hs (object) – The sibling host.

Returns:

The mutation time distance: (hs.t_sample + hp.t_sample - 2 * hp.t_inf).

Return type:

float

Notes

This calculates the time available for mutations to accumulate between the sampling times of two hosts, accounting for their common infection time.
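A worked instance of the formula above, using a small dataclass as a stand-in for the host objects. `ToyHost` and its values are hypothetical and carry only the two attributes the formula needs.

```python
from dataclasses import dataclass


@dataclass
class ToyHost:
    """Minimal stand-in for a host; only the fields the formula needs."""
    t_inf: float
    t_sample: float


def mut_time_dist(hp, hs):
    """Sum of the times from hp's infection to each of the two samples."""
    return hs.t_sample + hp.t_sample - 2 * hp.t_inf


parent = ToyHost(t_inf=2.0, t_sample=6.0)
sibling = ToyHost(t_inf=4.0, t_sample=9.0)
print(mut_time_dist(parent, sibling))  # (9 - 2) + (6 - 2) = 11.0
```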

get_closest_sampling_siblings(T=None, verbose=False)[source]

Calculate log-likelihood correction for closest sampling siblings.

Parameters:
  • T (networkx.DiGraph, optional) – The transmission tree. If None, uses self.model.T.

  • verbose (bool, optional) – If True, print detailed information during calculation.

Returns:

The log-likelihood correction value.

Return type:

float

Notes

This method calculates correction terms for the genetic prior by finding the closest sampled siblings for each host and computing the log-likelihood of their genetic distances based on the time difference between sampling events.

prior_host(host, T, parent_dist=False)[source]

Calculate the log prior for a specific host in the transmission tree.

Parameters:
  • host (object) – The host for which to calculate the log prior.

  • T (networkx.DiGraph) – The transmission tree.

  • parent_dist (bool, optional) – If True, include parent distance in the calculation. Default is False.

Returns:

The log prior value for the host.

Return type:

float

Notes

This method calculates the log prior by considering:

  1. Direct connections to sampled hosts

  2. Connections to sampled siblings through unsampled intermediate hosts

  3. Parent distance (if parent_dist=True)

The calculation uses Poisson distributions based on the mutation rate and time differences between sampling events.

prior_pair(h1, h2)[source]

Calculate the log prior for a pair of hosts.

Parameters:
  • h1 (object) – First host in the pair.

  • h2 (object) – Second host in the pair.

Returns:

The log prior value for the pair, or 0 if either host is not sampled.

Return type:

float

Notes

This method calculates the log prior for the genetic distance between two hosts based on their sampling time difference and the Poisson distribution with rate mu * Dt.
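A sketch of a Poisson log-probability with rate mu * Dt, as described above. The function names are illustrative, and how Dt is derived from the two hosts is left to the caller here; it is an assumption that the genetic distance d is treated as a count.

```python
import math


def log_poisson_pmf(d, rate):
    """Log-probability of observing d events under Poisson(rate)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    # log(rate**d * exp(-rate) / d!) with lgamma(d + 1) = log(d!)
    return d * math.log(rate) - rate - math.lgamma(d + 1)


def pair_log_prior(d, mu, Dt):
    """Log prior of genetic distance d for mutation rate mu and time Dt."""
    return log_poisson_pmf(d, mu * Dt)
```

Working in log space avoids underflow when many pairwise terms are summed over a tree.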

log_prior_host_list(host_list, T=None)[source]

Calculate the total log prior for a list of hosts.

Parameters:
  • host_list (list) – List of hosts for which to calculate the log prior.

  • T (networkx.DiGraph, optional) – The transmission tree. If None, uses self.model.T.

Returns:

The sum of log priors for all hosts in the list.

Return type:

float

Notes

This method iterates through the host list and sums the log priors for each individual host using the log_prior_host method.

log_prior_host(host, T=None)[source]

Compute the log prior for a host.

Parameters:
  • host (object) – The host for which to compute the log prior.

  • T (object, optional) – Transmission tree. Default is None.

Returns:

The log prior value for the host.

Return type:

float

Notes

The function operates as follows:

  1. Computes the log prior for the host based on the transmission tree.

  2. Returns the log prior value.

log_prior_T(T, update_up=True, verbose=False)[source]

Calculate the total log prior for an entire transmission tree.

Parameters:
  • T (networkx.DiGraph) – The transmission tree.

  • update_up (bool, optional) – If True, include correction terms for closest sampling siblings. Default is True.

  • verbose (bool, optional) – If True, print detailed information during calculation.

Returns:

The total log prior value for the transmission tree.

Return type:

float

Notes

This method calculates the complete log prior for a transmission tree by:

  1. Iterating through all hosts and their connections

  2. Computing log-likelihoods for direct connections to sampled hosts

  3. Computing log-likelihoods for connections to sampled siblings through unsampled hosts

  4. Adding correction terms for closest sampling siblings (if update_up=True)

The calculation uses Poisson distributions based on mutation rates and time differences.

Delta_log_prior(host, T_end, T_ini)[source]

Calculate the difference in log prior between two transmission tree states.

Parameters:
  • host (object) – The host for which to calculate the log prior difference.

  • T_end (networkx.DiGraph) – The final transmission tree state.

  • T_ini (networkx.DiGraph) – The initial transmission tree state.

Returns:

The difference in log prior: log_prior(T_end) - log_prior(T_ini).

Return type:

float

Notes

This method calculates how the log prior changes when a transmission tree transitions from state T_ini to T_end. It considers:

  1. Changes in parent relationships

  2. Changes in sibling relationships

The calculation is useful for MCMC acceptance ratios where only the difference in log prior is needed, not the absolute values.

class transmission_models.location_distance_prior_tree(model, mu, distance_matrix)[source]

Bases: object

__init__(model, mu, distance_matrix)[source]

static search_firsts_sampled_siblings(host, T)[source]

static search_first_sampleed_parent(host, T, root)[source]

static get_mut_time_dist(hp, hs)[source]

get_closest_sampling_siblings(T=None)[source]

prior_host(host, T, parent_dist=False)[source]

log_prior_T(T, update_up=True, verbose=False)[source]

class transmission_models.same_location_prior_tree(model, P_NM, tau, distance_matrix)[source]

Bases: object

Class to compute the prior on the location of the hosts in the tree. The prior models the probability that a host stays where it lives over a characteristic time tau. A host stays where it lives for a time t with probability exp(-t*P_NM/tau), where P_NM is the probability that the host does not move in tau.

__init__(model, P_NM, tau, distance_matrix)[source]

static get_roots_data_subtrees(host, T, dist_matrix)[source]

static search_firsts_sampled_siblings(host, T, distance_matrix)[source]

static get_mut_time_dist(hp, hs)[source]

get_closest_sampling_siblings(T=None)[source]

prior_host(host, T, parent_dist=False)[source]

log_prior_T(T, update_up=True, verbose=False)[source]

class transmission_models.MCMC(model, P_rewire=0.3333333333333333, P_add_remove=0.3333333333333333, P_t_shift=0.3333333333333333, P_add=0.5, P_rewire_add=0.5, P_offspring_add=0.5, P_to_offspring=0.5)[source]

Bases: object

Markov Chain Monte Carlo sampler for transmission tree inference.

This class implements MCMC sampling algorithms for transmission network inference using various proposal mechanisms.

Parameters:
  • model (didelot_unsampled) – The transmission tree model to sample from.

  • P_rewire (float, optional) – The probability of rewiring a transmission tree. Default is 1/3.

  • P_add_remove (float, optional) – The probability of adding or removing an unsampled host in the transmission tree. Default is 1/3.

  • P_t_shift (float, optional) – The probability of shifting the infection time of the host in the transmission tree. Default is 1/3.

  • P_add (float, optional) – The probability of adding a new host to the transmission tree once an add/remove move has been proposed. Default is 0.5.

  • P_rewire_add (float, optional) – The probability of rewiring the new unsampled host once an addition has been proposed. Default is 0.5.

  • P_offspring_add (float, optional) – The probability that the new unsampled host is an offspring once an addition and a rewiring have been proposed. Default is 0.5.

  • P_to_offspring (float, optional) – The probability of moving to offspring model during rewiring. Default is 0.5.

Variables:
  • model (didelot_unsampled) – The transmission model being sampled.

  • P_rewire (float) – Probability of rewiring moves.

  • P_add_remove (float) – Probability of add/remove moves.

  • P_t_shift (float) – Probability of time shift moves.

  • P_add (float) – Probability of adding vs removing hosts.

  • P_rewire_add (float) – Probability of rewiring added hosts.

  • P_offspring_add (float) – Probability of offspring vs chain model for added hosts.

  • P_to_offspring (float) – Probability of moving to offspring model.

__init__(model, P_rewire=0.3333333333333333, P_add_remove=0.3333333333333333, P_t_shift=0.3333333333333333, P_add=0.5, P_rewire_add=0.5, P_offspring_add=0.5, P_to_offspring=0.5)[source]

Initialize the MCMC sampler.

Parameters:
  • model (didelot_unsampled) – The transmission tree model to sample from.

  • P_rewire (float, optional) – The probability of rewiring a transmission tree. Default is 1/3.

  • P_add_remove (float, optional) – The probability of adding or removing an unsampled host in the transmission tree. Default is 1/3.

  • P_t_shift (float, optional) – The probability of shifting the infection time of the host in the transmission tree. Default is 1/3.

  • P_add (float, optional) – The probability of adding a new host to the transmission tree once an add/remove move has been proposed. Default is 0.5.

  • P_rewire_add (float, optional) – The probability of rewiring the new unsampled host once an addition has been proposed. Default is 0.5.

  • P_offspring_add (float, optional) – The probability that the new unsampled host is an offspring once an addition and a rewiring have been proposed. Default is 0.5.

  • P_to_offspring (float, optional) – The probability of moving to offspring model during rewiring. Default is 0.5.

MCMC_iteration(verbose=False)[source]

Perform an MCMC iteration on the transmission tree model.

Parameters:

verbose (bool, optional) – Whether to print the progress of the MCMC iteration. Default is False.

Returns:

A tuple containing:

  • move (str) – The type of move proposed (‘rewire’, ‘add_remove’, or ‘time_shift’).

  • gg (float) – The ratio of proposal probabilities.

  • pp (float) – The ratio of posterior probabilities.

  • P (float) – The acceptance probability.

  • accepted (bool) – Whether the move was accepted.

  • DL (float) – The difference in log likelihood.

Return type:

tuple

Notes

The function operates as follows:

  1. Selects a move type at random.

  2. Performs the move and computes acceptance probability.

  3. Returns move details and acceptance status.
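Step 1 above, choosing a move type at random, might look like the following sketch. The move labels come from the return description; the helper name and the use of `random.choices` are assumptions made for this example, not the library's code.

```python
import random

MOVES = ("rewire", "add_remove", "time_shift")


def pick_move(P_rewire=1/3, P_add_remove=1/3, P_t_shift=1/3, rng=random):
    """Draw one move type with the given (relative) probabilities."""
    weights = (P_rewire, P_add_remove, P_t_shift)
    return rng.choices(MOVES, weights=weights, k=1)[0]
```

`random.choices` normalises the weights, so the three probabilities only need to be non-negative with a positive sum.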

Utility Functions

Utilities Module.

This module contains utility functions for tree manipulation, visualization, and data conversion in transmission network analysis.

Main Functions

  • tree_to_newick : Convert transmission tree to Newick format

  • search_firsts_sampled_siblings : Find first sampled siblings in tree

  • search_first_sampled_parent : Find first sampled parent in tree

  • plot_transmision_network : Visualize transmission network

  • tree_to_json : Convert tree to JSON format

  • json_to_tree : Convert JSON to tree format

  • tree_slicing_step : Tree topology manipulation functions

Visualization

  • hierarchy_pos : Generate hierarchical layout positions

  • hierarchy_pos_times : Generate time-based hierarchical layout

  • plot_transmision_network : Plot transmission network with various options

Data Conversion

  • tree_to_newick : Convert to Newick format for phylogenetic software

  • tree_to_json : Convert to JSON for data storage

  • json_to_tree : Convert from JSON back to tree structure

transmission_models.utils.tree_to_newick(g, lengths=True, root=None)[source]

Convert a transmission tree to Newick format for phylogenetic software.

Parameters:
  • g (networkx.DiGraph) – The transmission tree as a directed graph.

  • lengths (bool, optional) – Whether to include branch lengths in the Newick string. Default is True.

  • root (node, optional) – The root node of the tree. If None, the root will be inferred.

Returns:

The Newick string representation of the tree.

Return type:

str
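To illustrate the output format, here is a minimal sketch of a Newick serializer. It does not use the library or networkx: the tree is a plain dict mapping each node to its list of children, and the helper name to_newick_sketch and the branch_length mapping are assumptions made for this example.

```python
def to_newick_sketch(children, root, branch_length=None):
    """Serialize a rooted tree to a Newick string.

    children: dict mapping a node to the list of its children.
    branch_length: optional dict mapping a node to the length of the
        branch leading to it (hypothetical input for this sketch).
    """
    def render(node):
        kids = children.get(node, [])
        label = str(node)
        if kids:
            # Internal nodes wrap their children in parentheses.
            label = "(" + ",".join(render(kid) for kid in kids) + ")" + label
        if branch_length is not None and node in branch_length:
            label += ":" + format(branch_length[node], "g")
        return label

    return render(root) + ";"


tree = {"root": ["a", "b"], "a": ["c"]}
newick = to_newick_sketch(tree, "root")
newick_with_lengths = to_newick_sketch(tree, "root", {"a": 1.5, "b": 2.0, "c": 0.5})
```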

transmission_models.utils.pdf_in_between(model, Dt, t)[source]

Compute the probability density function (PDF) value for the infection time between two events using a beta distribution parameterized by the model’s infection parameters.

Parameters:
  • model (object) – The model object containing infection parameters (expects attributes k_inf).

  • Dt (float) – The scale parameter (duration between events).

  • t (float) – The time at which to evaluate the PDF.

Returns:

The PDF value at time t.

Return type:

float

transmission_models.utils.sample_in_between(model, Dt)[source]

Sample a random infection time between two events using a beta distribution parameterized by the model’s infection parameters.

Parameters:
  • model (object) – The model object containing infection parameters (expects attributes k_inf).

  • Dt (float) – The scale parameter (duration between events).

Returns:

A random sample from the beta distribution.

Return type:

float
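The two functions above can be sketched with the standard library alone. This sketch assumes the beta distribution has both shape parameters equal to k_inf and is stretched over the interval (0, Dt). That is the distribution of the first of two independent gamma intervals with shape k_inf and a common scale, conditional on their sum being Dt, but the documentation does not state the exact parameterization, so treat it as an assumption. The helper names are hypothetical.

```python
import math
import random


def beta_pdf_scaled(k, Dt, t):
    """Density at t of Dt * X, where X ~ Beta(k, k) (assumed parameterization)."""
    if not 0.0 < t < Dt:
        return 0.0
    x = t / Dt
    # log of 1 / B(k, k), computed with log-gamma for numerical stability.
    log_norm = math.lgamma(2 * k) - 2 * math.lgamma(k)
    log_density = log_norm + (k - 1) * (math.log(x) + math.log(1 - x))
    # Dividing by Dt accounts for the change of variable t = Dt * x.
    return math.exp(log_density) / Dt


def beta_sample_scaled(k, Dt, rng=random):
    """Draw one sample of Dt * X, where X ~ Beta(k, k)."""
    return Dt * rng.betavariate(k, k)


sample = beta_sample_scaled(2.0, 10.0, random.Random(0))
```

With k = 1 the beta distribution is uniform, so the density is 1 / Dt everywhere inside the interval.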

transmission_models.utils.random_combination(iterable, r=1)[source]

Randomly select a combination of r elements from the given iterable.

Parameters:
  • iterable (iterable) – The input iterable to select elements from.

  • r (int, optional) – The number of elements to select. Default is 1.

Returns:

A tuple containing r randomly selected elements from the iterable.

Return type:

tuple

Notes

This function is equivalent to a random selection from itertools.combinations(iterable, r).
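The equivalence in the note above matches the random_combination recipe from the Python itertools documentation. The sketch below follows that recipe; the name random_combination_sketch and the rng argument are additions for this example, and whether the library uses the same implementation is an assumption.

```python
import random


def random_combination_sketch(iterable, r=1, rng=random):
    """Pick r elements at random, keeping their original relative order."""
    pool = tuple(iterable)
    # Sampling indices without replacement and sorting them yields the same
    # distribution as choosing uniformly from itertools.combinations(pool, r).
    indices = sorted(rng.sample(range(len(pool)), r))
    return tuple(pool[i] for i in indices)


combo = random_combination_sketch("abcde", 3, random.Random(1))
```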

transmission_models.utils.search_firsts_sampled_siblings(host, T)[source]

Search for the first sampled siblings of a host in the transmission tree.

Parameters:
  • host (Host) – The host whose siblings are searched.

  • T (nx.DiGraph) – The transmission tree to which the host belongs.

Returns:

The list of the first sampled siblings of the host.

Return type:

list

transmission_models.utils.search_first_sampled_parent(host, T, root)[source]

Search for the first sampled parent of a host in the transmission tree.

Parameters:
  • host (Host) – The host whose parent is searched. If the host is the root of the tree, None is returned.

  • T (nx.DiGraph) – The transmission tree to which the host belongs.

  • root (Host) – The root of the transmission tree.

Returns:

The first sampled parent of the host, or None if host is the root.

Return type:

Host or None

transmission_models.utils.Delta_log_gamma(Dt_ini, Dt_end, k, theta)[source]

Compute the difference in the gamma-distribution log likelihood of the time between two events, evaluated at Dt_end and Dt_ini.

Parameters:
  • Dt_ini (float) – Initial time.

  • Dt_end (float) – End time.

  • k (float) – Shape parameter of the gamma distribution.

  • theta (float) – Scale parameter of the gamma distribution.

Returns:

Difference of the log likelihood of the gamma distribution.

Return type:

float
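A minimal sketch of such a difference, assuming it is the gamma log-density at Dt_end minus the gamma log-density at Dt_ini (the sign convention is not stated in the documentation). The normalizing terms cancel, leaving only the terms that depend on the time. The helper names are hypothetical.

```python
import math


def gamma_log_pdf(x, k, theta):
    """Log-density of a gamma distribution with shape k and scale theta."""
    return (k - 1) * math.log(x) - x / theta - k * math.log(theta) - math.lgamma(k)


def delta_log_gamma_sketch(Dt_ini, Dt_end, k, theta):
    """log f(Dt_end) - log f(Dt_ini); the k*log(theta) and lgamma(k) terms cancel."""
    return (k - 1) * (math.log(Dt_end) - math.log(Dt_ini)) - (Dt_end - Dt_ini) / theta


delta = delta_log_gamma_sketch(1.0, 3.0, k=1.0, theta=2.0)
```

For k = 1 the gamma distribution reduces to an exponential, so the difference is simply -(Dt_end - Dt_ini) / theta.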

transmission_models.utils.hierarchy_pos(G, root=None, width=1.0, vert_gap=0.2, vert_loc=0, xcenter=0.5)[source]

Compute hierarchical layout positions for a tree graph.

Parameters:
  • G (networkx.Graph) – The graph (must be a tree).

  • root (node, optional) – The root node of the current branch. If None, the root will be found automatically.

  • width (float, optional) – Horizontal space allocated for this branch. Default is 1.0.

  • vert_gap (float, optional) – Gap between levels of hierarchy. Default is 0.2.

  • vert_loc (float, optional) – Vertical location of root. Default is 0.

  • xcenter (float, optional) – Horizontal location of root. Default is 0.5.

Returns:

A dictionary of positions keyed by node.

Return type:

dict

Notes

This function is adapted from Joel’s answer at https://stackoverflow.com/a/29597209/2966723. Licensed under Creative Commons Attribution-Share Alike.
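The recursive idea behind this kind of layout can be sketched without networkx. The sketch below is an illustration only: it takes a plain dict of children instead of a graph, the name hierarchy_pos_sketch is hypothetical, and it is not the library's implementation. Each node is placed at the center of its horizontal slot, and the slot is split evenly among its children one level down.

```python
def hierarchy_pos_sketch(children, root, width=1.0, vert_gap=0.2,
                         vert_loc=0.0, xcenter=0.5, pos=None):
    """Return {node: (x, y)} for a tree given as a dict of children lists."""
    if pos is None:
        pos = {}
    pos[root] = (xcenter, vert_loc)
    kids = children.get(root, [])
    if kids:
        # Each child receives an equal share of the parent's horizontal slot.
        dx = width / len(kids)
        next_x = xcenter - width / 2 - dx / 2
        for kid in kids:
            next_x += dx
            hierarchy_pos_sketch(children, kid, width=dx, vert_gap=vert_gap,
                                 vert_loc=vert_loc - vert_gap,
                                 xcenter=next_x, pos=pos)
    return pos


positions = hierarchy_pos_sketch({"r": ["a", "b"]}, "r")
```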

transmission_models.utils.hierarchy_pos_times(G, root=None, width=1.0, vert_gap=0.2, vert_loc=0, xcenter=0.5)[source]

Compute hierarchical layout positions for a tree graph, using time as vertical position.

Parameters:
  • G (networkx.Graph) – The graph (must be a tree).

  • root (node, optional) – The root node of the current branch. If None, the root will be found automatically.

  • width (float, optional) – Horizontal space allocated for this branch. Default is 1.0.

  • vert_gap (float, optional) – Gap between levels of hierarchy. Default is 0.2.

  • vert_loc (float, optional) – Vertical location of root. Default is 0.

  • xcenter (float, optional) – Horizontal location of root. Default is 0.5.

Returns:

A dictionary of positions keyed by node.

Return type:

dict

Notes

This function is adapted from Joel’s answer at https://stackoverflow.com/a/29597209/2966723. Licensed under Creative Commons Attribution-Share Alike.

transmission_models.utils.plot_transmision_network(T, nodes_labels=False, pos=None, highlighted_nodes=None, ax=None, to_frame=False, title=None, filename=None, show=True)[source]

Visualize a transmission network using matplotlib and networkx.

Parameters:
  • T (networkx.DiGraph) – The transmission network to plot.

  • nodes_labels (bool, optional) – Whether to display node labels. Default is False.

  • pos (dict, optional) – Node positions for layout. If None, uses graphviz_layout. Default is None.

  • highlighted_nodes (list, optional) – List of nodes to highlight. Default is None.

  • ax (matplotlib.axes.Axes, optional) – The axes to plot on. If None, uses current axes. Default is None.

  • to_frame (bool, optional) – If True, saves the plot to a temporary image and returns it as an array. Default is False.

  • title (str, optional) – Title for the plot. Default is None.

  • filename (str, optional) – If provided, saves the plot to this file. Default is None.

  • show (bool, optional) – Whether to display the plot. Default is True.

Returns:

image – The image array if to_frame is True, otherwise None.

Return type:

ndarray or None

transmission_models.utils.tree_to_dict(model, h)[source]

Convert a host and its descendants to a nested dictionary suitable for JSON export.

Parameters:
  • model (object) – The transmission model containing the tree structure.

  • h (host) – The host node to convert.

Returns:

A nested dictionary representing the host and its descendants.

Return type:

dict

transmission_models.utils.cast_types(value, types_map)[source]

Recursively cast types in a nested data structure.

This function recursively traverses a nested data structure (dict, list) and casts any values that match the types in types_map to their target types. Useful for fixing JSON serialization issues with numpy types.

Parameters:
  • value (any) – The value to cast. Can be a dict, list, or any other type.

  • types_map (list of tuples) – List of (from_type, to_type) tuples specifying type conversions.

Returns:

The value with types cast according to types_map.

Return type:

any

Examples

>>> import numpy as np
>>> import json
>>> data = [np.int64(123)]
>>> data = cast_types(data, [(np.int64, int), (np.float64, float)])
>>> data_json = json.dumps(data)
>>> data_json == "[123]"
True

Notes

This function is useful for fixing “TypeError: Object of type int64 is not JSON serializable” errors when working with numpy arrays and JSON.
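The recursion described above can be sketched as follows. This is an illustration under assumptions, not the library's code: the name cast_types_sketch is hypothetical, and fractions.Fraction stands in for a numpy type so that the example runs with the standard library only.

```python
import json
from fractions import Fraction


def cast_types_sketch(value, types_map):
    """Recursively apply (from_type, to_type) conversions inside dicts and lists."""
    if isinstance(value, dict):
        return {key: cast_types_sketch(item, types_map) for key, item in value.items()}
    if isinstance(value, list):
        return [cast_types_sketch(item, types_map) for item in value]
    for from_type, to_type in types_map:
        if isinstance(value, from_type):
            return to_type(value)
    # Values with no matching rule are returned unchanged.
    return value


raw = {"a": [Fraction(1, 2), 3]}
clean = cast_types_sketch(raw, [(Fraction, float)])
clean_json = json.dumps(clean)
```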

transmission_models.utils.tree_to_json(model, filename)[source]

Save a transmission model and its tree to a JSON file.

Parameters:
  • model (object) – The transmission model to export.

  • filename (str) – The path to the output JSON file.

transmission_models.utils.get_host_from_dict(dict_tree)[source]

Create a host object from a dictionary representation (as used in JSON trees).

Parameters:

dict_tree (dict) – The dictionary representing a host (from JSON).

Returns:

The reconstructed host object.

Return type:

host

transmission_models.utils.read_tree_dict(dict_tree, h1=None, edge_list=[])[source]

Recursively read a tree dictionary and extract edges as (parent, child) tuples.

Parameters:
  • dict_tree (dict) – The dictionary representing the tree (from JSON).

  • h1 (host, optional) – The parent host node. If None, will be created from dict_tree.

  • edge_list (list, optional) – The list to append edges to. Default is an empty list.

Returns:

A list of (parent, child) edge tuples.

Return type:

list
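A minimal sketch of this kind of edge extraction follows. The key names "name" and "children" and the helper read_edges_sketch are assumptions; the library's JSON layout may differ. The sketch uses None as the default for edge_list because Python evaluates a mutable default such as [] only once, so a list default would be shared between separate calls.

```python
def read_edges_sketch(node, edge_list=None):
    """Collect (parent, child) name pairs from a nested tree dictionary."""
    if edge_list is None:
        edge_list = []  # fresh list for every top-level call
    for child in node.get("children", []):
        edge_list.append((node["name"], child["name"]))
        # Recurse so that the child's own descendants are appended next.
        read_edges_sketch(child, edge_list)
    return edge_list


tree = {
    "name": "a",
    "children": [
        {"name": "b", "children": [{"name": "c"}]},
        {"name": "d"},
    ],
}
edges = read_edges_sketch(tree)
```

Calling the function a second time on the same tree returns the same three edges rather than an accumulated list.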

transmission_models.utils.json_to_tree(filename, sampling_params=None, offspring_params=None, infection_params=None)[source]

Load a transmission model from a JSON file and reconstruct the model object.

Parameters:
  • filename (str) – Path to the JSON file.

  • sampling_params (dict, optional) – Sampling parameters to override those in the file. Default is None.

  • offspring_params (dict, optional) – Offspring parameters to override those in the file. Default is None.

  • infection_params (dict, optional) – Infection parameters to override those in the file. Default is None.

Returns:

The reconstructed transmission model.

Return type:

didelot_unsampled

transmission_models.utils.build_infection_based_network(model, host_list)[source]

Generate a transmission tree network given a list of sampled hosts.

This function creates a transmission tree from the dataset. It uses the model’s sampling and infection parameters to construct a plausible initial transmission network.

For each host, a number of infected hosts is drawn, and then a coin is tossed for each candidate host to decide whether the two are connected, given their infection times.

At the end, we add a virtual root host to connect all disconnected components.

Parameters:
  • model (didelot_unsampled) – The transmission model with sampling and infection parameters.

  • host_list (list) – List of host objects representing the sampled data.

Returns:

The updated model with the generated transmission tree.

Return type:

didelot_unsampled

Notes

This function implements the algorithm described in the notebook for generating initial transmission networks. It creates a directed graph representing the transmission tree and adds a virtual root host to connect all disconnected components.

transmission_models.utils.random()

Return a random float x in the interval [0, 1).

transmission_models.utils.topology_movements.tree_slicing_to_offspring(model, selected_host=None, forced=False, verbose=False)[source]

Slice a node by reconnecting it to its grandparent. The selected_host, its parent, and its grandparent pass from a chain model to an offspring model.

Parameters:
  • model (transmission_models.models.didelot_unsampled.didelotUnsampled) – Model with the transmission network to which the transformation is applied.

  • selected_host (host, optional) – Host to be sliced. If None, it is randomly selected. Default is None.

  • forced (bool, optional) – If True, the movement is forced because the other one is not possible. Default is False.

  • verbose (bool, optional) – If True, prints information about the movement. Default is False.

Returns:

  • T_new (nx.DiGraph) – New transmission network with the moves applied.

  • gg (float) – Ratio of proposals.

  • selected_host (host) – Host sliced.

  • parent (host) – Parent of the selected_host.

  • grandparent (host) – Grandparent of the selected_host.

  • to_chain (bool) – If True, the proposal was to reconnect selected_host to one of its siblings. Otherwise, it was reconnected to its grandparent.

transmission_models.utils.topology_movements.tree_slicing_to_chain(model, selected_host=None, selected_sibling=None, forced=False, verbose=False)[source]

Slice a node by reconnecting it to one of its siblings. The selected_host, its parent, and the chosen sibling pass from an offspring model to a chain model.

Parameters:
  • model (transmission_models.models.didelot_unsampled.didelot_unsampled) – Model with the transmission network to which the transformation is applied.

  • selected_host (host, optional) – Host to be sliced. If None, it is randomly selected. Default is None.

  • selected_sibling (host, optional) – Sibling to which the selected_host is connected. If None, it is randomly selected. Default is None.

  • forced (bool, optional) – If True, the movement is forced because the other one is not possible. Default is False.

  • verbose (bool, optional) – If True, prints information about the movement. Default is False.

Returns:

  • T_new (nx.DiGraph) – New transmission network with the moves applied.

  • gg (float) – Ratio of proposals.

  • selected_host (host) – Host sliced.

  • parent (host) – Parent of the selected_host.

  • selected_sibling (host) – Sibling now connected to the selected_host.

  • to_chain (bool) – If True, the proposal was to reconnect selected_host to one of its siblings. Otherwise, it was reconnected to its grandparent.
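The two slicing moves are inverses of each other, which a small sketch can make concrete. The sketch below stores the tree as a plain dict from each node to its parent instead of a networkx graph, omits the proposal ratio gg, and uses hypothetical helper names; it only illustrates the topology change, not the library's implementation.

```python
def slice_to_offspring(parent_of, host):
    """Reconnect host to its grandparent (chain -> offspring)."""
    parent = parent_of[host]
    grandparent = parent_of[parent]
    new_parent_of = dict(parent_of)  # leave the input tree untouched
    new_parent_of[host] = grandparent
    return new_parent_of


def slice_to_chain(parent_of, host, sibling):
    """Reconnect host below one of its siblings (offspring -> chain)."""
    if host == sibling or parent_of[host] != parent_of[sibling]:
        raise ValueError("host and sibling must be distinct children of the same parent")
    new_parent_of = dict(parent_of)
    new_parent_of[host] = sibling
    return new_parent_of


# Chain g -> p -> h, stored as child -> parent.
chain = {"g": None, "p": "g", "h": "p"}
offspring = slice_to_offspring(chain, "h")
restored = slice_to_chain(offspring, "h", "p")
```

After the offspring move, p and h are both children of g; the chain move with p as the selected sibling restores the original chain.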

transmission_models.utils.topology_movements.tree_slicing_step(model, P_to_offspring=0.5, verbose=False)[source]

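Perform a tree slicing step in the transmission tree. The selected host is reconnected either to its grandparent (offspring move) or to one of its siblings (chain move); with the default P_to_offspring, the two moves are equally likely.

Parameters:
  • model (transmission_models.models.didelot_unsampled.didelotUnsampled) – Transmission model.

  • P_to_offspring (float, optional) – The probability of moving to offspring model during rewiring. Default is 0.5.

  • verbose (bool, optional) – If True, prints information about the proposal. Default is False.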

Prior Functions

transmission_models.classes.partial_sampled_utils.check_attribute_sampling(host, attribute)[source]

Check if a host has a sampled attribute.

Parameters:
  • host (object) – The host to check.

  • attribute (str) – The attribute to check for sampling.

Returns:

True if the attribute is sampled, False otherwise.

Return type:

bool

transmission_models.classes.partial_sampled_utils.search_partial_sampled_siblings(host, T)[source]

Search for partially sampled siblings of a host in the transmission tree.

Parameters:
  • host (object) – The host to search siblings for.

  • T (object) – The transmission tree.

Returns:

List of partially sampled siblings.

Return type:

list

transmission_models.classes.partial_sampled_utils.search_partial_sampled_parent(host, T, root)[source]

Search for the partially sampled parent of a host in the transmission tree.

Parameters:
  • host (object) – The host to search the parent for.

  • T (object) – The transmission tree.

  • root (object) – The root of the transmission tree.

Returns:

The partially sampled parent, or None if not found.

Return type:

object or None

Module Documentation

Classes Module

Classes Module.

This module contains all the main classes for the transmission_models package.

Main Classes

  • host : Host class representing infected individuals

  • didelot_unsampled : Main class implementing the Didelot et al. (2017) framework

  • genetic_prior_tree : Prior distribution for genetic sequence data

  • location_distance_prior_tree : Prior distribution for location distance data

  • same_location_prior_tree : Prior distribution for same location probability

  • MCMC : Markov Chain Monte Carlo sampling algorithms

Submodules

mcmc : MCMC sampling classes and algorithms

class transmission_models.classes.host(id, index, genetic_data=[], t_inf=0, t_sample=None)[source]

Bases: object

Represents a host that has been infected with a virus.

A host object contains information about an infected individual, including their genetic data, infection time, sampling time, and other attributes.

Variables:
  • index (int) – The index of the host.

  • sampled (bool) – Indicates whether the host has been sampled or not.

  • genetic_data (list) – The genetic data of the host.

  • dict_attributes (dict) – A dictionary to store additional attributes.

  • t_inf (int) – Time of infection.

  • t_sample (int, optional) – The time the host was sampled.

  • id (str) – The identifier of the host.

t_inf : property

Getter and setter for the time of infection attribute.

get_genetic_str() : str

Returns the genetic data as a string.

__str__() : str

Returns a string with the id of the host.

__int__() : int

Returns the index of the host.

Examples

>>> h = host('host1', 1, ['A', 'T', 'C', 'G'], 10, t_sample=15)
>>> print(h.t_inf)
10
>>> h.t_inf = 20
>>> print(h.t_inf)
20
>>> print(h.get_genetic_str())
ATCG
>>> print(h)
host1

Notes

The class name host is lowercase, so it does not follow the Python naming convention for class names (PascalCase).

__init__(id, index, genetic_data=[], t_inf=0, t_sample=None)[source]

Initialize a new instance of the Host class.

Parameters:
  • id (str) – The id of the host.

  • index (int) – The index of the host.

  • genetic_data (list, optional) – The genetic data of the host. Defaults to an empty list.

  • t_inf (int, optional) – Time of infection. Defaults to 0.

  • t_sample (int, optional) – The time the host was sampled. Defaults to None.

property t_inf

Getter for the time of infection attribute.

Returns:

The time of infection.

Return type:

int

get_genetic_str()[source]

Return the genetic data of the host as a string.

Returns:

The genetic data as a string.

Return type:

str

__str__()[source]

Return a string with the id of the host.

Returns:

The id of the host.

Return type:

str

__int__()[source]

Return the index of the host.

Returns:

The index of the host.

Return type:

int

class transmission_models.classes.didelot_unsampled(sampling_params, offspring_params, infection_params, T=None)[source]

Bases: object

Didelot unsampled transmission model.

This class implements the Didelot et al. (2017) framework for transmission tree inference with unsampled hosts. It provides methods for building transmission networks, computing likelihoods, and performing MCMC sampling.

The model incorporates three main components:

  1. Sampling model: Gamma distribution for sampling times

  2. Offspring model: Negative binomial distribution for offspring number

  3. Infection model: Gamma distribution for infection times

Parameters:
  • sampling_params (dict) – Parameters for the sampling model containing:

    - pi : float, sampling probability
    - k_samp : float, shape parameter for gamma distribution
    - theta_samp : float, scale parameter for gamma distribution

  • offspring_params (dict) – Parameters for the offspring model containing:

    - r : float, rate of infection
    - p_inf : float, probability of infection

  • infection_params (dict) – Parameters for the infection model containing:

    - k_inf : float, shape parameter for gamma distribution
    - theta_inf : float, scale parameter for gamma distribution

Variables:
  • T (networkx.DiGraph) – The transmission tree.

  • host_dict (dict) – Dictionary mapping host IDs to host objects.

  • log_likelihood (float) – Current log likelihood of the model.

  • genetic_prior (genetic_prior_tree, optional) – Prior for genetic data.

  • same_location_prior (same_location_prior_tree, optional) – Prior for location data.

References

Didelot, X., Gardy, J., & Colijn, C. (2017). Bayesian inference of transmission chains using timing of events, contact and genetic data. PLoS computational biology, 13(4), e1005496.

__init__(sampling_params, offspring_params, infection_params, T=None)[source]

Initialize the Didelot unsampled transmission model.

Parameters:
  • sampling_params (dict) – Parameters for the sampling model containing:

    - pi : float, sampling probability
    - k_samp : float, shape parameter for gamma distribution
    - theta_samp : float, scale parameter for gamma distribution

  • offspring_params (dict) – Parameters for the offspring model containing:

    - r : float, rate of infection
    - p_inf : float, probability of infection

  • infection_params (dict) – Parameters for the infection model containing:

    - k_inf : float, shape parameter for gamma distribution
    - theta_inf : float, scale parameter for gamma distribution

  • T (networkx.DiGraph, optional) – The transmission tree. If provided, the model will be initialized with this tree. Default is None.

Raises:

KeyError – If any required parameter is missing from the input dictionaries.
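The three parameter dictionaries can be sketched as follows. The key names come from the parameter descriptions above; the numeric values are arbitrary placeholders, and check_params is a hypothetical helper that mimics the documented KeyError behaviour without importing the library.

```python
# Required keys, as listed in the parameter descriptions above.
REQUIRED_KEYS = {
    "sampling_params": ("pi", "k_samp", "theta_samp"),
    "offspring_params": ("r", "p_inf"),
    "infection_params": ("k_inf", "theta_inf"),
}


def check_params(**param_dicts):
    """Raise KeyError if any required key is missing (hypothetical helper)."""
    for name, keys in REQUIRED_KEYS.items():
        for key in keys:
            if key not in param_dicts[name]:
                raise KeyError(f"{name} is missing '{key}'")


# Placeholder values chosen only for illustration.
sampling_params = {"pi": 0.5, "k_samp": 2.0, "theta_samp": 1.0}
offspring_params = {"r": 2.0, "p_inf": 0.5}
infection_params = {"k_inf": 2.0, "theta_inf": 1.0}

check_params(
    sampling_params=sampling_params,
    offspring_params=offspring_params,
    infection_params=infection_params,
)
```

With the package installed, these dictionaries would be passed as didelot_unsampled(sampling_params, offspring_params, infection_params), following the signature documented above.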

property T

set_T(T)[source]

samp_t_inf_between(h1, h2)[source]

Sample a time of infection between two hosts.

Uses a rejection sampling method to sample the time of infection of the infected host using the chain model from Didelot et al. 2017.

Parameters:
  • h1 (host) – Infector host.

  • h2 (host) – Infected host.

Returns:

Time of infection of the intermediate host, which is infected by h1 and is the infector of h2.

Return type:

float

Notes

This method implements the rejection sampling algorithm described in Didelot et al. (2017) for sampling infection times in transmission chains.

add_root(t_sampl, id='0', genetic_data=[], t_inf=0, t_sample=None)[source]

Add the root host to the transmission tree.

Parameters:
  • t_sampl (float) – Sampling time of the root host.

  • id (str, optional) – Identifier for the root host. Default is “0”.

  • genetic_data (list, optional) – Genetic data for the root host. Default is empty list.

  • t_inf (float, optional) – Infection time of the root host. Default is 0.

  • t_sample (float, optional) – Sampling time of the root host. Default is None.

Returns:

The root host object.

Return type:

host

successors(host)[source]

Get the successors (children) of a given host in the transmission tree.

Parameters:

host (host) – The host node whose successors are to be returned.

Returns:

An iterator over the successors of the host.

Return type:

iterator

parent(host)[source]

Get the parent (infector) of a given host in the transmission tree.

Parameters:

host (host) – The host node whose parent is to be returned.

Returns:

The parent host object.

Return type:

host

out_degree(host)[source]

Get the out-degree (number of children) of a host in the transmission tree.

Parameters:

host (host) – The host node whose out-degree is to be returned.

Returns:

The out-degree of the host.

Return type:

int

choose_successors(host, k=1)[source]

Choose k unique successors of a given host.

Parameters:
  • host (host) – Host whose successors will be chosen.

  • k (int, optional) – Number of successors to choose. Default is 1.

Returns:

List of k randomly chosen successors of the host.

Return type:

list

compute_Delta_loc_prior(T_new)[source]

Compute the change in the location prior log-likelihood for a new tree.

Parameters:

T_new (networkx.DiGraph) – The new transmission tree.

Returns:

(Delta log prior, new log prior, old log prior, old correction log-likelihood)

Return type:

tuple

get_candidates_to_chain()[source]

Get the list of candidate hosts for chain moves in the transmission tree.

Returns:

List of candidate host nodes for chain moves.

Return type:

list

get_N_candidates_to_chain(recompute=False)[source]

Get the number of candidate hosts for chain moves, optionally recomputing the list.

Parameters:

recompute (bool, optional) – If True, recompute the list of candidates. Default is False.

Returns:

Number of candidate hosts for chain moves.

Return type:

int

get_root_subtrees()[source]

Retrieve the root subtrees of the transmission tree.

This method searches for the first sampled siblings of the root host in the transmission tree and stores them in the roots_subtrees attribute.

Returns:

A list of root subtrees.

Return type:

list

get_unsampled_hosts()[source]

Get the list of unsampled hosts in the transmission tree (excluding the root).

Returns:

List of unsampled host nodes.

Return type:

list

get_sampling_model_likelihood(hosts=None, T=None, update=False)[source]

Compute the likelihood of the sampling model.

Computes the likelihood of the sampling model given a list of hosts. If no list is given, the likelihood of the whole transmission tree is returned.

Parameters:

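hosts (list of host objects, optional) – Hosts for which the likelihood is computed. If None, the whole transmission tree is used.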

Returns:

L – The likelihood of the sampling model given the list of hosts

Return type:

float

get_sampling_model_log_likelihood(hosts=None, T=None, update=False)[source]

Computes the log-likelihood of the sampling model given a list of hosts. If no list is given, the log-likelihood of the whole transmission tree is returned.

Parameters:

hosts (list of host objects, optional) – Hosts for which the log-likelihood is computed. If None, the whole transmission tree is used.

Returns:

L – The log-likelihood of the sampling model given the list of hosts

Return type:

float

Delta_log_sampling(hosts, T_end, T_ini=None)[source]

Compute the change in log-likelihood for the sampling model.

Parameters:
  • hosts (list) – List of host objects.

  • T_end (float) – End time.

  • T_ini (float, optional) – Initial time. Default is None.

Returns:

Change in log-likelihood for the sampling model.

Return type:

float

Notes

The function operates as follows:

  1. Computes the log-likelihood for the sampling model at T_end.

  2. If T_ini is provided, subtracts the log-likelihood at T_ini.

  3. Returns the difference.

get_offspring_model_likelihood(hosts=None, T=None, update=False)[source]

Computes the likelihood of the offspring model given a list of hosts. If no list is given, the likelihood of the whole transmission tree is returned.

Parameters:

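hosts (list of host objects, optional) – Hosts for which the likelihood is computed. If None, the whole transmission tree is used.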

Returns:

L – The likelihood of the offspring model given the list of hosts

Return type:

float

get_offspring_model_log_likelihood(hosts=None, T=None, update=False)[source]

Computes the log-likelihood of the offspring model given a list of hosts. If no list is given, the log-likelihood of the whole transmission tree is returned.

Parameters:

hosts (list of host objects, optional) – Hosts for which the log-likelihood is computed. If None, the whole transmission tree is used.

Returns:

L – The log-likelihood of the offspring model given the list of hosts

Return type:

float

Delta_log_offspring(hosts, T_end, T_ini=None)[source]

Compute the change in log-likelihood for the offspring model.

Parameters:
  • hosts (list) – List of host objects.

  • T_end (float) – End time.

  • T_ini (float, optional) – Initial time. Default is None.

Returns:

Change in log-likelihood for the offspring model.

Return type:

float

Notes

The function operates as follows:

  1. Computes the log-likelihood for the offspring model at T_end.

  2. If T_ini is provided, subtracts the log-likelihood at T_ini.

  3. Returns the difference.

get_infection_model_likelihood(hosts=None, T=None, update=False)[source]

Computes the likelihood of the infection model given a list of hosts. If no list is given, the likelihood of the whole transmission tree is returned.

Parameters:
  • hosts (list of host objects, optional) – Hosts for which the likelihood is computed. If None, the whole transmission tree is used.

  • T (DiGraph object, optional) – Transmission tree on which the likelihood of the hosts is computed. If None, the network of the model is used.

  • update (bool, optional) – If True, the likelihood of the infection model is updated in the model object.

Returns:

L – The likelihood of the infection model given the list of hosts

Return type:

float

get_infection_model_log_likelihood(hosts=None, T=None, update=False)[source]

Computes the log-likelihood of the infection model given a list of hosts. If no list is given, the log-likelihood of the whole transmission tree is returned.

Parameters:
  • hosts (list of host objects, optional) – Hosts for which the log-likelihood is computed. If None, the whole transmission tree is used.

  • T (DiGraph object, optional) – Transmission tree on which the log-likelihood of the hosts is computed. If None, the network of the model is used.

  • update (bool, optional) – If True, the log-likelihood of the infection model is updated in the model object.

Returns:

L – The log-likelihood of the infection model given the list of hosts

Return type:

float

Delta_log_infection(hosts, T_end, T_ini=None)[source]

Compute the change in log-likelihood for the infection model.

Parameters:
  • hosts (list) – List of host objects.

  • T_end (float) – End time.

  • T_ini (float, optional) – Initial time. Default is None.

Returns:

Change in log-likelihood for the infection model.

Return type:

float

Notes

The function operates as follows:

  1. Computes the log-likelihood for the infection model at T_end.

  2. If T_ini is provided, subtracts the log-likelihood at T_ini.

  3. Returns the difference.

log_likelihood_host(host, T=None)[source]

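Computes the log likelihood of a host given the transmission tree.

Parameters:
  • host (host object) – The host whose log likelihood is computed.

  • T (DiGraph object, optional) – The transmission tree. If None, the network of the model is used.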

Returns:

log_likelihood – The log likelihood of the host in the transmission network

Return type:

float

Delta_log_likelihood_host(hosts, T_end, T_ini=None)[source]

Compute the change in log-likelihood for a host.

Parameters:
  • hosts (list) – List of host objects.

  • T_end (float) – End time.

  • T_ini (float, optional) – Initial time. Default is None.

Returns:

Change in log-likelihood for the host.

Return type:

float

Notes

The function operates as follows:

  1. Computes the log-likelihood for the host at T_end.

  2. If T_ini is provided, subtracts the log-likelihood at T_ini.

  3. Returns the difference.

log_likelihood_hosts_list(hosts, T)[source]

log_likelihood_transmission_tree(T)[source]

log_posterior_transmission_tree()[source]

Compute the log-posterior of the current transmission tree.

This method calculates the log-posterior probability of the current transmission tree by summing the log-likelihood of the tree and any additional prior log-probabilities, such as genetic and location priors, if they are defined.

Returns:

The computed log-posterior of the current transmission tree.

Return type:

float

Notes

The log-posterior is computed as:

log_posterior = log_likelihood + genetic_log_prior (if defined) + same_location_log_prior (if defined)

The method uses the following attributes:
  • self.log_likelihood: Log-likelihood of the transmission tree.

  • self.genetic_log_prior: Log-prior from the genetic model (if defined).

  • self.same_location_log_prior: Log-prior from the location model (if defined).
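The sum above, with its optional terms, can be sketched as a small function. The name log_posterior_sketch is hypothetical, and representing an undefined prior as None is an assumption made for this example.

```python
def log_posterior_sketch(log_likelihood, genetic_log_prior=None,
                         same_location_log_prior=None):
    """Add the optional log-priors to the log-likelihood when they are defined."""
    total = log_likelihood
    if genetic_log_prior is not None:
        total += genetic_log_prior
    if same_location_log_prior is not None:
        total += same_location_log_prior
    return total


only_likelihood = log_posterior_sketch(-10.0)
with_genetic = log_posterior_sketch(-10.0, genetic_log_prior=-2.0)
with_both = log_posterior_sketch(-10.0, genetic_log_prior=-2.0,
                                 same_location_log_prior=-0.5)
```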

get_log_posterior_transmission_tree(T)[source]

Compute and update the log-posterior of the transmission tree.

This method calculates the log-posterior probability of the given transmission tree T by combining the log-likelihood of the tree with any additional prior log-probabilities, such as genetic and location priors, if they are defined. The computed log-posterior and any relevant prior log-likelihoods are stored as attributes of the object.

Parameters:

T (networkx.DiGraph) – The transmission tree for which to compute the log-posterior.

Returns:

The computed log-posterior of the transmission tree.

Return type:

float

Notes

The log-posterior is computed as:

log_posterior = log_likelihood + genetic_log_prior (if defined) + same_location_log_prior (if defined)

The method also updates the following attributes:
  • self.log_posterior

  • self.genetic_log_prior (if applicable)

  • self.same_location_log_prior (if applicable)

show_log_likelihoods(hosts=None, T=None, verbose=False)[source]

Print and return the log-likelihoods for the sampling, offspring, and infection models.

Parameters:
  • hosts (list, optional) – List of host objects to compute log-likelihoods for. If None, computes for all hosts in T.

  • T (networkx.DiGraph, optional) – Transmission tree. If None, uses self.T.

  • verbose (bool, optional) – If True, prints the log-likelihoods. Default is False.

Returns:

(LL_sampling, LL_offspring, LL_infection): Log-likelihoods for the sampling, offspring, and infection models.

Return type:

tuple

log_likelihood_transmission_tree_old(T)[source]

Compute the log-likelihood of the entire transmission tree using the old method.

Parameters:

T (networkx.DiGraph) – Transmission tree to compute the log-likelihood for.

Returns:

The log-likelihood of the transmission tree.

Return type:

float

get_log_likelihood_transmission()[source]

add_genetic_prior(mu_gen, gen_dist)[source]

Adds a genetic prior to the model that computes the likelihood that two sampled hosts are related given the genetic distance between the viruses of the hosts. Two hosts are considered related if the only hosts on the path connecting them are unsampled.

Parameters:
  • mu_gen (float) – Mutation rate

  • gen_dist (np.array) – Genetic distance matrix of the virus of the hosts. The index has to be identical to the index of the hosts.

add_same_location_prior(P_NM, tau, loc_dist)[source]

Adds a same-location prior to the model that computes the likelihood that two sampled hosts are related given the distance between their locations. Two hosts are considered related if the only hosts on the path connecting them are unsampled.

Parameters:
  • log_K (float) – Log probability of two hosts not being in the same location

  • loc_dist (np.array) – Location distance matrix of the hosts. The index has to be identical to the index of the hosts.

create_transmision_phylogeny_nets(N, mu, P_mut)[source]

Parameters:
  • N – Number of hosts.

  • mu – Mutation rate.

  • P_mut – Probability of mutation.

get_newick(lengths=True)[source]
save_json(filename)[source]

Save the transmission tree to a JSON file.

Parameters:

filename (str) – Path to the output JSON file.

classmethod json_to_tree(filename, sampling_params=None, offspring_params=None, infection_params=None)[source]

Load a transmission model from a JSON file and reconstruct the model object.

Parameters:
  • filename (str) – Path to the JSON file.

  • sampling_params (dict, optional) – Sampling parameters to override those in the file. Default is None.

  • offspring_params (dict, optional) – Offspring parameters to override those in the file. Default is None.

  • infection_params (dict, optional) – Infection parameters to override those in the file. Default is None.

Returns:

The reconstructed transmission model.

Return type:

didelot_unsampled

infection_time_from_sampling_step(selected_host=None, metHast=True, verbose=False)[source]

Propose and possibly accept a new infection time for a sampled host using the Metropolis-Hastings algorithm.

This method samples a new infection time for a selected host (or a random sampled host if not provided), computes the acceptance probability, and updates the host’s infection time if the proposal is accepted.

Parameters:
  • selected_host (host, optional) – The host whose infection time will be changed. If None, a random sampled host is selected.

  • metHast (bool, optional) – If True, use the Metropolis-Hastings algorithm to accept or reject the proposal. Default is True.

  • verbose (bool, optional) – If True, print detailed information about the proposal. Default is False.

Returns:

  • t_inf_new (float) – The proposed new infection time.

  • gg (float) – Proposal ratio for the Metropolis-Hastings step.

  • pp (float) – Likelihood ratio for the Metropolis-Hastings step.

  • P (float) – Acceptance probability for the Metropolis-Hastings step.

  • selected_host (host) – The host whose infection time was proposed to change.
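
The returned gg, pp, and P are related through the Metropolis-Hastings acceptance rule. A minimal sketch, assuming the standard rule P = min(1, pp * gg); how the package combines the two ratios internally is not stated here, and metropolis_hastings_accept is a hypothetical helper.

```python
import random


def metropolis_hastings_accept(pp, gg, rng=random):
    """Return (P, accepted) for likelihood ratio pp and proposal ratio gg.

    Assumption: the acceptance probability is the standard min(1, pp * gg).
    """
    P = min(1.0, pp * gg)
    # Accept the proposal with probability P.
    accepted = rng.random() < P
    return P, accepted


P, accepted = metropolis_hastings_accept(pp=0.5, gg=0.5)
print(P)
```

A proposal with pp * gg >= 1 is always accepted, because rng.random() returns a value strictly below 1.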

infection_time_from_infection_model_step(selected_host=None, metHast=True, Dt_new=None, verbose=False)[source]

Method to change the infection time of a host and then accept the change using the Metropolis Hastings algorithm.

Parameters:
  • selected_host (host object, default=None) – Host whose infection time will be changed. If None, a host is randomly selected.

  • metHast (bool, default=True) – If True, the Metropolis Hastings algorithm is used to accept or reject the change.

  • Dt_new (float, default=None) – New infection time for the host. If None, a new time is sampled.

  • verbose (bool, default=False) – If True, prints the results of the step.

add_unsampled_with_times(selected_host=None, P_add=0.5, P_rewiring=0.5, P_off=0.5, verbose=False, only_geometrical=False, detailed_probs=False)[source]

Method to propose the addition of an unsampled host to the transmission tree and get the probability of the proposal.

Parameters:
  • selected_host (host object) – Host to which the unsampled host will be added. If None, a host is randomly selected.

  • P_add (float) – Probability of proposing to add a new host to the transmission tree.

  • P_rewiring (float) – Probability of rewiring the new host to another sibling host.

  • P_off (float) – Probability of rewiring the new host to be a leaf.

  • verbose (bool) – If True, prints the results of the step.

  • only_geometrical (bool) – If True, only the proposal of the new topological structure is considered.

  • detailed_probs (bool) – If True, the method returns both proposal probabilities: that of adding a host and that of removing a host.

Returns:

  • T_new (DiGraph object) – New transmission tree with the proposed changes.

  • gg (float) – Ratio of the probabilities of the proposals.

  • g_go (float) – Probability of the proposal of adding a host.

  • g_ret (float) – Probability of the proposal of removing a host.

  • prob_time (float) – Probability of the time of infection of the new host.

  • unsampled (host object) – Unsampled host to be added to the transmission tree.

  • added (bool) – If True, the host was added to the transmission tree.

remove_unsampled_with_times(selected_host=None, P_add=0.5, P_rewiring=0.5, P_off=0.5, only_geometrical=False, detailed_probs=False, verbose=False)[source]

Method to propose the removal of an unsampled host from the transmission tree and get the probability of the proposal. In case that no unsampled hosts are available, a new host is proposed to be added to the transmission tree.

Parameters:
  • selected_host (host object) – Unsampled host to be removed from the transmission tree. If None, a host is randomly selected.

  • P_add (float) – Probability of proposing to add a new host to the transmission tree.

  • P_rewiring (float) – Probability of rewiring the new host to another sibling host.

  • P_off (float) – Probability of rewiring the new host to be a leaf.

  • verbose (bool) – If True, prints the results of the step.

  • only_geometrical (bool) – If True, only the proposal of the new topological structure is considered.

  • detailed_probs (bool) – If True, the method returns both proposal probabilities: that of adding a host and that of removing a host.

Returns:

  • T_new (DiGraph object) – New transmission tree with the proposed changes.

  • gg (float) – Ratio of the probabilities of the proposals.

  • g_go (float) – Probability of the proposal of adding a host.

  • g_ret (float) – Probability of the proposal of removing a host.

  • prob_time (float) – Probability of proposing the time of the selected_host.

  • added (bool) – If True, the host was added to the transmission tree. Otherwise, the node was removed.

add_remove_step(P_add=0.5, P_rewiring=0.5, P_off=0.5, metHast=True, verbose=False)[source]

Method to propose the addition or removal of an unsampled host to the transmission tree and get the probability of the proposal.

Parameters:
  • P_add (float) – Probability of proposing the addition of an unsampled host. Otherwise, an unsampled host is proposed for removal.

  • P_rewiring (float) – Probability of rewiring the new host to another sibling host.

  • P_off (float) – Probability of rewiring the new host to be a leaf.

  • metHast (bool) – If True, the Metropolis-Hastings algorithm is used to accept or reject the change.

  • verbose (bool) – If True, prints the results of the step.

Returns:

MCMC_step(N_steps, verbose=False)[source]
class transmission_models.classes.genetic_prior_tree(model, mu, distance_matrix)[source]

Bases: object

__init__(model, mu, distance_matrix)[source]

Initialize the genetic prior tree object.

Parameters:
  • model (object) – The transmission model containing the tree structure.

  • mu (float) – The mutation rate parameter for the Poisson distribution.

  • distance_matrix (numpy.ndarray) – Matrix containing pairwise genetic distances between hosts.

Notes

This initializes the genetic prior calculator with:

  • A Poisson distribution with rate mu for modeling genetic distances

  • A distance matrix for pairwise host comparisons

  • A reference to the transmission model

static search_firsts_sampled_siblings(host, T, distance_matrix)[source]

Find all sampled siblings of a host in the transmission tree.

Parameters:
  • host (object) – The host for which to find sampled siblings.

  • T (networkx.DiGraph) – The transmission tree.

  • distance_matrix (numpy.ndarray) – Matrix containing pairwise genetic distances between hosts.

Returns:

List of sampled sibling hosts that have genetic distance data.

Return type:

list

Notes

This method recursively searches through the tree to find all sampled hosts that are descendants of the given host and have valid genetic distance data (non-NaN values in the distance matrix).

static search_first_sampled_parent(host, T, root)[source]

Find the first sampled ancestor of a host in the transmission tree.

Parameters:
  • host (object) – The host for which to find the first sampled parent.

  • T (networkx.DiGraph) – The transmission tree.

  • root (object) – The root host of the transmission tree.

Returns:

The first sampled parent host, or None if no sampled parent is found.

Return type:

object or None

Notes

This method traverses up the tree from the given host until it finds the first sampled ancestor, or reaches the root without finding one.
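
The upward traversal can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: first_sampled_ancestor is a hypothetical helper, and a plain dict mapping each node to its parent stands in for the networkx.DiGraph.

```python
def first_sampled_ancestor(node, parent_of, sampled):
    """Walk up from node until a sampled ancestor is found (hypothetical helper).

    parent_of maps each node to its parent (the root maps to None);
    sampled is the set of sampled nodes.
    """
    current = parent_of.get(node)
    while current is not None:
        if current in sampled:
            return current
        # Not sampled: keep climbing towards the root.
        current = parent_of.get(current)
    return None  # reached the root without finding a sampled ancestor


# Chain: root -> u1 -> s1 -> u2 -> s2, where only s1 and s2 are sampled.
parent_of = {"root": None, "u1": "root", "s1": "u1", "u2": "s1", "s2": "u2"}
sampled = {"s1", "s2"}
print(first_sampled_ancestor("s2", parent_of, sampled))
print(first_sampled_ancestor("s1", parent_of, sampled))
```

With a networkx.DiGraph, the parent lookup would typically come from T.predecessors(node) instead of the dict.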

static get_mut_time_dist(hp, hs)[source]

Calculate the mutation time distance between two hosts.

Parameters:
  • hp (object) – The parent host.

  • hs (object) – The sibling host.

Returns:

The mutation time distance: (hs.t_sample + hp.t_sample - 2 * hp.t_inf).

Return type:

float

Notes

This calculates the time available for mutations to accumulate between the sampling times of two hosts, accounting for their common infection time.
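
A worked instance of the formula in the Returns entry, using stand-in objects that carry only the t_inf and t_sample attributes. The helper name mut_time_dist and the example times are assumptions made for illustration.

```python
from types import SimpleNamespace


def mut_time_dist(hp, hs):
    """Evaluate hs.t_sample + hp.t_sample - 2 * hp.t_inf."""
    return hs.t_sample + hp.t_sample - 2 * hp.t_inf


# Stand-ins for host objects; only the attributes used by the formula are set.
hp = SimpleNamespace(t_inf=10, t_sample=15)
hs = SimpleNamespace(t_inf=12, t_sample=20)

print(mut_time_dist(hp, hs))  # 20 + 15 - 2 * 10 = 15
```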

get_closest_sampling_siblings(T=None, verbose=False)[source]

Calculate log-likelihood correction for closest sampling siblings.

Parameters:
  • T (networkx.DiGraph, optional) – The transmission tree. If None, uses self.model.T.

  • verbose (bool, optional) – If True, print detailed information during calculation.

Returns:

The log-likelihood correction value.

Return type:

float

Notes

This method calculates correction terms for the genetic prior by finding the closest sampled siblings for each host and computing the log-likelihood of their genetic distances based on the time difference between sampling events.

prior_host(host, T, parent_dist=False)[source]

Calculate the log prior for a specific host in the transmission tree.

Parameters:
  • host (object) – The host for which to calculate the log prior.

  • T (networkx.DiGraph) – The transmission tree.

  • parent_dist (bool, optional) – If True, include parent distance in the calculation. Default is False.

Returns:

The log prior value for the host.

Return type:

float

Notes

This method calculates the log prior by considering:

  1. Direct connections to sampled hosts

  2. Connections to sampled siblings through unsampled intermediate hosts

  3. Parent distance (if parent_dist=True)

The calculation uses Poisson distributions based on the mutation rate and time differences between sampling events.

prior_pair(h1, h2)[source]

Calculate the log prior for a pair of hosts.

Parameters:
  • h1 (object) – First host in the pair.

  • h2 (object) – Second host in the pair.

Returns:

The log prior value for the pair, or 0 if either host is not sampled.

Return type:

float

Notes

This method calculates the log prior for the genetic distance between two hosts based on their sampling time difference and the Poisson distribution with rate mu * Dt.
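
A minimal sketch of a Poisson log prior with rate mu * Dt, as described in the note above. The helper names are hypothetical, and treating the genetic distance as a non-negative integer count of mutations is an assumption.

```python
import math


def poisson_log_pmf(d, rate):
    """log P(D = d) for D ~ Poisson(rate), with d a non-negative integer."""
    return d * math.log(rate) - rate - math.lgamma(d + 1)


def pair_log_prior(genetic_distance, mu, Dt):
    """Log prior of a genetic distance under a Poisson with rate mu * Dt."""
    return poisson_log_pmf(genetic_distance, mu * Dt)


# One mutation observed, mutation rate 0.5 per unit time, 2 time units apart.
print(pair_log_prior(1, mu=0.5, Dt=2.0))
```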

log_prior_host_list(host_list, T=None)[source]

Calculate the total log prior for a list of hosts.

Parameters:
  • host_list (list) – List of hosts for which to calculate the log prior.

  • T (networkx.DiGraph, optional) – The transmission tree. If None, uses self.model.T.

Returns:

The sum of log priors for all hosts in the list.

Return type:

float

Notes

This method iterates through the host list and sums the log priors for each individual host using the log_prior_host method.

log_prior_host(host, T=None)[source]

Compute the log prior for a host.

Parameters:
  • host (object) – The host for which to compute the log prior.

  • T (object, optional) – Transmission tree. Default is None.

Returns:

The log prior value for the host.

Return type:

float

Notes

The function operates as follows:

  1. Computes the log prior for the host based on the transmission tree.

  2. Returns the log prior value.

log_prior_T(T, update_up=True, verbose=False)[source]

Calculate the total log prior for an entire transmission tree.

Parameters:
  • T (networkx.DiGraph) – The transmission tree.

  • update_up (bool, optional) – If True, include correction terms for closest sampling siblings. Default is True.

  • verbose (bool, optional) – If True, print detailed information during calculation.

Returns:

The total log prior value for the transmission tree.

Return type:

float

Notes

This method calculates the complete log prior for a transmission tree by:

  1. Iterating through all hosts and their connections

  2. Computing log-likelihoods for direct connections to sampled hosts

  3. Computing log-likelihoods for connections to sampled siblings through unsampled hosts

  4. Adding correction terms for closest sampling siblings (if update_up=True)

The calculation uses Poisson distributions based on mutation rates and time differences.

Delta_log_prior(host, T_end, T_ini)[source]

Calculate the difference in log prior between two transmission tree states.

Parameters:
  • host (object) – The host for which to calculate the log prior difference.

  • T_end (networkx.DiGraph) – The final transmission tree state.

  • T_ini (networkx.DiGraph) – The initial transmission tree state.

Returns:

The difference in log prior: log_prior(T_end) - log_prior(T_ini).

Return type:

float

Notes

This method calculates how the log prior changes when a transmission tree transitions from state T_ini to T_end. It considers:

  1. Changes in parent relationships

  2. Changes in sibling relationships

The calculation is useful for MCMC acceptance ratios where only the difference in log prior is needed, not the absolute values.

class transmission_models.classes.location_distance_prior_tree(model, mu, distance_matrix)[source]

Bases: object

__init__(model, mu, distance_matrix)[source]
static search_firsts_sampled_siblings(host, T)[source]
static search_first_sampleed_parent(host, T, root)[source]
static get_mut_time_dist(hp, hs)[source]
get_closest_sampling_siblings(T=None)[source]
prior_host(host, T, parent_dist=False)[source]
log_prior_T(T, update_up=True, verbose=False)[source]
class transmission_models.classes.same_location_prior_tree(model, P_NM, tau, distance_matrix)[source]

Bases: object

Class to compute the prior on the location of the hosts in the tree. The prior models the probability that a host stays where it lives over a characteristic time tau. A host stays where it lives with probability exp(-t*P_NM/tau), where P_NM is the probability that the host does not move within tau.
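
A minimal sketch of the stay probability exp(-t*P_NM/tau) given above, evaluated in log space. The helper name log_prob_stay is hypothetical, and reading t as the elapsed time between the two events being compared is an assumption.

```python
import math


def log_prob_stay(t, P_NM, tau):
    """Log of exp(-t * P_NM / tau) (hypothetical helper)."""
    return -t * P_NM / tau


log_p = log_prob_stay(t=5.0, P_NM=0.2, tau=10.0)
print(log_p)            # log probability
print(math.exp(log_p))  # probability, i.e. exp(-0.1)
```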

__init__(model, P_NM, tau, distance_matrix)[source]
static get_roots_data_subtrees(host, T, dist_matrix)[source]
static search_firsts_sampled_siblings(host, T, distance_matrix)[source]
static get_mut_time_dist(hp, hs)[source]
get_closest_sampling_siblings(T=None)[source]
prior_host(host, T, parent_dist=False)[source]
log_prior_T(T, update_up=True, verbose=False)[source]
class transmission_models.classes.MCMC(model, P_rewire=0.3333333333333333, P_add_remove=0.3333333333333333, P_t_shift=0.3333333333333333, P_add=0.5, P_rewire_add=0.5, P_offspring_add=0.5, P_to_offspring=0.5)[source]

Bases: object

Markov Chain Monte Carlo sampler for transmission tree inference.

This class implements MCMC sampling algorithms for transmission network inference using various proposal mechanisms.

Parameters:
  • model (didelot_unsampled) – The transmission tree model to sample from.

  • P_rewire (float, optional) – The probability of rewiring a transmission tree. Default is 1/3.

  • P_add_remove (float, optional) – The probability of adding or removing an unsampled host in the transmission tree. Default is 1/3.

  • P_t_shift (float, optional) – The probability of shifting the infection time of the host in the transmission tree. Default is 1/3.

  • P_add (float, optional) – The probability of adding a new host to the transmission tree once an add/remove move has been proposed. Default is 0.5.

  • P_rewire_add (float, optional) – The probability of rewiring the new unsampled host once an add move has been proposed. Default is 0.5.

  • P_offspring_add (float, optional) – The probability that the new unsampled host is an offspring once the add and rewire have been proposed. Default is 0.5.

  • P_to_offspring (float, optional) – The probability of moving to offspring model during rewiring. Default is 0.5.

Variables:
  • model (didelot_unsampled) – The transmission model being sampled.

  • P_rewire (float) – Probability of rewiring moves.

  • P_add_remove (float) – Probability of add/remove moves.

  • P_t_shift (float) – Probability of time shift moves.

  • P_add (float) – Probability of adding vs removing hosts.

  • P_rewire_add (float) – Probability of rewiring added hosts.

  • P_offspring_add (float) – Probability of offspring vs chain model for added hosts.

  • P_to_offspring (float) – Probability of moving to offspring model.

__init__(model, P_rewire=0.3333333333333333, P_add_remove=0.3333333333333333, P_t_shift=0.3333333333333333, P_add=0.5, P_rewire_add=0.5, P_offspring_add=0.5, P_to_offspring=0.5)[source]

Initialize the MCMC sampler.

Parameters:
  • model (didelot_unsampled) – The transmission tree model to sample from.

  • P_rewire (float, optional) – The probability of rewiring a transmission tree. Default is 1/3.

  • P_add_remove (float, optional) – The probability of adding or removing an unsampled host in the transmission tree. Default is 1/3.

  • P_t_shift (float, optional) – The probability of shifting the infection time of the host in the transmission tree. Default is 1/3.

  • P_add (float, optional) – The probability of adding a new host to the transmission tree once an add/remove move has been proposed. Default is 0.5.

  • P_rewire_add (float, optional) – The probability of rewiring the new unsampled host once an add move has been proposed. Default is 0.5.

  • P_offspring_add (float, optional) – The probability that the new unsampled host is an offspring once the add and rewire have been proposed. Default is 0.5.

  • P_to_offspring (float, optional) – The probability of moving to offspring model during rewiring. Default is 0.5.

MCMC_iteration(verbose=False)[source]

Perform an MCMC iteration on the transmission tree model.

Parameters:

verbose (bool, optional) – Whether to print the progress of the MCMC iteration. Default is False.

Returns:

A tuple containing:

  • move (str) – The type of move proposed (‘rewire’, ‘add_remove’, or ‘time_shift’).

  • gg (float) – The ratio of proposal probabilities.

  • pp (float) – The ratio of posterior probabilities.

  • P (float) – The acceptance probability.

  • accepted (bool) – Whether the move was accepted.

  • DL (float) – The difference in log likelihood.

Return type:

tuple

Notes

The function operates as follows:

  1. Selects a move type at random.

  2. Performs the move and computes acceptance probability.

  3. Returns move details and acceptance status.
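
Step 1 can be sketched with the standard library. This is an illustration only: choose_move is a hypothetical helper, and drawing the move type with weights P_rewire, P_add_remove, and P_t_shift is an assumption based on the parameter descriptions, not on the package source.

```python
import random


def choose_move(P_rewire=1/3, P_add_remove=1/3, P_t_shift=1/3, rng=random):
    """Pick one move type at random, weighted by the move probabilities."""
    moves = ["rewire", "add_remove", "time_shift"]
    weights = [P_rewire, P_add_remove, P_t_shift]
    return rng.choices(moves, weights=weights, k=1)[0]


print(choose_move())
```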

MCMC Module

This module contains Markov Chain Monte Carlo sampling algorithms for transmission network inference.

Main Classes

MCMC : Main MCMC sampler class for transmission tree inference

The MCMC module provides methods for sampling from the posterior distribution of transmission trees using various proposal mechanisms, including:

  • Tree topology changes (rewiring)

  • Adding/removing unsampled hosts

  • Infection time updates

transmission_models.classes.mcmc.random() → x in the interval [0, 1).

Utils Module

Utilities Module.

This module contains utility functions for tree manipulation, visualization, and data conversion in transmission network analysis.

Main Functions

  • tree_to_newick : Convert transmission tree to Newick format

  • search_firsts_sampled_siblings : Find first sampled siblings in tree

  • search_first_sampled_parent : Find first sampled parent in tree

  • plot_transmision_network : Visualize transmission network

  • tree_to_json : Convert tree to JSON format

  • json_to_tree : Convert JSON to tree format

  • tree_slicing_step : Tree topology manipulation functions

Visualization

  • hierarchy_pos : Generate hierarchical layout positions

  • hierarchy_pos_times : Generate time-based hierarchical layout

  • plot_transmision_network : Plot transmission network with various options

Data Conversion

  • tree_to_newick : Convert to Newick format for phylogenetic software

  • tree_to_json : Convert to JSON for data storage

  • json_to_tree : Convert from JSON back to tree structure

transmission_models.utils.tree_to_newick(g, lengths=True, root=None)[source]

Convert a transmission tree to Newick format for phylogenetic software.

Parameters:
  • g (networkx.DiGraph) – The transmission tree as a directed graph.

  • lengths (bool, optional) – Whether to include branch lengths in the Newick string. Default is True.

  • root (node, optional) – The root node of the tree. If None, the root will be inferred.

Returns:

The Newick string representation of the tree.

Return type:

str
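
A minimal sketch of Newick serialization for a rooted tree. It is not the package's implementation: to_newick is a hypothetical helper, a dict of child lists stands in for the networkx.DiGraph, and branch lengths are supplied explicitly because how the package derives them is not stated here.

```python
def to_newick(children, lengths, node):
    """Serialize the subtree under node as a Newick fragment (no trailing ';').

    children maps a node to its list of children; lengths maps a node to the
    length of the branch leading to it (nodes without an entry get no length).
    """
    label = str(node)
    kids = children.get(node, [])
    if kids:
        # Internal node: children go in parentheses before the node label.
        inner = ",".join(to_newick(children, lengths, kid) for kid in kids)
        label = f"({inner}){label}"
    if node in lengths:
        label += f":{lengths[node]}"
    return label


children = {"A": ["B", "C"], "C": ["D"]}
lengths = {"B": 2, "C": 1, "D": 3}
print(to_newick(children, lengths, "A") + ";")  # (B:2,(D:3)C:1)A;
```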

transmission_models.utils.pdf_in_between(model, Dt, t)[source]

Compute the probability density function (PDF) value for the infection time between two events using a beta distribution parameterized by the model’s infection parameters.

Parameters:
  • model (object) – The model object containing infection parameters (expects attributes k_inf).

  • Dt (float) – The scale parameter (duration between events).

  • t (float) – The time at which to evaluate the PDF.

Returns:

The PDF value at time t.

Return type:

float

transmission_models.utils.sample_in_between(model, Dt)[source]

Sample a random infection time between two events using a beta distribution parameterized by the model’s infection parameters.

Parameters:
  • model (object) – The model object containing infection parameters (expects attributes k_inf).

  • Dt (float) – The scale parameter (duration between events).

Returns:

A random sample from the beta distribution.

Return type:

float

transmission_models.utils.random_combination(iterable, r=1)[source]

Randomly select a combination of r elements from the given iterable.

Parameters:
  • iterable (iterable) – The input iterable to select elements from.

  • r (int, optional) – The number of elements to select. Default is 1.

Returns:

A tuple containing r randomly selected elements from the iterable.

Return type:

tuple

Notes

This function is equivalent to a random selection from itertools.combinations(iterable, r).
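
One way to obtain the behaviour described in the note is the recipe from the Python itertools documentation, reproduced below as a sketch. Whether the package uses exactly this code is an assumption; the function is renamed random_combination_sketch to make that clear.

```python
import random


def random_combination_sketch(iterable, r=1):
    """Random selection from itertools.combinations(iterable, r)."""
    pool = tuple(iterable)
    # Sample r distinct positions, then sort them so the result keeps the
    # input order, as itertools.combinations does.
    indices = sorted(random.sample(range(len(pool)), r))
    return tuple(pool[i] for i in indices)


print(random_combination_sketch("abcde", r=3))
```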

transmission_models.utils.search_firsts_sampled_siblings(host, T)[source]

Search for the first sampled siblings of a host in the transmission tree.

Parameters:
  • host (Host) – The host whose siblings are searched for.

  • T (nx.DiGraph) – The transmission tree to which the host belongs.

Returns:

The list of the first sampled siblings of the host.

Return type:

list

transmission_models.utils.search_first_sampled_parent(host, T, root)[source]

Search for the first sampled parent of a host in the transmission tree.

Parameters:
  • host (Host) – The host whose parent is searched for. If the host is the root of the tree, the function returns None.

  • T (nx.DiGraph) – The transmission tree to which the host belongs.

  • root (Host) – The root of the transmission tree.

Returns:

The first sampled parent of the host, or None if host is the root.

Return type:

Host or None

transmission_models.utils.Delta_log_gamma(Dt_ini, Dt_end, k, theta)[source]

Compute the difference in the log likelihood of the gamma distribution for the time between two events.

Parameters:
  • Dt_ini (float) – Initial time.

  • Dt_end (float) – End time.

  • k (float) – Shape parameter of the gamma distribution.

  • theta (float) – Scale parameter of the gamma distribution.

Returns:

Difference of the log likelihood of the gamma distribution.

Return type:

float
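
A minimal sketch of a gamma log-likelihood difference with shape k and scale theta. The sign convention (end minus initial) and the helper names are assumptions, since the entry above does not state them.

```python
import math


def gamma_log_pdf(x, k, theta):
    """Log density of a gamma distribution with shape k and scale theta."""
    return (k - 1) * math.log(x) - x / theta - math.lgamma(k) - k * math.log(theta)


def delta_log_gamma(Dt_ini, Dt_end, k, theta):
    """Assumed convention: log pdf at Dt_end minus log pdf at Dt_ini."""
    return gamma_log_pdf(Dt_end, k, theta) - gamma_log_pdf(Dt_ini, k, theta)


print(delta_log_gamma(Dt_ini=1.0, Dt_end=3.0, k=2.0, theta=1.5))
```

The normalization terms cancel in the difference, so it reduces to (k - 1) * log(Dt_end / Dt_ini) - (Dt_end - Dt_ini) / theta.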

transmission_models.utils.hierarchy_pos(G, root=None, width=1.0, vert_gap=0.2, vert_loc=0, xcenter=0.5)[source]

Compute hierarchical layout positions for a tree graph.

Parameters:
  • G (networkx.Graph) – The graph (must be a tree).

  • root (node, optional) – The root node of the current branch. If None, the root will be found automatically.

  • width (float, optional) – Horizontal space allocated for this branch. Default is 1.0.

  • vert_gap (float, optional) – Gap between levels of hierarchy. Default is 0.2.

  • vert_loc (float, optional) – Vertical location of root. Default is 0.

  • xcenter (float, optional) – Horizontal location of root. Default is 0.5.

Returns:

A dictionary of positions keyed by node.

Return type:

dict

Notes

This function is adapted from Joel’s answer at https://stackoverflow.com/a/29597209/2966723. Licensed under Creative Commons Attribution-Share Alike.

transmission_models.utils.hierarchy_pos_times(G, root=None, width=1.0, vert_gap=0.2, vert_loc=0, xcenter=0.5)[source]

Compute hierarchical layout positions for a tree graph, using time as vertical position.

Parameters:
  • G (networkx.Graph) – The graph (must be a tree).

  • root (node, optional) – The root node of the current branch. If None, the root will be found automatically.

  • width (float, optional) – Horizontal space allocated for this branch. Default is 1.0.

  • vert_gap (float, optional) – Gap between levels of hierarchy. Default is 0.2.

  • vert_loc (float, optional) – Vertical location of root. Default is 0.

  • xcenter (float, optional) – Horizontal location of root. Default is 0.5.

Returns:

A dictionary of positions keyed by node.

Return type:

dict

Notes

This function is adapted from Joel’s answer at https://stackoverflow.com/a/29597209/2966723. Licensed under Creative Commons Attribution-Share Alike.

transmission_models.utils.plot_transmision_network(T, nodes_labels=False, pos=None, highlighted_nodes=None, ax=None, to_frame=False, title=None, filename=None, show=True)[source]

Visualize a transmission network using matplotlib and networkx.

Parameters:
  • T (networkx.DiGraph) – The transmission network to plot.

  • nodes_labels (bool, optional) – Whether to display node labels. Default is False.

  • pos (dict, optional) – Node positions for layout. If None, uses graphviz_layout. Default is None.

  • highlighted_nodes (list, optional) – List of nodes to highlight. Default is None.

  • ax (matplotlib.axes.Axes, optional) – The axes to plot on. If None, uses current axes. Default is None.

  • to_frame (bool, optional) – If True, saves the plot to a temporary image and returns it as an array. Default is False.

  • title (str, optional) – Title for the plot. Default is None.

  • filename (str, optional) – If provided, saves the plot to this file. Default is None.

  • show (bool, optional) – Whether to display the plot. Default is True.

Returns:

image – The image array if to_frame is True, otherwise None.

Return type:

ndarray or None

transmission_models.utils.tree_to_dict(model, h)[source]

Convert a host and its descendants to a nested dictionary suitable for JSON export.

Parameters:
  • model (object) – The transmission model containing the tree structure.

  • h (host) – The host node to convert.

Returns:

A nested dictionary representing the host and its descendants.

Return type:

dict

transmission_models.utils.cast_types(value, types_map)[source]

Recursively cast types in a nested data structure.

This function recursively traverses a nested data structure (dict, list) and casts any values that match the types in types_map to their target types. Useful for fixing JSON serialization issues with numpy types.

Parameters:
  • value (any) – The value to cast. Can be a dict, list, or any other type.

  • types_map (list of tuples) – List of (from_type, to_type) tuples specifying type conversions.

Returns:

The value with types cast according to types_map.

Return type:

any

Examples

>>> import numpy as np
>>> import json
>>> data = [np.int64(123)]
>>> data = cast_types(data, [(np.int64, int), (np.float64, float)])
>>> data_json = json.dumps(data)
>>> data_json == "[123]"
True

Notes

This function is useful for fixing “TypeError: Object of type int64 is not JSON serializable” errors when working with numpy arrays and JSON.
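
The recursive traversal described above can be sketched as follows. This is an assumption about the general approach, not the package's code: cast_types_sketch is a hypothetical name, and decimal.Decimal stands in for a numpy type so that the example runs with the standard library alone.

```python
import json
from decimal import Decimal


def cast_types_sketch(value, types_map):
    """Recursively cast values in nested dicts and lists (hypothetical helper)."""
    if isinstance(value, dict):
        return {k: cast_types_sketch(v, types_map) for k, v in value.items()}
    if isinstance(value, list):
        return [cast_types_sketch(v, types_map) for v in value]
    # Leaf value: apply the first matching (from_type, to_type) conversion.
    for from_type, to_type in types_map:
        if isinstance(value, from_type):
            return to_type(value)
    return value


data = {"times": [Decimal("1.5"), Decimal("2")], "id": "host1"}
cleaned = cast_types_sketch(data, [(Decimal, float)])
print(json.dumps(cleaned))
```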

transmission_models.utils.tree_to_json(model, filename)[source]

Save a transmission model and its tree to a JSON file.

Parameters:
  • model (object) – The transmission model to export.

  • filename (str) – The path to the output JSON file.

transmission_models.utils.get_host_from_dict(dict_tree)[source]

Create a host object from a dictionary representation (as used in JSON trees).

Parameters:

dict_tree (dict) – The dictionary representing a host (from JSON).

Returns:

The reconstructed host object.

Return type:

host

transmission_models.utils.read_tree_dict(dict_tree, h1=None, edge_list=[])[source]

Recursively read a tree dictionary and extract edges as (parent, child) tuples.

Parameters:
  • dict_tree (dict) – The dictionary representing the tree (from JSON).

  • h1 (host, optional) – The parent host node. If None, will be created from dict_tree.

  • edge_list (list, optional) – The list to append edges to. Default is an empty list.

Returns:

A list of (parent, child) edge tuples.

Return type:

list

transmission_models.utils.json_to_tree(filename, sampling_params=None, offspring_params=None, infection_params=None)[source]

Load a transmission model from a JSON file and reconstruct the model object.

Parameters:
  • filename (str) – Path to the JSON file.

  • sampling_params (dict, optional) – Sampling parameters to override those in the file. Default is None.

  • offspring_params (dict, optional) – Offspring parameters to override those in the file. Default is None.

  • infection_params (dict, optional) – Infection parameters to override those in the file. Default is None.

Returns:

The reconstructed transmission model.

Return type:

didelot_unsampled

transmission_models.utils.build_infection_based_network(model, host_list)[source]

Generate a transmission tree network given a list of sampled hosts.

This function creates a transmission tree from the dataset. It uses the model’s sampling and infection parameters to construct a plausible initial transmission network.

For each host, the function draws a number of infected hosts and then makes a random draw for each candidate host to decide whether the two are connected, given their infection times.

At the end, we add a virtual root host to connect all disconnected components.

Parameters:
  • model (didelot_unsampled) – The transmission model with sampling and infection parameters.

  • host_list (list) – List of host objects representing the sampled data.

Returns:

The updated model with the generated transmission tree.

Return type:

didelot_unsampled

Notes

This function implements the algorithm described in the notebook for generating initial transmission networks. It creates a directed graph representing the transmission tree and adds a virtual root host to connect all disconnected components.

transmission_models.utils.random() → x in the interval [0, 1).