Classes
Host
- class transmission_models.classes.host.host(id, index, genetic_data=[], t_inf=0, t_sample=None)[source]
Bases: object
Represents a host that has been infected with a virus.
A host object contains information about an infected individual, including their genetic data, infection time, sampling time, and other attributes.
- Variables:
index (int) – The index of the host.
sampled (bool) – Indicates whether the host has been sampled or not.
genetic_data (list) – The genetic data of the host.
dict_attributes (dict) – A dictionary to store additional attributes.
t_inf (int) – Time of infection.
t_sample (int, optional) – The time the host was sampled.
id (str) – The identifier of the host.
- t_inf : property
Getter and setter for the time of infection attribute.
- get_genetic_str() : str
Returns the genetic data as a string.
- __str__() : str
Returns a string with the id of the host.
- __int__() : int
Returns the index of the host.
Examples
>>> h = host('host1', 1, ['A', 'T', 'C', 'G'], 10, t_sample=15)
>>> print(h.t_inf)
10
>>> h.t_inf = 20
>>> print(h.t_inf)
20
>>> print(h.get_genetic_str())
ATCG
>>> print(h)
host1
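The behaviour in the example can be reproduced with a small stand-in class. The sketch below is not the package's source: `HostSketch` is a hypothetical name, and it assumes that `sampled` is derived from whether `t_sample` was given and that `get_genetic_str()` simply joins the list of nucleotides.

```python
class HostSketch:
    """Hypothetical stand-in for the documented host class (illustration only)."""

    def __init__(self, id, index, genetic_data=None, t_inf=0, t_sample=None):
        self.id = id
        self.index = index
        # Use None as the default so instances never share one mutable list.
        self.genetic_data = [] if genetic_data is None else genetic_data
        self._t_inf = t_inf
        self.t_sample = t_sample
        # Assumption: a host counts as sampled when a sampling time is given.
        self.sampled = t_sample is not None
        self.dict_attributes = {}

    @property
    def t_inf(self):
        """Getter for the time of infection."""
        return self._t_inf

    @t_inf.setter
    def t_inf(self, value):
        """Setter for the time of infection."""
        self._t_inf = value

    def get_genetic_str(self):
        """Join the nucleotide list into a single string."""
        return "".join(self.genetic_data)

    def __str__(self):
        return str(self.id)

    def __int__(self):
        return self.index


h = HostSketch("host1", 1, ["A", "T", "C", "G"], 10, t_sample=15)
```

Defining `__str__` and `__int__` is what lets `print(h)` show the identifier and `int(h)` return the index.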
Notes
The class is named host in lowercase, so it does not follow the PascalCase convention that PEP 8 recommends for class names.
- __init__(id, index, genetic_data=[], t_inf=0, t_sample=None)[source]
Initialize a new instance of the Host class.
- Parameters:
- property t_inf
Getter for the time of infection attribute.
- Returns:
The time of infection.
- Return type:
- get_genetic_str()[source]
Return the genetic data of the host as a string.
- Returns:
The genetic data as a string.
- Return type:
- transmission_models.classes.host.create_genome(chain_length)[source]
Create a random genome sequence of specified length.
- Parameters:
chain_length (int) – The length of the genome sequence to create.
- Returns:
A list of random nucleotides (A, G, C, T) of length chain_length.
- Return type:
Examples
>>> genome = create_genome(10)
>>> print(genome)
['A', 'T', 'C', 'G', 'A', 'T', 'C', 'G', 'A', 'T']
- transmission_models.classes.host.binom_mutation(chain_length, p, genome)[source]
Perform binomial mutation on a given genome.
This function generates changes in a genome by randomly selecting ‘k’ positions to mutate, where ‘k’ follows a binomial distribution with parameters ‘chain_length’ and ‘p’. The elements at the selected positions are replaced with new randomly chosen nucleotides.
- Parameters:
- Returns:
The mutated genome sequence.
- Return type:
Notes
The function operates as follows:
Calculates the number of positions to mutate, ‘k’, by sampling from a binomial distribution with ‘chain_length’ trials and success probability ‘p’.
Randomly selects ‘k’ positions from the range [0, chain_length) without replacement.
Creates a new list ‘new_genome’ from the original genome.
Iterates over the selected positions and replaces the corresponding elements in ‘new_genome’ with randomly chosen nucleotides based on the original nucleotide at that position:
If the original nucleotide is ‘A’, it is replaced with a randomly chosen nucleotide from ‘CTG’.
If the original nucleotide is ‘C’, it is replaced with a randomly chosen nucleotide from ‘ATG’.
If the original nucleotide is ‘T’, it is replaced with a randomly chosen nucleotide from ‘ACG’.
If the original nucleotide is ‘G’, it is replaced with a randomly chosen nucleotide from ‘ACT’.
Returns the mutated genome sequence as ‘new_genome’.
Examples
>>> genome = ['A', 'T', 'C', 'G', 'G', 'A', 'T', 'C', 'G', 'A']
>>> mutated_genome = binom_mutation(len(genome), 0.2, genome)
>>> print(mutated_genome)
['A', 'T', 'C', 'A', 'G', 'A', 'T', 'C', 'G', 'A']
See also
one_mutation – Perform a single mutation on a genome
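The steps listed in the notes for binom_mutation can be sketched with the standard library alone. This is an illustrative re-implementation under stated assumptions, not the package's code: `binom_mutation_sketch` and `SUBSTITUTIONS` are hypothetical names, and the binomial draw is done as a sum of Bernoulli trials.

```python
import random

# Substitution rules from the notes: each nucleotide is replaced by one of
# the three other nucleotides.
SUBSTITUTIONS = {"A": "CTG", "C": "ATG", "T": "ACG", "G": "ACT"}


def binom_mutation_sketch(chain_length, p, genome, rng=random):
    """Illustrative version of the documented steps (not the package source)."""
    # Step 1: k ~ Binomial(chain_length, p), drawn as a sum of Bernoulli trials.
    k = sum(rng.random() < p for _ in range(chain_length))
    # Step 2: choose k distinct positions in [0, chain_length).
    positions = rng.sample(range(chain_length), k)
    # Step 3: copy the genome so the input list is left untouched.
    new_genome = list(genome)
    # Step 4: replace each selected nucleotide with a different one.
    for pos in positions:
        new_genome[pos] = rng.choice(SUBSTITUTIONS[genome[pos]])
    return new_genome


genome = list("ATCGGATCGA")
mutated = binom_mutation_sketch(len(genome), 0.2, genome, random.Random(1))
n_changed = sum(a != b for a, b in zip(genome, mutated))
```

Because positions are drawn without replacement and every replacement differs from the original nucleotide, `n_changed` always equals the sampled `k`.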
- transmission_models.classes.host.one_mutation(chain_length, p, genome)[source]
Perform one mutation on a given genome.
This function generates a single mutation in a genome by randomly selecting one position to mutate. The selected position is replaced with a new randomly chosen nucleotide.
- Parameters:
- Returns:
The mutated genome sequence.
- Return type:
Notes
The function operates as follows:
Randomly selects one position from the range [0, chain_length) to mutate.
Creates a new list ‘new_genome’ from the original genome.
Checks the original nucleotide at the selected position and replaces it with a randomly chosen nucleotide based on the following rules:
If the original nucleotide is ‘A’, it is replaced with a randomly chosen nucleotide from ‘CTG’.
If the original nucleotide is ‘C’, it is replaced with a randomly chosen nucleotide from ‘ATG’.
If the original nucleotide is ‘T’, it is replaced with a randomly chosen nucleotide from ‘ACG’.
If the original nucleotide is ‘G’, it is replaced with a randomly chosen nucleotide from ‘ACT’.
Returns the mutated genome sequence as ‘new_genome’.
Examples
>>> genome = ['A', 'T', 'C', 'G', 'G', 'A', 'T', 'C', 'G', 'A']
>>> mutated_genome = one_mutation(len(genome), 0.2, genome)
>>> print(mutated_genome)
['A', 'T', 'C', 'A', 'G', 'A', 'T', 'C', 'G', 'T']
See also
binom_mutation – Perform binomial mutation on a genome
- transmission_models.classes.host.average_mutations(mu, P_mut, tau, Dt, host_genetic)[source]
Generate a list of mutations proportional to a time interval.
The number of mutations is proportional to a given time interval (Dt) where the proportion factor is the mutation rate (mu).
- Parameters:
- Returns:
A tuple containing:
- mutations : list
List of mutated genetic sequences.
- t_mutations : list
List of mutation times.
- Return type:
Notes
The function calculates the number of mutations as int(mu * Dt / P_mut) and generates that many mutations using the one_mutation function.
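As a worked example of the formula in the note, the sketch below computes the mutation count and applies that many single mutations in sequence. It rests on assumptions: `n_mutations`, `mutate_once` and `successive_mutations` are hypothetical names, each mutation is applied to the previously mutated sequence, and the mutation times and the `tau` argument are left out because the documentation does not say how they are used.

```python
import random

SUBSTITUTIONS = {"A": "CTG", "C": "ATG", "T": "ACG", "G": "ACT"}


def n_mutations(mu, Dt, P_mut):
    """Number of mutations as stated in the note: int(mu * Dt / P_mut)."""
    return int(mu * Dt / P_mut)


def mutate_once(genome, rng=random):
    """Hypothetical helper: change one randomly chosen position."""
    new_genome = list(genome)
    pos = rng.randrange(len(new_genome))
    new_genome[pos] = rng.choice(SUBSTITUTIONS[new_genome[pos]])
    return new_genome


def successive_mutations(mu, P_mut, Dt, host_genetic, rng=random):
    """Assumption: each mutation is applied to the previously mutated sequence."""
    mutations = []
    current = list(host_genetic)
    for _ in range(n_mutations(mu, Dt, P_mut)):
        current = mutate_once(current, rng)
        mutations.append(current)
    return mutations


steps = successive_mutations(0.5, 0.5, 8, list("ATCGGATCGA"), random.Random(0))
print(len(steps))  # 8, since int(0.5 * 8 / 0.5) == 8
```

Note that int() truncates toward zero, so a product such as mu * Dt / P_mut = 2.9 yields two mutations, not three.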
Didelot Unsampled
- class transmission_models.classes.didelot_unsampled.didelot_unsampled(sampling_params, offspring_params, infection_params, T=None)[source]
Bases: object
Didelot unsampled transmission model.
This class implements the Didelot et al. (2017) framework for transmission tree inference with unsampled hosts. It provides methods for building transmission networks, computing likelihoods, and performing MCMC sampling.
The model incorporates three main components:
1. Sampling model: Gamma distribution for sampling times
2. Offspring model: Negative binomial distribution for offspring number
3. Infection model: Gamma distribution for infection times
- Parameters:
sampling_params (dict) – Parameters for the sampling model containing:
- pi : float, sampling probability
- k_samp : float, shape parameter for gamma distribution
- theta_samp : float, scale parameter for gamma distribution
offspring_params (dict) – Parameters for the offspring model containing:
- r : float, rate of infection
- p_inf : float, probability of infection
infection_params (dict) – Parameters for the infection model containing:
- k_inf : float, shape parameter for gamma distribution
- theta_inf : float, scale parameter for gamma distribution
- Variables:
T (networkx.DiGraph) – The transmission tree.
host_dict (dict) – Dictionary mapping host IDs to host objects.
log_likelihood (float) – Current log likelihood of the model.
genetic_prior (genetic_prior_tree, optional) – Prior for genetic data.
same_location_prior (same_location_prior_tree, optional) – Prior for location data.
References
Didelot, X., Gardy, J., & Colijn, C. (2017). Bayesian inference of transmission chains using timing of events, contact and genetic data. PLoS computational biology, 13(4), e1005496.
Core Methods
- __init__(sampling_params, offspring_params, infection_params, T=None)[source]
Initialize the Didelot unsampled transmission model.
- Parameters:
sampling_params (dict) – Parameters for the sampling model containing:
- pi : float, sampling probability
- k_samp : float, shape parameter for gamma distribution
- theta_samp : float, scale parameter for gamma distribution
offspring_params (dict) – Parameters for the offspring model containing:
- r : float, rate of infection
- p_inf : float, probability of infection
infection_params (dict) – Parameters for the infection model containing:
- k_inf : float, shape parameter for gamma distribution
- theta_inf : float, scale parameter for gamma distribution
T (networkx.DiGraph, optional) – The transmission tree. If provided, the model will be initialized with this tree. Default is None.
- Raises:
KeyError – If any required parameter is missing from the input dictionaries.
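The three parameter dictionaries might be assembled as in the following sketch. The numeric values are placeholders chosen for illustration, and `check_params` is a hypothetical helper that mimics the documented KeyError behaviour; it is not part of the package.

```python
# Keys required by each model component, as listed in the parameter docs.
REQUIRED_KEYS = {
    "sampling_params": ("pi", "k_samp", "theta_samp"),
    "offspring_params": ("r", "p_inf"),
    "infection_params": ("k_inf", "theta_inf"),
}

# Placeholder values for illustration only.
sampling_params = {"pi": 0.5, "k_samp": 2.0, "theta_samp": 1.5}
offspring_params = {"r": 2.0, "p_inf": 0.5}
infection_params = {"k_inf": 2.0, "theta_inf": 1.0}


def check_params(**param_dicts):
    """Hypothetical helper: raise KeyError if a documented key is missing."""
    for name, params in param_dicts.items():
        for key in REQUIRED_KEYS[name]:
            if key not in params:
                raise KeyError(f"{name} is missing required key {key!r}")


check_params(
    sampling_params=sampling_params,
    offspring_params=offspring_params,
    infection_params=infection_params,
)

# With the package installed, the dictionaries would then be passed on:
# from transmission_models.classes.didelot_unsampled import didelot_unsampled
# model = didelot_unsampled(sampling_params, offspring_params, infection_params)
```

Checking the keys up front gives a clearer error message than a KeyError raised deep inside the constructor.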
- add_root(t_sampl, id='0', genetic_data=[], t_inf=0, t_sample=None)[source]
Add the root host to the transmission tree.
- Parameters:
t_sampl (float) – Sampling time of the root host.
id (str, optional) – Identifier for the root host. Default is “0”.
genetic_data (list, optional) – Genetic data for the root host. Default is empty list.
t_inf (float, optional) – Infection time of the root host. Default is 0.
t_sample (float, optional) – Sampling time of the root host. Default is None.
- Returns:
The root host object.
- Return type:
- successors(host)[source]
Get the successors (children) of a given host in the transmission tree.
- Parameters:
host (host) – The host node whose successors are to be returned.
- Returns:
An iterator over the successors of the host.
- Return type:
iterator
- out_degree(host)[source]
Get the out-degree (number of children) of a host in the transmission tree.
Tree Structure Methods
- get_root_subtrees()[source]
Retrieve the root subtrees of the transmission tree.
This method searches for the first sampled siblings of the root host in the transmission tree and stores them in the roots_subtrees attribute.
- Returns:
A list of root subtrees.
- Return type:
- get_unsampled_hosts()[source]
Get the list of unsampled hosts in the transmission tree (excluding the root).
- Returns:
List of unsampled host nodes.
- Return type:
- get_candidates_to_chain()[source]
Get the list of candidate hosts for chain moves in the transmission tree.
- Returns:
List of candidate host nodes for chain moves.
- Return type:
- get_N_candidates_to_chain(recompute=False)[source]
Get the number of candidate hosts for chain moves, optionally recomputing the list.
Likelihood Methods
- get_sampling_model_likelihood(hosts=None, T=None, update=False)[source]
Computes the likelihood of the sampling model given a list of hosts. If no list is given, the likelihood of the whole transmission tree is returned.
- get_sampling_model_log_likelihood(hosts=None, T=None, update=False)[source]
Computes the log-likelihood of the sampling model given a list of hosts. If no list is given, the log-likelihood of the whole transmission tree is returned.
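One plausible form of a sampling-model log-likelihood is sketched below. It is an assumption, not taken from the package source: each sampled host is assumed to contribute log(pi) plus the Gamma log-density of its delay t_sample - t_inf, and each unsampled host log(1 - pi). The names `gamma_logpdf` and `sampling_log_likelihood_sketch` are hypothetical, and hosts are reduced to (t_inf, t_sample) pairs.

```python
import math


def gamma_logpdf(x, k, theta):
    """Log-density of a Gamma(shape=k, scale=theta) distribution at x > 0."""
    return (k - 1) * math.log(x) - x / theta - k * math.log(theta) - math.lgamma(k)


def sampling_log_likelihood_sketch(hosts, pi, k_samp, theta_samp):
    """Assumed form: sum of per-host terms (not taken from the package source)."""
    total = 0.0
    for t_inf, t_sample in hosts:
        if t_sample is None:
            # Unsampled host: contributes the probability of not being sampled.
            total += math.log(1 - pi)
        else:
            # Sampled host: sampling probability times the delay density.
            total += math.log(pi) + gamma_logpdf(t_sample - t_inf, k_samp, theta_samp)
    return total


# (t_inf, t_sample) pairs; None marks an unsampled host.
hosts = [(0.0, 3.0), (1.0, None), (2.0, 4.5)]
ll = sampling_log_likelihood_sketch(hosts, pi=0.5, k_samp=2.0, theta_samp=1.5)
likelihood = math.exp(ll)
```

Working in log space and exponentiating only at the end avoids numerical underflow when many hosts are multiplied together.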
- get_offspring_model_likelihood(hosts=None, T=None, update=False)[source]
Computes the likelihood of the offspring model given a list of hosts. If no list is given, the likelihood of the whole transmission tree is returned.
- get_offspring_model_log_likelihood(hosts=None, T=None, update=False)[source]
Computes the log-likelihood of the offspring model given a list of hosts. If no list is given, the log-likelihood of the whole transmission tree is returned.
- get_infection_model_likelihood(hosts=None, T=None, update=False)[source]
Computes the likelihood of the infection model given a list of hosts. If no list is given, the likelihood of the whole transmission tree is returned.
- Parameters:
- Returns:
L – The likelihood of the infection model given the list of hosts
- Return type:
- get_infection_model_log_likelihood(hosts=None, T=None, update=False)[source]
Computes the log-likelihood of the infection model given a list of hosts. If no list is given, the log-likelihood of the whole transmission tree is returned.
- Parameters:
- Returns:
L – The log-likelihood of the infection model given the list of hosts
- Return type:
- log_likelihood_host(host, T=None)[source]
Computes the log likelihood of a host given the transmission tree.
- Parameters:
host (host object)
T (DiGraph object)
- Returns:
log_likelihood – The log likelihood of the host in the transmission network
- Return type:
Delta Methods (for MCMC)
- Delta_log_sampling(hosts, T_end, T_ini=None)[source]
Compute the change in log-likelihood for the sampling model.
- Parameters:
- Returns:
Change in log-likelihood for the sampling model.
- Return type:
Notes
The function operates as follows:
Computes the log-likelihood for the sampling model at T_end.
If T_ini is provided, subtracts the log-likelihood at T_ini.
Returns the difference.
- Delta_log_offspring(hosts, T_end, T_ini=None)[source]
Compute the change in log-likelihood for the offspring model.
- Parameters:
- Returns:
Change in log-likelihood for the offspring model.
- Return type:
Notes
The function operates as follows:
Computes the log-likelihood for the offspring model at T_end.
If T_ini is provided, subtracts the log-likelihood at T_ini.
Returns the difference.
- Delta_log_infection(hosts, T_end, T_ini=None)[source]
Compute the change in log-likelihood for the infection model.
- Parameters:
- Returns:
Change in log-likelihood for the infection model.
- Return type:
Notes
The function operates as follows:
Computes the log-likelihood for the infection model at T_end.
If T_ini is provided, subtracts the log-likelihood at T_ini.
Returns the difference.
- Delta_log_likelihood_host(hosts, T_end, T_ini=None)[source]
Compute the change in log-likelihood for a host.
- Parameters:
- Returns:
Change in log-likelihood for the host.
- Return type:
Notes
The function operates as follows:
Computes the log-likelihood for the host at T_end.
If T_ini is provided, subtracts the log-likelihood at T_ini.
Returns the difference.
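The Delta methods above all follow the same three steps, which can be sketched generically. In this sketch `log_likelihood(hosts, T)` is a hypothetical callable standing in for any of the model-specific log-likelihood methods, and the toy "trees" are plain dictionaries, not networkx graphs.

```python
def delta_log_likelihood(log_likelihood, hosts, T_end, T_ini=None):
    """Generic version of the documented steps (illustration only)."""
    # Step 1: log-likelihood at the proposed tree.
    delta = log_likelihood(hosts, T_end)
    # Step 2: subtract the log-likelihood at the initial tree, if given.
    if T_ini is not None:
        delta -= log_likelihood(hosts, T_ini)
    # Step 3: return the difference.
    return delta


# Toy log-likelihood: a "tree" is just a dict mapping host -> log-probability.
def toy_log_likelihood(hosts, T):
    return sum(T[h] for h in hosts)


T_old = {"a": -1.0, "b": -2.0}
T_new = {"a": -0.5, "b": -2.0}
print(delta_log_likelihood(toy_log_likelihood, ["a", "b"], T_new, T_old))  # 0.5
```

Host "b" contributes the same term to both trees, so passing only the hosts affected by a move would give the same difference at lower cost.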
MCMC Step Methods
- infection_time_from_sampling_step(selected_host=None, metHast=True, verbose=False)[source]
Propose and possibly accept a new infection time for a sampled host using the Metropolis-Hastings algorithm.
This method samples a new infection time for a selected host (or a random sampled host if not provided), computes the acceptance probability, and updates the host’s infection time if the proposal is accepted.
- Parameters:
selected_host (host, optional) – The host whose infection time will be changed. If None, a random sampled host is selected.
metHast (bool, optional) – If True, use the Metropolis-Hastings algorithm to accept or reject the proposal. Default is True.
verbose (bool, optional) – If True, print detailed information about the proposal. Default is False.
- Returns:
t_inf_new (float) – The proposed new infection time.
gg (float) – Proposal ratio for the Metropolis-Hastings step.
pp (float) – Likelihood ratio for the Metropolis-Hastings step.
P (float) – Acceptance probability for the Metropolis-Hastings step.
selected_host (host) – The host whose infection time was proposed to change.
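The returned values gg, pp and P relate to each other as in the standard Metropolis-Hastings rule, sketched below. That the package combines them exactly as P = min(1, pp * gg) is an assumption, and `metropolis_hastings_accept` is a hypothetical name.

```python
import random


def metropolis_hastings_accept(pp, gg, rng=random):
    """Return (accepted, P) for likelihood ratio pp and proposal ratio gg."""
    # Acceptance probability of the standard Metropolis-Hastings rule.
    P = min(1.0, pp * gg)
    # Accept the proposal with probability P.
    accepted = rng.random() < P
    return accepted, P


accepted, P = metropolis_hastings_accept(pp=0.4, gg=1.5, rng=random.Random(0))
```

A proposal that raises the target density (pp * gg >= 1) is always accepted, while a worse one is accepted only some of the time, which lets the chain explore without getting stuck.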
- infection_time_from_infection_model_step(selected_host=None, metHast=True, Dt_new=None, verbose=False)[source]
Proposes a new infection time for a host and accepts or rejects the change using the Metropolis-Hastings algorithm.
- Parameters:
selected_host (host object, default=None) – Host whose infection time will be changed. If None, a host is randomly selected.
metHast (bool, default=True) – If True, the Metropolis Hastings algorithm is used to accept or reject the change.
Dt_new (float, default=None) – New infection time for the host. If None, a new time is sampled.
verbose (bool, default=False) – If True, prints the results of the step.
- add_unsampled_with_times(selected_host=None, P_add=0.5, P_rewiring=0.5, P_off=0.5, verbose=False, only_geometrical=False, detailed_probs=False)[source]
Proposes the addition of an unsampled host to the transmission tree and returns the probability of the proposal.
Parameters:
- selected_host: host object
Host to which the unsampled host will be added. If None, a host is randomly selected.
- P_add: float
Probability of proposing to add a new host to the transmission tree.
- P_rewiring: float
Probability of rewiring the new host to another sibling host.
- P_off: float
Probability to rewire the new host to be a leaf.
- verbose: bool
If True, prints the results of the step.
- only_geometrical: bool
If True, only the proposal of the new topological structure will be considered.
- detailed_probs: bool
If True, the method will return both probabilities of the proposals, of adding and removing a host.
Returns:
- T_new: DiGraph object
New transmission tree with the proposed changes.
- gg: float
Ratio of the probabilities of the proposals.
- g_go: float
Probability of the proposal of adding a host.
- g_ret: float
Probability of the proposal of removing a host.
- prob_time: float
Probability of the time of infection of the new host.
- unsampled: host object
Unsampled host to be added to the transmission tree.
- added: bool
If True, the host was added to the transmission tree.
- remove_unsampled_with_times(selected_host=None, P_add=0.5, P_rewiring=0.5, P_off=0.5, only_geometrical=False, detailed_probs=False, verbose=False)[source]
Proposes the removal of an unsampled host from the transmission tree and returns the probability of the proposal. If no unsampled hosts are available, the method proposes adding a new host to the transmission tree instead.
Parameters:
- selected_host: host object
Unsampled host to be removed from the transmission tree. If None, a host is randomly selected.
- P_add: float
Probability of proposing to add a new host to the transmission tree.
- P_rewiring: float
Probability of rewiring the new host to another sibling host.
- P_off: float
Probability to rewire the new host to be a leaf.
- verbose: bool
If True, prints the results of the step.
- only_geometrical: bool
If True, only the proposal of the new topological structure will be considered.
- detailed_probs: bool
If True, the method will return both probabilities of the proposals, of adding and removing a host.
Returns:
- T_new: DiGraph object
New transmission tree with the proposed changes.
- gg: float
Ratio of the probabilities of the proposals.
- g_go: float
Probability of the proposal of adding a host.
- g_ret: float
Probability of the proposal of removing a host.
- prob_time: float
Probability of proposing the time of the selected_host.
- added: bool
If True, the host was added to the transmission tree. Otherwise, the node was removed.
- add_remove_step(P_add=0.5, P_rewiring=0.5, P_off=0.5, metHast=True, verbose=False)[source]
Proposes the addition or removal of an unsampled host in the transmission tree and computes the probability of the proposal.
Parameters:
- P_add: float
Probability of proposing the addition of an unsampled host. Otherwise, an unsampled host is proposed for removal.
- P_rewiring: float
Probability of rewiring the new host to another sibling host.
- P_off: float
Probability to rewire the new host to be a leaf.
- metHast: bool
If True, the Metropolis Hastings algorithm is used to accept or reject the change.
- verbose: bool
If True, prints the results of the step.
Returns:
Prior Methods
- add_genetic_prior(mu_gen, gen_dist)[source]
Adds a genetic prior to the model that computes the likelihood that two sampled hosts are related, given the genetic distance between the viruses of the two hosts. Two sampled hosts are considered related if every host on the path connecting them is unsampled.
- Parameters:
mu_gen (float) – Mutation rate
gen_dist (np.array) – Genetic distance matrix of the virus of the hosts. The index has to be identical to the index of the hosts.
- add_same_location_prior(P_NM, tau, loc_dist)[source]
Adds a same-location prior to the model that computes the likelihood that two sampled hosts are related, given the location data of the hosts. Two sampled hosts are considered related if every host on the path connecting them is unsampled.
- Parameters:
log_K (float) – Log probability of two hosts not being in the same location
gen_dist (np.array) – Genetic distance matrix of the virus of the hosts. The index has to be identical to the index of the hosts.
- compute_Delta_loc_prior(T_new)[source]
Compute the change in the location prior log-likelihood for a new tree.
- Parameters:
T_new (networkx.DiGraph) – The new transmission tree.
- Returns:
(Delta log prior, new log prior, old log prior, old correction log-likelihood)
- Return type:
Utility Methods
- create_transmision_phylogeny_nets(N, mu, P_mut)[source]
- Parameters:
N – Number of hosts.
mu – Mutation rate.
P_mut – Probability of mutation.
- save_json(filename)[source]
Save the transmission tree to a JSON file.
- Parameters:
filename (str) – Path to the output JSON file.
- show_log_likelihoods(hosts=None, T=None, verbose=False)[source]
Print and return the log-likelihoods for the sampling, offspring, and infection models.
- Parameters:
hosts (list, optional) – List of host objects to compute log-likelihoods for. If None, computes for all hosts in T.
T (networkx.DiGraph, optional) – Transmission tree. If None, uses self.T.
verbose (bool, optional) – If True, prints the log-likelihoods. Default is False.
- Returns:
(LL_sampling, LL_offspring, LL_infection): Log-likelihoods for the sampling, offspring, and infection models.
- Return type:
- property T
- samp_t_inf_between(h1, h2)[source]
Sample a time of infection between two hosts.
Uses a rejection sampling method to sample the time of infection of the infected host using the chain model from Didelot et al. 2017.
- Parameters:
- Returns:
Time of infection of the host infected by h1 and the infector of h2.
- Return type:
Notes
This method implements the rejection sampling algorithm described in Didelot et al. (2017) for sampling infection times in transmission chains.
- log_posterior_transmission_tree()[source]
Compute the log-posterior of the current transmission tree.
This method calculates the log-posterior probability of the current transmission tree by summing the log-likelihood of the tree and any additional prior log-probabilities, such as genetic and location priors, if they are defined.
- Returns:
The computed log-posterior of the current transmission tree.
- Return type:
Notes
- The log-posterior is computed as:
log_posterior = log_likelihood + genetic_log_prior (if defined) + same_location_log_prior (if defined)
- The method uses the following attributes:
self.log_likelihood: Log-likelihood of the transmission tree.
self.genetic_log_prior: Log-prior from the genetic model (if defined).
self.same_location_log_prior: Log-prior from the location model (if defined).
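The sum described in the notes can be written as a short function. This is a sketch under the assumption that an undefined prior is represented by None; `log_posterior_sketch` is a hypothetical name and works on plain numbers instead of model attributes.

```python
def log_posterior_sketch(log_likelihood, genetic_log_prior=None,
                         same_location_log_prior=None):
    """Sum the log-likelihood and whichever log-priors are defined."""
    log_posterior = log_likelihood
    if genetic_log_prior is not None:
        log_posterior += genetic_log_prior
    if same_location_log_prior is not None:
        log_posterior += same_location_log_prior
    return log_posterior


print(log_posterior_sketch(-120.5))                # -120.5
print(log_posterior_sketch(-120.5, -3.25))         # -123.75
print(log_posterior_sketch(-120.5, -3.25, -1.25))  # -125.0
```

Adding log-probabilities corresponds to multiplying the likelihood by each prior, so an absent prior simply drops out of the sum.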
- get_log_posterior_transmission_tree(T)[source]
Compute and update the log-posterior of the transmission tree.
This method calculates the log-posterior probability of the given transmission tree T by combining the log-likelihood of the tree with any additional prior log-probabilities, such as genetic and location priors, if they are defined. The computed log-posterior and any relevant prior log-likelihoods are stored as attributes of the object.
- Parameters:
T (networkx.DiGraph) – The transmission tree for which to compute the log-posterior.
- Returns:
The computed log-posterior of the transmission tree.
- Return type:
Notes
- The log-posterior is computed as:
log_posterior = log_likelihood + genetic_log_prior (if defined) + same_location_log_prior (if defined)
- The method also updates the following attributes:
self.log_posterior
self.genetic_log_prior (if applicable)
self.same_location_log_prior (if applicable)
- log_likelihood_transmission_tree_old(T)[source]
Compute the log-likelihood of the entire transmission tree using the old method.
- Parameters:
T (networkx.DiGraph) – Transmission tree to compute the log-likelihood for.
- Returns:
The log-likelihood of the transmission tree.
- Return type:
- classmethod json_to_tree(filename, sampling_params=None, offspring_params=None, infection_params=None)[source]
Load a transmission model from a JSON file and reconstruct the model object.
- Parameters:
filename (str) – Path to the JSON file.
sampling_params (dict, optional) – Sampling parameters to override those in the file. Default is None.
offspring_params (dict, optional) – Offspring parameters to override those in the file. Default is None.
infection_params (dict, optional) – Infection parameters to override those in the file. Default is None.
- Returns:
The reconstructed transmission model.
- Return type:
- infection_time_from_sampling_step(selected_host=None, metHast=True, verbose=False)[source]
Propose and possibly accept a new infection time for a sampled host using the Metropolis-Hastings algorithm.
This method samples a new infection time for a selected host (or a random sampled host if not provided), computes the acceptance probability, and updates the host’s infection time if the proposal is accepted.
- Parameters:
selected_host (host, optional) – The host whose infection time will be changed. If None, a random sampled host is selected.
metHast (bool, optional) – If True, use the Metropolis-Hastings algorithm to accept or reject the proposal. Default is True.
verbose (bool, optional) – If True, print detailed information about the proposal. Default is False.
- Returns:
t_inf_new (float) – The proposed new infection time.
gg (float) – Proposal ratio for the Metropolis-Hastings step.
pp (float) – Likelihood ratio for the Metropolis-Hastings step.
P (float) – Acceptance probability for the Metropolis-Hastings step.
selected_host (host) – The host whose infection time was proposed to change.
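As a rough illustration of how the returned quantities relate, the sketch below assumes the usual Metropolis-Hastings rule P = min(1, gg * pp). The helper name mh_accept and its rng argument are hypothetical and are not part of transmission_models; the library's exact acceptance computation is not shown in this reference.

```python
import random


def mh_accept(gg, pp, rng=random):
    """Return (P, accepted) for one Metropolis-Hastings proposal.

    gg: proposal ratio, pp: likelihood (or posterior) ratio.
    Assumption: the acceptance probability is min(1, gg * pp).
    """
    P = min(1.0, gg * pp)
    # Accept when a uniform draw in [0, 1) falls below P.
    accepted = rng.random() < P
    return P, accepted


P, accepted = mh_accept(gg=0.8, pp=0.5)
print(P)  # 0.4
```

A proposal with gg * pp >= 1 is always accepted under this rule, because the uniform draw is strictly below 1.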
- infection_time_from_infection_model_step(selected_host=None, metHast=True, Dt_new=None, verbose=False)[source]
Propose a new infection time for a host and accept or reject the change using the Metropolis-Hastings algorithm.
- Parameters:
selected_host (host object, default=None) – Host whose infection time will be changed. If None, a host is randomly selected.
metHast (bool, default=True) – If True, the Metropolis Hastings algorithm is used to accept or reject the change.
Dt_new (float, default=None) – New infection time for the host. If None, a new time is sampled.
verbose (bool, default=False) – If True, prints the results of the step.
- add_unsampled_with_times(selected_host=None, P_add=0.5, P_rewiring=0.5, P_off=0.5, verbose=False, only_geometrical=False, detailed_probs=False)[source]
Propose the addition of an unsampled host to the transmission tree and compute the probability of the proposal.
- Parameters:
selected_host (host object) – Host to which the unsampled host will be added. If None, a host is randomly selected.
P_add (float) – Probability of proposing to add a new host to the transmission tree.
P_rewiring (float) – Probability of rewiring the new host to another sibling host.
P_off (float) – Probability of rewiring the new host to be a leaf.
verbose (bool) – If True, prints the results of the step.
only_geometrical (bool) – If True, only the proposal of the new topological structure is considered.
detailed_probs (bool) – If True, the method returns both proposal probabilities, for adding and for removing a host.
- Returns:
T_new (DiGraph object) – New transmission tree with the proposed changes.
gg (float) – Ratio of the probabilities of the proposals.
g_go (float) – Probability of the proposal of adding a host.
g_ret (float) – Probability of the proposal of removing a host.
prob_time (float) – Probability of the time of infection of the new host.
unsampled (host object) – Unsampled host to be added to the transmission tree.
added (bool) – If True, the host was added to the transmission tree.
- remove_unsampled_with_times(selected_host=None, P_add=0.5, P_rewiring=0.5, P_off=0.5, only_geometrical=False, detailed_probs=False, verbose=False)[source]
Propose the removal of an unsampled host from the transmission tree and compute the probability of the proposal. If no unsampled hosts are available, the addition of a new host is proposed instead.
- Parameters:
selected_host (host object) – Unsampled host to be removed from the transmission tree. If None, a host is randomly selected.
P_add (float) – Probability of proposing to add a new host to the transmission tree.
P_rewiring (float) – Probability of rewiring the new host to another sibling host.
P_off (float) – Probability of rewiring the new host to be a leaf.
verbose (bool) – If True, prints the results of the step.
only_geometrical (bool) – If True, only the proposal of the new topological structure is considered.
detailed_probs (bool) – If True, the method returns both proposal probabilities, for adding and for removing a host.
- Returns:
T_new (DiGraph object) – New transmission tree with the proposed changes.
gg (float) – Ratio of the probabilities of the proposals.
g_go (float) – Probability of the proposal of adding a host.
g_ret (float) – Probability of the proposal of removing a host.
prob_time (float) – Probability of proposing the time of the selected_host.
added (bool) – If True, a host was added to the transmission tree. Otherwise, the node was removed.
- add_remove_step(P_add=0.5, P_rewiring=0.5, P_off=0.5, metHast=True, verbose=False)[source]
Propose the addition or removal of an unsampled host in the transmission tree and compute the probability of the proposal.
- Parameters:
P_add (float) – Probability of proposing the addition of an unsampled host. Otherwise, an unsampled host is proposed for removal.
P_rewiring (float) – Probability of rewiring the new host to another sibling host.
P_off (float) – Probability of rewiring the new host to be a leaf.
metHast (bool) – If True, the Metropolis-Hastings algorithm is used to accept or reject the change.
verbose (bool) – If True, prints the results of the step.
- Returns:
MCMC
- class transmission_models.classes.mcmc.mcmc.MCMC(model, P_rewire=0.3333333333333333, P_add_remove=0.3333333333333333, P_t_shift=0.3333333333333333, P_add=0.5, P_rewire_add=0.5, P_offspring_add=0.5, P_to_offspring=0.5)[source]
Bases: object
Markov Chain Monte Carlo sampler for transmission tree inference.
This class implements MCMC sampling algorithms for transmission network inference using various proposal mechanisms.
- Parameters:
model (didelot_unsampled) – The transmission tree model to sample from.
P_rewire (float, optional) – The probability of rewiring a transmission tree. Default is 1/3.
P_add_remove (float, optional) – The probability of adding or removing an unsampled host in the transmission tree. Default is 1/3.
P_t_shift (float, optional) – The probability of shifting the infection time of the host in the transmission tree. Default is 1/3.
P_add (float, optional) – The probability of adding a new host to the transmission tree once an add/remove move has been proposed. Default is 0.5.
P_rewire_add (float, optional) – The probability of rewiring the new unsampled host once an addition has been proposed. Default is 0.5.
P_offspring_add (float, optional) – The probability that the new unsampled host is an offspring once an addition and a rewiring have been proposed. Default is 0.5.
P_to_offspring (float, optional) – The probability of moving to offspring model during rewiring. Default is 0.5.
- Variables:
model (didelot_unsampled) – The transmission model being sampled.
P_rewire (float) – Probability of rewiring moves.
P_add_remove (float) – Probability of add/remove moves.
P_t_shift (float) – Probability of time shift moves.
P_add (float) – Probability of adding vs removing hosts.
P_rewire_add (float) – Probability of rewiring added hosts.
P_offspring_add (float) – Probability of offspring vs chain model for added hosts.
P_to_offspring (float) – Probability of moving to offspring model.
- __init__(model, P_rewire=0.3333333333333333, P_add_remove=0.3333333333333333, P_t_shift=0.3333333333333333, P_add=0.5, P_rewire_add=0.5, P_offspring_add=0.5, P_to_offspring=0.5)[source]
Initialize the MCMC sampler.
- Parameters:
model (didelot_unsampled) – The transmission tree model to sample from.
P_rewire (float, optional) – The probability of rewiring a transmission tree. Default is 1/3.
P_add_remove (float, optional) – The probability of adding or removing an unsampled host in the transmission tree. Default is 1/3.
P_t_shift (float, optional) – The probability of shifting the infection time of the host in the transmission tree. Default is 1/3.
P_add (float, optional) – The probability of adding a new host to the transmission tree once an add/remove move has been proposed. Default is 0.5.
P_rewire_add (float, optional) – The probability of rewiring the new unsampled host once an addition has been proposed. Default is 0.5.
P_offspring_add (float, optional) – The probability that the new unsampled host is an offspring once an addition and a rewiring have been proposed. Default is 0.5.
P_to_offspring (float, optional) – The probability of moving to offspring model during rewiring. Default is 0.5.
- MCMC_iteration(verbose=False)[source]
Perform an MCMC iteration on the transmission tree model.
- Parameters:
verbose (bool, optional) – Whether to print the progress of the MCMC iteration. Default is False.
- Returns:
A tuple containing:
move (str) – The type of move proposed (‘rewire’, ‘add_remove’, or ‘time_shift’).
gg (float) – The ratio of proposal probabilities.
pp (float) – The ratio of posterior probabilities.
P (float) – The acceptance probability.
accepted (bool) – Whether the move was accepted.
DL (float) – The difference in log likelihood.
- Return type:
tuple
Notes
The function operates as follows:
Selects a move type at random.
Performs the move and computes acceptance probability.
Returns move details and acceptance status.
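The first step above, choosing a move type at random, can be sketched as a weighted draw over the three move probabilities. This is only an illustration: choose_move is a hypothetical helper, and how the MCMC class actually performs the draw is not shown in this reference.

```python
import random


def choose_move(P_rewire=1/3, P_add_remove=1/3, P_t_shift=1/3, rng=random):
    """Pick one move type with probability proportional to its weight."""
    moves = ["rewire", "add_remove", "time_shift"]
    weights = [P_rewire, P_add_remove, P_t_shift]
    # random.choices returns a list of k items; take the single draw.
    return rng.choices(moves, weights=weights, k=1)[0]


print(choose_move())  # one of 'rewire', 'add_remove', 'time_shift'
```

Setting one weight to 1 and the others to 0 forces that move, which can be handy when testing a single proposal mechanism in isolation.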
Priors
- class transmission_models.classes.genetic_prior.genetic_prior_tree(model, mu, distance_matrix)[source]
Bases: object
- __init__(model, mu, distance_matrix)[source]
Initialize the genetic prior tree object.
- Parameters:
model (object) – The transmission model containing the tree structure.
mu (float) – The mutation rate parameter for the Poisson distribution.
distance_matrix (numpy.ndarray) – Matrix containing pairwise genetic distances between hosts.
Notes
This initializes the genetic prior calculator with:
- A Poisson distribution with rate mu for modeling genetic distances
- A distance matrix for pairwise host comparisons
- A reference to the transmission model
- static search_firsts_sampled_siblings(host, T, distance_matrix)[source]
Find all sampled siblings of a host in the transmission tree.
- Parameters:
host (object) – The host for which to find sampled siblings.
T (networkx.DiGraph) – The transmission tree.
distance_matrix (numpy.ndarray) – Matrix containing pairwise genetic distances between hosts.
- Returns:
List of sampled sibling hosts that have genetic distance data.
- Return type:
Notes
This method recursively searches through the tree to find all sampled hosts that are descendants of the given host and have valid genetic distance data (non-NaN values in the distance matrix).
- static search_first_sampled_parent(host, T, root)[source]
Find the first sampled ancestor of a host in the transmission tree.
- Parameters:
host (object) – The host for which to find the first sampled parent.
T (networkx.DiGraph) – The transmission tree.
root (object) – The root host of the transmission tree.
- Returns:
The first sampled parent host, or None if no sampled parent is found.
- Return type:
object or None
Notes
This method traverses up the tree from the given host until it finds the first sampled ancestor, or reaches the root without finding one.
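The upward traversal described in the note can be sketched without any graph library. The example below is an assumption-laden illustration: it uses a plain child-to-parent dictionary in place of the networkx.DiGraph, and the Node class and first_sampled_ancestor function are hypothetical names, not part of transmission_models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Stand-in for a host: only the fields the traversal needs."""
    name: str
    sampled: bool


def first_sampled_ancestor(node, parent_of, root):
    """Walk up from node and return the first sampled ancestor, or None."""
    current = node
    while current != root:
        current = parent_of[current]  # move one step towards the root
        if current.sampled:
            return current
    return None  # reached the root without finding a sampled ancestor


# Chain: root (unsampled) -> a (sampled) -> b (unsampled) -> c (sampled)
root = Node("root", False)
a = Node("a", True)
b = Node("b", False)
c = Node("c", True)
parent_of = {a: root, b: a, c: b}

print(first_sampled_ancestor(c, parent_of, root).name)  # a
```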
- static get_mut_time_dist(hp, hs)[source]
Calculate the mutation time distance between two hosts.
- Parameters:
- Returns:
The mutation time distance: (hs.t_sample + hp.t_sample - 2 * hp.t_inf).
- Return type:
Notes
This calculates the time available for mutations to accumulate between the sampling times of two hosts, accounting for their common infection time.
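The formula in the Returns field can be written out directly. In this sketch, SimpleNamespace objects stand in for host objects and carry only the t_inf and t_sample attributes; the function name mut_time_dist is hypothetical.

```python
from types import SimpleNamespace


def mut_time_dist(hp, hs):
    """Time available for mutations between the samples of hp and hs."""
    return hs.t_sample + hp.t_sample - 2 * hp.t_inf


hp = SimpleNamespace(t_inf=10, t_sample=15)
hs = SimpleNamespace(t_inf=12, t_sample=18)
print(mut_time_dist(hp, hs))  # 18 + 15 - 2 * 10 = 13
```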
- get_closest_sampling_siblings(T=None, verbose=False)[source]
Calculate log-likelihood correction for closest sampling siblings.
- Parameters:
T (networkx.DiGraph, optional) – The transmission tree. If None, uses self.model.T.
verbose (bool, optional) – If True, print detailed information during calculation.
- Returns:
The log-likelihood correction value.
- Return type:
Notes
This method calculates correction terms for the genetic prior by finding the closest sampled siblings for each host and computing the log-likelihood of their genetic distances based on the time difference between sampling events.
- prior_host(host, T, parent_dist=False)[source]
Calculate the log prior for a specific host in the transmission tree.
- Parameters:
host (object) – The host for which to calculate the log prior.
T (networkx.DiGraph) – The transmission tree.
parent_dist (bool, optional) – If True, include parent distance in the calculation. Default is False.
- Returns:
The log prior value for the host.
- Return type:
Notes
This method calculates the log prior by considering:
1. Direct connections to sampled hosts
2. Connections to sampled siblings through unsampled intermediate hosts
3. Parent distance (if parent_dist=True)
The calculation uses Poisson distributions based on the mutation rate and time differences between sampling events.
- prior_pair(h1, h2)[source]
Calculate the log prior for a pair of hosts.
- Parameters:
- Returns:
The log prior value for the pair, or 0 if either host is not sampled.
- Return type:
Notes
This method calculates the log prior for the genetic distance between two hosts based on their sampling time difference and the Poisson distribution with rate mu * Dt.
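A Poisson log-probability with rate mu * Dt, as described in the note, can be sketched with the standard library alone. The function names below are hypothetical, and how the library derives Dt from the two hosts is not shown in this reference, so Dt is passed in directly.

```python
import math


def poisson_log_pmf(k, rate):
    """Log of the Poisson probability of observing k events at the given rate."""
    if rate <= 0:
        # Degenerate case: all mass sits on k = 0.
        return 0.0 if k == 0 else -math.inf
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def log_prior_pair(distance, mu, Dt):
    """Log prior of a genetic distance given mutation rate mu and time Dt."""
    return poisson_log_pmf(distance, mu * Dt)


print(log_prior_pair(distance=0, mu=0.5, Dt=4.0))  # -2.0
```

With a distance of 0 the log prior reduces to -mu * Dt, which gives a quick sanity check on any implementation.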
- log_prior_host_list(host_list, T=None)[source]
Calculate the total log prior for a list of hosts.
- Parameters:
host_list (list) – List of hosts for which to calculate the log prior.
T (networkx.DiGraph, optional) – The transmission tree. If None, uses self.model.T.
- Returns:
The sum of log priors for all hosts in the list.
- Return type:
Notes
This method iterates through the host list and sums the log priors for each individual host using the log_prior_host method.
- log_prior_host(host, T=None)[source]
Compute the log prior for a host.
- Parameters:
- Returns:
The log prior value for the host.
- Return type:
Notes
The function operates as follows:
Computes the log prior for the host based on the transmission tree.
Returns the log prior value.
- log_prior_T(T, update_up=True, verbose=False)[source]
Calculate the total log prior for an entire transmission tree.
- Parameters:
T (networkx.DiGraph) – The transmission tree.
update_up (bool, optional) – If True, include correction terms for closest sampling siblings. Default is True.
verbose (bool, optional) – If True, print detailed information during calculation.
- Returns:
The total log prior value for the transmission tree.
- Return type:
Notes
This method calculates the complete log prior for a transmission tree by:
1. Iterating through all hosts and their connections
2. Computing log-likelihoods for direct connections to sampled hosts
3. Computing log-likelihoods for connections to sampled siblings through unsampled hosts
4. Adding correction terms for closest sampling siblings (if update_up=True)
The calculation uses Poisson distributions based on mutation rates and time differences.
- Delta_log_prior(host, T_end, T_ini)[source]
Calculate the difference in log prior between two transmission tree states.
- Parameters:
host (object) – The host for which to calculate the log prior difference.
T_end (networkx.DiGraph) – The final transmission tree state.
T_ini (networkx.DiGraph) – The initial transmission tree state.
- Returns:
The difference in log prior: log_prior(T_end) - log_prior(T_ini).
- Return type:
Notes
This method calculates how the log prior changes when a transmission tree transitions from state T_ini to T_end. It considers:
1. Changes in parent relationships
2. Changes in sibling relationships
The calculation is useful for MCMC acceptance ratios where only the difference in log prior is needed, not the absolute values.
- transmission_models.classes.genetic_prior.get_roots_data_subtrees(host, T, dist_matrix)[source]
Get all sampled hosts with genetic data in subtrees rooted at a given host.
- Parameters:
host (object) – The root host of the subtrees to search.
T (networkx.DiGraph) – The transmission tree.
dist_matrix (numpy.ndarray) – Matrix containing pairwise genetic distances between hosts.
- Returns:
List of sampled hosts that have valid genetic distance data.
- Return type:
Notes
This function recursively searches through all subtrees rooted at the given host and collects all sampled hosts that have non-NaN values in the distance matrix (indicating they have genetic sequence data).
- class transmission_models.classes.location_prior.location_distance_prior_tree(model, mu, distance_matrix)[source]
Bases:
object
- class transmission_models.classes.location_prior.same_location_prior_tree(model, P_NM, tau, distance_matrix)[source]
Bases: object
Computes the prior on the locations of the hosts in the tree. The prior models the probability that a host stays where it lives over a characteristic time tau. A host stays where it lives for a time t with probability exp(-t*P_NM/tau), where P_NM is the probability that the host does not move in tau.
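The stay probability exp(-t*P_NM/tau) stated above can be evaluated directly. The sketch below only transcribes that expression; the function names are hypothetical, and the way the class combines this term with the distance matrix is not shown in this reference.

```python
import math


def log_prob_stay(t, P_NM, tau):
    """Log of exp(-t * P_NM / tau), as given in the class description."""
    return -t * P_NM / tau


def prob_stay(t, P_NM, tau):
    """Probability that a host is still at its location after time t."""
    return math.exp(log_prob_stay(t, P_NM, tau))


print(prob_stay(t=0.0, P_NM=0.1, tau=5.0))  # 1.0
```

Working with the log form avoids underflow when many such terms are summed over a tree.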
Module Documentation
Classes Module
This module contains all the main classes for the transmission_models package.
Main Classes
host : Host class representing infected individuals
didelot_unsampled : Main class implementing the Didelot et al. (2017) framework
genetic_prior_tree : Prior distribution for genetic sequence data
location_distance_prior_tree : Prior distribution for location distance data
same_location_prior_tree : Prior distribution for same location probability
MCMC : Markov Chain Monte Carlo sampling algorithms
Submodules
mcmc : MCMC sampling classes and algorithms
- class transmission_models.classes.host(id, index, genetic_data=[], t_inf=0, t_sample=None)[source]
Bases: object
Represents a host that has been infected with a virus.
A host object contains information about an infected individual, including their genetic data, infection time, sampling time, and other attributes.
- Variables:
index (int) – The index of the host.
sampled (bool) – Indicates whether the host has been sampled or not.
genetic_data (list) – The genetic data of the host.
dict_attributes (dict) – A dictionary to store additional attributes.
t_inf (int) – Time of infection.
t_sample (int, optional) – The time the host was sampled.
id (str) – The identifier of the host.
- t_inf : property
Getter and setter for the time of infection attribute.
- get_genetic_str() : str
Returns the genetic data as a string.
- __str__() : str
Returns a string with the id of the host.
- __int__() : int
Returns the index of the host.
Examples
>>> h = host('host1', 1, ['A', 'T', 'C', 'G'], 10, t_sample=15)
>>> print(h.t_inf)
10
>>> h.t_inf = 20
>>> print(h.t_inf)
20
>>> print(h.get_genetic_str())
ATCG
>>> print(h)
host1
Notes
The class name host is lowercase, so it does not follow the PascalCase convention that PEP 8 recommends for class names.
- __init__(id, index, genetic_data=[], t_inf=0, t_sample=None)[source]
Initialize a new instance of the host class.
- Parameters:
- property t_inf
Getter for the time of infection attribute.
- Returns:
The time of infection.
- Return type:
- get_genetic_str()[source]
Return the genetic data of the host as a string.
- Returns:
The genetic data as a string.
- Return type:
- class transmission_models.classes.didelot_unsampled(sampling_params, offspring_params, infection_params, T=None)[source]
Bases: object
Didelot unsampled transmission model.
This class implements the Didelot et al. (2017) framework for transmission tree inference with unsampled hosts. It provides methods for building transmission networks, computing likelihoods, and performing MCMC sampling.
The model incorporates three main components:
1. Sampling model: Gamma distribution for sampling times
2. Offspring model: Negative binomial distribution for offspring number
3. Infection model: Gamma distribution for infection times
- Parameters:
sampling_params (dict) – Parameters for the sampling model containing:
  - pi : float, sampling probability
  - k_samp : float, shape parameter for gamma distribution
  - theta_samp : float, scale parameter for gamma distribution
offspring_params (dict) – Parameters for the offspring model containing:
  - r : float, rate of infection
  - p_inf : float, probability of infection
infection_params (dict) – Parameters for the infection model containing:
  - k_inf : float, shape parameter for gamma distribution
  - theta_inf : float, scale parameter for gamma distribution
- Variables:
T (networkx.DiGraph) – The transmission tree.
host_dict (dict) – Dictionary mapping host IDs to host objects.
log_likelihood (float) – Current log likelihood of the model.
genetic_prior (genetic_prior_tree, optional) – Prior for genetic data.
same_location_prior (same_location_prior_tree, optional) – Prior for location data.
References
Didelot, X., Gardy, J., & Colijn, C. (2017). Bayesian inference of transmission chains using timing of events, contact and genetic data. PLoS computational biology, 13(4), e1005496.
- __init__(sampling_params, offspring_params, infection_params, T=None)[source]
Initialize the Didelot unsampled transmission model.
- Parameters:
sampling_params (dict) – Parameters for the sampling model containing:
  - pi : float, sampling probability
  - k_samp : float, shape parameter for gamma distribution
  - theta_samp : float, scale parameter for gamma distribution
offspring_params (dict) – Parameters for the offspring model containing:
  - r : float, rate of infection
  - p_inf : float, probability of infection
infection_params (dict) – Parameters for the infection model containing:
  - k_inf : float, shape parameter for gamma distribution
  - theta_inf : float, scale parameter for gamma distribution
T (networkx.DiGraph, optional) – The transmission tree. If provided, the model will be initialized with this tree. Default is None.
- Raises:
KeyError – If any required parameter is missing from the input dictionaries.
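The three parameter dictionaries and the KeyError behaviour can be illustrated with a small stand-alone check. The key names come from the parameter list above; the numeric values are arbitrary placeholders, and check_params is a hypothetical helper rather than the constructor's actual validation code.

```python
# Required keys per dictionary, taken from the documented parameters.
REQUIRED_KEYS = {
    "sampling_params": ("pi", "k_samp", "theta_samp"),
    "offspring_params": ("r", "p_inf"),
    "infection_params": ("k_inf", "theta_inf"),
}


def check_params(sampling_params, offspring_params, infection_params):
    """Raise KeyError if any documented key is missing."""
    groups = {
        "sampling_params": sampling_params,
        "offspring_params": offspring_params,
        "infection_params": infection_params,
    }
    for group_name, keys in REQUIRED_KEYS.items():
        for key in keys:
            if key not in groups[group_name]:
                raise KeyError(f"{group_name} is missing required key {key!r}")


# Placeholder values, chosen only to make the example concrete.
sampling_params = {"pi": 0.5, "k_samp": 2.0, "theta_samp": 1.5}
offspring_params = {"r": 2.0, "p_inf": 0.5}
infection_params = {"k_inf": 2.0, "theta_inf": 1.0}

check_params(sampling_params, offspring_params, infection_params)  # no error
```

Running a check like this before constructing the model gives an error message that names the dictionary with the missing key.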
- property T
- samp_t_inf_between(h1, h2)[source]
Sample a time of infection between two hosts.
Uses a rejection sampling method to sample the time of infection of the infected host using the chain model from Didelot et al. 2017.
- Parameters:
- Returns:
Time of infection of the host infected by h1 and the infector of h2.
- Return type:
Notes
This method implements the rejection sampling algorithm described in Didelot et al. (2017) for sampling infection times in transmission chains.
- add_root(t_sampl, id='0', genetic_data=[], t_inf=0, t_sample=None)[source]
Add the root host to the transmission tree.
- Parameters:
t_sampl (float) – Sampling time of the root host.
id (str, optional) – Identifier for the root host. Default is “0”.
genetic_data (list, optional) – Genetic data for the root host. Default is empty list.
t_inf (float, optional) – Infection time of the root host. Default is 0.
t_sample (float, optional) – Sampling time of the root host. Default is None.
- Returns:
The root host object.
- Return type:
- successors(host)[source]
Get the successors (children) of a given host in the transmission tree.
- Parameters:
host (host) – The host node whose successors are to be returned.
- Returns:
An iterator over the successors of the host.
- Return type:
iterator
- out_degree(host)[source]
Get the out-degree (number of children) of a host in the transmission tree.
- compute_Delta_loc_prior(T_new)[source]
Compute the change in the location prior log-likelihood for a new tree.
- Parameters:
T_new (networkx.DiGraph) – The new transmission tree.
- Returns:
(Delta log prior, new log prior, old log prior, old correction log-likelihood)
- Return type:
- get_candidates_to_chain()[source]
Get the list of candidate hosts for chain moves in the transmission tree.
- Returns:
List of candidate host nodes for chain moves.
- Return type:
- get_N_candidates_to_chain(recompute=False)[source]
Get the number of candidate hosts for chain moves, optionally recomputing the list.
- get_root_subtrees()[source]
Retrieve the root subtrees of the transmission tree.
This method searches for the first sampled siblings of the root host in the transmission tree and stores them in the roots_subtrees attribute.
- Returns:
A list of root subtrees.
- Return type:
- get_unsampled_hosts()[source]
Get the list of unsampled hosts in the transmission tree (excluding the root).
- Returns:
List of unsampled host nodes.
- Return type:
- get_sampling_model_likelihood(hosts=None, T=None, update=False)[source]
Compute the likelihood of the sampling model.
Computes the likelihood of the sampling model given a list of hosts. If no list is given, the likelihood of the whole transmission tree is returned.
- get_sampling_model_log_likelihood(hosts=None, T=None, update=False)[source]
Compute the log-likelihood of the sampling model given a list of hosts. If no list is given, the log-likelihood of the whole transmission tree is returned.
- Delta_log_sampling(hosts, T_end, T_ini=None)[source]
Compute the change in log-likelihood for the sampling model.
- Parameters:
- Returns:
Change in log-likelihood for the sampling model.
- Return type:
Notes
The function operates as follows:
Computes the log-likelihood for the sampling model at T_end.
If T_ini is provided, subtracts the log-likelihood at T_ini.
Returns the difference.
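The three steps above follow a pattern shared by the Delta_log_* methods. The sketch below shows that pattern with a caller-supplied log-likelihood function; delta_log and toy_log_likelihood are hypothetical names, and the toy trees are plain dictionaries rather than networkx graphs.

```python
def delta_log(log_likelihood, hosts, T_end, T_ini=None):
    """Log-likelihood at T_end, minus the value at T_ini when it is given."""
    delta = log_likelihood(hosts, T_end)
    if T_ini is not None:
        delta -= log_likelihood(hosts, T_ini)
    return delta


def toy_log_likelihood(hosts, T):
    """Toy model: each tree maps a host name to its log-likelihood term."""
    return sum(T[h] for h in hosts)


T_ini = {"a": -1.5, "b": -2.5}
T_end = {"a": -1.0, "b": -2.0}
print(delta_log(toy_log_likelihood, ["a", "b"], T_end, T_ini))  # 1.0
```

Restricting hosts to those affected by a proposal keeps the cost of each MCMC step independent of the tree size.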
- get_offspring_model_likelihood(hosts=None, T=None, update=False)[source]
Computes the likelihood of the offspring model given a list of hosts. If no list is given, the likelihood of the whole transmission tree is returned.
- get_offspring_model_log_likelihood(hosts=None, T=None, update=False)[source]
Compute the log-likelihood of the offspring model given a list of hosts. If no list is given, the log-likelihood of the whole transmission tree is returned.
- Delta_log_offspring(hosts, T_end, T_ini=None)[source]
Compute the change in log-likelihood for the offspring model.
- Parameters:
- Returns:
Change in log-likelihood for the offspring model.
- Return type:
Notes
The function operates as follows:
Computes the log-likelihood for the offspring model at T_end.
If T_ini is provided, subtracts the log-likelihood at T_ini.
Returns the difference.
- get_infection_model_likelihood(hosts=None, T=None, update=False)[source]
Computes the likelihood of the infection model given a list of hosts. If no list is given, the likelihood of the whole transmission tree is returned.
- Parameters:
- Returns:
L – The likelihood of the infection model given the list of hosts
- Return type:
- get_infection_model_log_likelihood(hosts=None, T=None, update=False)[source]
Compute the log-likelihood of the infection model given a list of hosts. If no list is given, the log-likelihood of the whole transmission tree is returned.
- Parameters:
- Returns:
L – The log-likelihood of the infection model given the list of hosts.
- Return type:
- Delta_log_infection(hosts, T_end, T_ini=None)[source]
Compute the change in log-likelihood for the infection model.
- Parameters:
- Returns:
Change in log-likelihood for the infection model.
- Return type:
Notes
The function operates as follows:
Computes the log-likelihood for the infection model at T_end.
If T_ini is provided, subtracts the log-likelihood at T_ini.
Returns the difference.
- log_likelihood_host(host, T=None)[source]
Compute the log-likelihood of a host given the transmission tree.
- Parameters:
host (host object)
T (DiGraph object)
- Returns:
log_likelihood – The log likelihood of the host in the transmission network
- Return type:
- Delta_log_likelihood_host(hosts, T_end, T_ini=None)[source]
Compute the change in log-likelihood for a host.
- Parameters:
- Returns:
Change in log-likelihood for the host.
- Return type:
Notes
The function operates as follows:
Computes the log-likelihood for the host at T_end.
If T_ini is provided, subtracts the log-likelihood at T_ini.
Returns the difference.
- log_posterior_transmission_tree()[source]
Compute the log-posterior of the current transmission tree.
This method calculates the log-posterior probability of the current transmission tree by summing the log-likelihood of the tree and any additional prior log-probabilities, such as genetic and location priors, if they are defined.
- Returns:
The computed log-posterior of the current transmission tree.
- Return type:
Notes
- The log-posterior is computed as:
log_posterior = log_likelihood + genetic_log_prior (if defined) + same_location_log_prior (if defined)
- The method uses the following attributes:
self.log_likelihood: Log-likelihood of the transmission tree.
self.genetic_log_prior: Log-prior from the genetic model (if defined).
self.same_location_log_prior: Log-prior from the location model (if defined).
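The sum described in the notes can be sketched as a function that skips any prior that is not defined. Here an undefined prior is represented by None, which is an assumption made for the example; log_posterior is a hypothetical stand-alone function, not the method itself.

```python
def log_posterior(log_likelihood, genetic_log_prior=None,
                  same_location_log_prior=None):
    """Add the optional log priors to the log-likelihood when they exist."""
    total = log_likelihood
    for term in (genetic_log_prior, same_location_log_prior):
        if term is not None:
            total += term
    return total


print(log_posterior(-10.0, genetic_log_prior=-2.5))  # -12.5
```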
- get_log_posterior_transmission_tree(T)[source]
Compute and update the log-posterior of the transmission tree.
This method calculates the log-posterior probability of the given transmission tree T by combining the log-likelihood of the tree with any additional prior log-probabilities, such as genetic and location priors, if they are defined. The computed log-posterior and any relevant prior log-likelihoods are stored as attributes of the object.
- Parameters:
T (networkx.DiGraph) – The transmission tree for which to compute the log-posterior.
- Returns:
The computed log-posterior of the transmission tree.
- Return type:
Notes
- The log-posterior is computed as:
log_posterior = log_likelihood + genetic_log_prior (if defined) + same_location_log_prior (if defined)
- The method also updates the following attributes:
self.log_posterior
self.genetic_log_prior (if applicable)
self.same_location_log_prior (if applicable)
- show_log_likelihoods(hosts=None, T=None, verbose=False)[source]
Print and return the log-likelihoods for the sampling, offspring, and infection models.
- Parameters:
hosts (list, optional) – List of host objects to compute log-likelihoods for. If None, computes for all hosts in T.
T (networkx.DiGraph, optional) – Transmission tree. If None, uses self.T.
verbose (bool, optional) – If True, prints the log-likelihoods. Default is False.
- Returns:
(LL_sampling, LL_offspring, LL_infection): Log-likelihoods for the sampling, offspring, and infection models.
- Return type:
- log_likelihood_transmission_tree_old(T)[source]
Compute the log-likelihood of the entire transmission tree using the old method.
- Parameters:
T (networkx.DiGraph) – Transmission tree to compute the log-likelihood for.
- Returns:
The log-likelihood of the transmission tree.
- Return type:
- add_genetic_prior(mu_gen, gen_dist)[source]
Add a genetic prior to the model. The prior gives the likelihood that two sampled hosts are related, given the genetic distance between the viruses sampled from them. Two sampled hosts are considered related if every host on the path connecting them is unsampled.
- Parameters:
mu_gen (float) – Mutation rate.
gen_dist (np.array) – Matrix of genetic distances between the viruses of the hosts. Its indices must match the indices of the hosts.
- add_same_location_prior(P_NM, tau, loc_dist)[source]
Add a same-location prior to the model (see same_location_prior_tree). The prior models the probability that a host stays where it lives over a characteristic time tau.
- Parameters:
P_NM (float) – Probability that a host does not move in the characteristic time tau.
tau (float) – Characteristic time of the location model.
loc_dist (np.array) – Location distance matrix of the hosts. Its indices must match the indices of the hosts.
- create_transmision_phylogeny_nets(N, mu, P_mut)[source]
- Parameters:
N – Number of hosts.
mu – Mutation rate.
P_mut – Probability of mutation.
- save_json(filename)[source]
Save the transmission tree to a JSON file.
- Parameters:
filename (str) – Path to the output JSON file.
- classmethod json_to_tree(filename, sampling_params=None, offspring_params=None, infection_params=None)[source]
Load a transmission model from a JSON file and reconstruct the model object.
- Parameters:
filename (str) – Path to the JSON file.
sampling_params (dict, optional) – Sampling parameters to override those in the file. Default is None.
offspring_params (dict, optional) – Offspring parameters to override those in the file. Default is None.
infection_params (dict, optional) – Infection parameters to override those in the file. Default is None.
- Returns:
The reconstructed transmission model.
- Return type:
- infection_time_from_sampling_step(selected_host=None, metHast=True, verbose=False)[source]
Propose and possibly accept a new infection time for a sampled host using the Metropolis-Hastings algorithm.
This method samples a new infection time for a selected host (or a random sampled host if not provided), computes the acceptance probability, and updates the host’s infection time if the proposal is accepted.
- Parameters:
selected_host (host, optional) – The host whose infection time will be changed. If None, a random sampled host is selected.
metHast (bool, optional) – If True, use the Metropolis-Hastings algorithm to accept or reject the proposal. Default is True.
verbose (bool, optional) – If True, print detailed information about the proposal. Default is False.
- Returns:
t_inf_new (float) – The proposed new infection time.
gg (float) – Proposal ratio for the Metropolis-Hastings step.
pp (float) – Likelihood ratio for the Metropolis-Hastings step.
P (float) – Acceptance probability for the Metropolis-Hastings step.
selected_host (host) – The host whose infection time was proposed to change.
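The returned quantities fit the usual Metropolis-Hastings pattern, which can be sketched as follows. This is a minimal sketch, assuming the acceptance probability is min(1, pp * gg); the function name metropolis_hastings_accept and the rng argument are hypothetical, and the library may combine the two ratios differently.

```python
import random


def metropolis_hastings_accept(pp, gg, rng=random):
    """Return (P, accepted) for likelihood ratio pp and proposal ratio gg."""
    # Assumed form of the acceptance probability: min(1, pp * gg).
    P = min(1.0, pp * gg)
    # rng.random() is uniform on [0, 1), so P == 1.0 always accepts.
    accepted = rng.random() < P
    return P, accepted
```

A proposal that raises the likelihood without a proposal penalty (pp * gg >= 1) is always accepted; otherwise it is accepted with probability pp * gg.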
- infection_time_from_infection_model_step(selected_host=None, metHast=True, Dt_new=None, verbose=False)[source]
Propose a new infection time for a host and accept or reject the change using the Metropolis-Hastings algorithm.
- Parameters:
selected_host (host object, default=None) – Host whose infection time will be changed. If None, a host is randomly selected.
metHast (bool, default=True) – If True, the Metropolis Hastings algorithm is used to accept or reject the change.
Dt_new (float, default=None) – New infection time for the host. If None, a new time is sampled.
verbose (bool, default=False) – If True, prints the results of the step.
- add_unsampled_with_times(selected_host=None, P_add=0.5, P_rewiring=0.5, P_off=0.5, verbose=False, only_geometrical=False, detailed_probs=False)[source]
Method to propose the addition of an unsampled host to the transmission tree and get the probability of the proposal.
Parameters:
- selected_host: host object
Host to which the unsampled host will be added. If None, a host is randomly selected.
- P_add: float
Probability of proposing to add a new host to the transmission tree.
- P_rewiring: float
Probability of rewiring the new host to another sibling host.
- P_off: float
Probability to rewire the new host to be a leaf.
- verbose: bool
If True, prints the results of the step.
- only_geometrical: bool
If True, only the proposal of the new topological structure will be considered.
- detailed_probs: bool
If True, the method will return both probabilities of the proposals, of adding and removing a host.
Returns:
- T_new: DiGraph object
New transmission tree with the proposed changes.
- gg: float
Ratio of the probabilities of the proposals.
- g_go: float
Probability of the proposal of adding a host.
- g_ret: float
Probability of the proposal of removing a host.
- prob_time: float
Probability of the time of infection of the new host.
- unsampled: host object
Unsampled host to be added to the transmission tree.
- added: bool
If True, the host was added to the transmission tree.
- remove_unsampled_with_times(selected_host=None, P_add=0.5, P_rewiring=0.5, P_off=0.5, only_geometrical=False, detailed_probs=False, verbose=False)[source]
Method to propose the removal of an unsampled host from the transmission tree and get the probability of the proposal. If no unsampled hosts are available, the addition of a new host to the transmission tree is proposed instead.
Parameters:
- selected_host: host object
Unsampled host to be removed from the transmission tree. If None, a host is randomly selected.
- P_add: float
Probability of proposing to add a new host to the transmission tree.
- P_rewiring: float
Probability of rewiring the new host to another sibling host.
- P_off: float
Probability to rewire the new host to be a leaf.
- verbose: bool
If True, prints the results of the step.
- only_geometrical: bool
If True, only the proposal of the new topological structure will be considered.
- detailed_probs: bool
If True, the method will return both probabilities of the proposals, of adding and removing a host.
Returns:
- T_new: DiGraph object
New transmission tree with the proposed changes.
- gg: float
Ratio of the probabilities of the proposals.
- g_go: float
Probability of the proposal of adding a host.
- g_ret: float
Probability of the proposal of removing a host.
- prob_time: float
Probability of proposing the time of the selected_host.
- added: bool
If True, the host was added to the transmission tree. Otherwise, the node was removed.
- add_remove_step(P_add=0.5, P_rewiring=0.5, P_off=0.5, metHast=True, verbose=False)[source]
Method to propose the addition or removal of an unsampled host to the transmission tree and get the probability of the proposal.
Parameters:
- P_add: float
Probability of proposing the addition of an unsampled host. Otherwise, an unsampled host is proposed for removal.
- P_rewiring: float
Probability of rewiring the new host to another sibling host.
- P_off: float
Probability to rewire the new host to be a leaf.
- metHast: bool
If True, the Metropolis Hastings algorithm is used to accept or reject the change.
- verbose: bool
If True, prints the results of the step.
Returns:
- class transmission_models.classes.genetic_prior_tree(model, mu, distance_matrix)[source]
Bases:
object
- __init__(model, mu, distance_matrix)[source]
Initialize the genetic prior tree object.
- Parameters:
model (object) – The transmission model containing the tree structure.
mu (float) – The mutation rate parameter for the Poisson distribution.
distance_matrix (numpy.ndarray) – Matrix containing pairwise genetic distances between hosts.
Notes
This initializes the genetic prior calculator with:
- A Poisson distribution with rate mu for modeling genetic distances
- A distance matrix for pairwise host comparisons
- A reference to the transmission model
- static search_firsts_sampled_siblings(host, T, distance_matrix)[source]
Find all sampled siblings of a host in the transmission tree.
- Parameters:
host (object) – The host for which to find sampled siblings.
T (networkx.DiGraph) – The transmission tree.
distance_matrix (numpy.ndarray) – Matrix containing pairwise genetic distances between hosts.
- Returns:
List of sampled sibling hosts that have genetic distance data.
- Return type:
Notes
This method recursively searches through the tree to find all sampled hosts that are descendants of the given host and have valid genetic distance data (non-NaN values in the distance matrix).
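The recursive search described in the note can be sketched on a plain dictionary tree. This is a sketch under stated assumptions: children_of and sampled are hypothetical stand-ins for the networkx.DiGraph and the host.sampled flag, the recursion is assumed to stop at the first sampled host on each branch, and the NaN check against the distance matrix is left out.

```python
def first_sampled_descendants(host, children_of, sampled):
    """Collect the first sampled host met on every branch below `host`."""
    found = []
    for child in children_of.get(host, []):
        if child in sampled:
            # Assumption: a sampled host ends the search on its branch.
            found.append(child)
        else:
            # Unsampled hosts are skipped and their children are explored.
            found.extend(first_sampled_descendants(child, children_of, sampled))
    return found


# Hypothetical tree: a -> b, c ; b -> d, e ; e -> f
children_of = {"a": ["b", "c"], "b": ["d", "e"], "e": ["f"]}
sampled = {"a", "c", "d", "f"}
result = first_sampled_descendants("a", children_of, sampled)
```

Here b and e are unsampled, so the search passes through them and reaches d and f, while c is returned directly.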
- static search_first_sampled_parent(host, T, root)[source]
Find the first sampled ancestor of a host in the transmission tree.
- Parameters:
host (object) – The host for which to find the first sampled parent.
T (networkx.DiGraph) – The transmission tree.
root (object) – The root host of the transmission tree.
- Returns:
The first sampled parent host, or None if no sampled parent is found.
- Return type:
object or None
Notes
This method traverses up the tree from the given host until it finds the first sampled ancestor, or reaches the root without finding one.
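The upward traversal in the note can be sketched as a loop over a child-to-parent mapping. The names first_sampled_ancestor, parent_of, and sampled are hypothetical; in the library the parent would presumably come from the networkx.DiGraph T rather than from a dictionary.

```python
def first_sampled_ancestor(host, parent_of, sampled, root):
    """Walk up from `host`; return the nearest sampled ancestor or None."""
    current = host
    while current != root:
        current = parent_of[current]
        if current in sampled:
            return current
    # The root was reached without meeting a sampled ancestor.
    return None


# Hypothetical chain a -> b -> c -> d, stored as child -> parent.
parent_of = {"b": "a", "c": "b", "d": "c"}
```

With sampled = {"a", "d"}, the call for host "d" climbs past the unsampled hosts c and b and returns "a"; if "a" were unsampled as well, the result would be None.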
- static get_mut_time_dist(hp, hs)[source]
Calculate the mutation time distance between two hosts.
- Parameters:
- Returns:
The mutation time distance: (hs.t_sample + hp.t_sample - 2 * hp.t_inf).
- Return type:
Notes
This calculates the time available for mutations to accumulate between the sampling times of two hosts, accounting for their common infection time.
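The formula in the Returns field can be written out directly. The sketch below uses a hypothetical namedtuple in place of the library's host class and assumes only the t_inf and t_sample attributes. Algebraically, the result equals (hs.t_sample - hp.t_inf) + (hp.t_sample - hp.t_inf), the sum of the two times elapsed since hp was infected.

```python
from collections import namedtuple

# Hypothetical stand-in for the library's host class.
Host = namedtuple("Host", ["t_inf", "t_sample"])


def mut_time_dist(hp, hs):
    """Mutation time distance as given in the documentation above."""
    return hs.t_sample + hp.t_sample - 2 * hp.t_inf


hp = Host(t_inf=10, t_sample=15)
hs = Host(t_inf=12, t_sample=18)
dt = mut_time_dist(hp, hs)  # 18 + 15 - 2 * 10 = 13
```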
- get_closest_sampling_siblings(T=None, verbose=False)[source]
Calculate log-likelihood correction for closest sampling siblings.
- Parameters:
T (networkx.DiGraph, optional) – The transmission tree. If None, uses self.model.T.
verbose (bool, optional) – If True, print detailed information during calculation.
- Returns:
The log-likelihood correction value.
- Return type:
Notes
This method calculates correction terms for the genetic prior by finding the closest sampled siblings for each host and computing the log-likelihood of their genetic distances based on the time difference between sampling events.
- prior_host(host, T, parent_dist=False)[source]
Calculate the log prior for a specific host in the transmission tree.
- Parameters:
host (object) – The host for which to calculate the log prior.
T (networkx.DiGraph) – The transmission tree.
parent_dist (bool, optional) – If True, include parent distance in the calculation. Default is False.
- Returns:
The log prior value for the host.
- Return type:
Notes
This method calculates the log prior by considering:
1. Direct connections to sampled hosts
2. Connections to sampled siblings through unsampled intermediate hosts
3. Parent distance (if parent_dist=True)
The calculation uses Poisson distributions based on the mutation rate and time differences between sampling events.
- prior_pair(h1, h2)[source]
Calculate the log prior for a pair of hosts.
- Parameters:
- Returns:
The log prior value for the pair, or 0 if either host is not sampled.
- Return type:
Notes
This method calculates the log prior for the genetic distance between two hosts based on their sampling time difference and the Poisson distribution with rate mu * Dt.
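A Poisson log prior with rate mu * Dt can be evaluated in closed form as d * log(mu * Dt) - mu * Dt - log(d!). The sketch below is a minimal stdlib version; the function names are hypothetical, the genetic distance d is assumed to be a non-negative integer, and mu * Dt is assumed to be strictly positive.

```python
import math


def poisson_log_pmf(k, rate):
    """Log of the Poisson probability mass function; requires rate > 0."""
    # lgamma(k + 1) equals log(k!) for non-negative integers k.
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def log_prior_pair(distance, mu, dt):
    """Log prior of a genetic distance under a Poisson with rate mu * dt."""
    return poisson_log_pmf(distance, mu * dt)
```

For example, with mu = 0.5 and Dt = 2 the rate is 1, so a distance of 0 has log prior -1.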
- log_prior_host_list(host_list, T=None)[source]
Calculate the total log prior for a list of hosts.
- Parameters:
host_list (list) – List of hosts for which to calculate the log prior.
T (networkx.DiGraph, optional) – The transmission tree. If None, uses self.model.T.
- Returns:
The sum of log priors for all hosts in the list.
- Return type:
Notes
This method iterates through the host list and sums the log priors for each individual host using the log_prior_host method.
- log_prior_host(host, T=None)[source]
Compute the log prior for a host.
- Parameters:
- Returns:
The log prior value for the host.
- Return type:
Notes
The function operates as follows:
Computes the log prior for the host based on the transmission tree.
Returns the log prior value.
- log_prior_T(T, update_up=True, verbose=False)[source]
Calculate the total log prior for an entire transmission tree.
- Parameters:
T (networkx.DiGraph) – The transmission tree.
update_up (bool, optional) – If True, include correction terms for closest sampling siblings. Default is True.
verbose (bool, optional) – If True, print detailed information during calculation.
- Returns:
The total log prior value for the transmission tree.
- Return type:
Notes
This method calculates the complete log prior for a transmission tree by:
1. Iterating through all hosts and their connections
2. Computing log-likelihoods for direct connections to sampled hosts
3. Computing log-likelihoods for connections to sampled siblings through unsampled hosts
4. Adding correction terms for closest sampling siblings (if update_up=True)
The calculation uses Poisson distributions based on mutation rates and time differences.
- Delta_log_prior(host, T_end, T_ini)[source]
Calculate the difference in log prior between two transmission tree states.
- Parameters:
host (object) – The host for which to calculate the log prior difference.
T_end (networkx.DiGraph) – The final transmission tree state.
T_ini (networkx.DiGraph) – The initial transmission tree state.
- Returns:
The difference in log prior: log_prior(T_end) - log_prior(T_ini).
- Return type:
Notes
This method calculates how the log prior changes when a transmission tree transitions from state T_ini to T_end. It considers:
1. Changes in parent relationships
2. Changes in sibling relationships
The calculation is useful for MCMC acceptance ratios where only the difference in log prior is needed, not the absolute values.
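To illustrate why only the difference is needed, the sketch below performs a Metropolis-Hastings accept/reject test entirely in log space. It is a sketch under stated assumptions: accept_in_log_space, delta_log_posterior, and log_gg are hypothetical names, and how the library combines the prior difference with likelihood and proposal terms is not specified here.

```python
import math
import random


def accept_in_log_space(delta_log_posterior, log_gg=0.0, rng=random):
    """Accept with probability min(1, exp(delta_log_posterior + log_gg))."""
    log_alpha = min(0.0, delta_log_posterior + log_gg)
    u = rng.random()
    # u can be exactly 0.0; treat that as acceptance to avoid log(0).
    return u == 0.0 or math.log(u) < log_alpha
```

Working with differences of log values avoids exponentiating large absolute log priors, which could underflow.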
- class transmission_models.classes.location_distance_prior_tree(model, mu, distance_matrix)[source]
Bases:
object
- class transmission_models.classes.same_location_prior_tree(model, P_NM, tau, distance_matrix)[source]
Bases:
object
Class to compute the location prior of the hosts in the tree. The prior models the probability that a host stays where it lives over a characteristic time tau. A host stays where it lives for a time t with probability exp(-t*P_NM/tau), where P_NM is the probability that the host does not move in tau.
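The expression exp(-t*P_NM/tau) can be evaluated as in the sketch below. The function name stay_probability is hypothetical, and the sketch only reproduces the formula as written above, without assuming how the class applies it to the distance matrix.

```python
import math


def stay_probability(t, P_NM, tau):
    """Evaluate exp(-t * P_NM / tau) as given in the class description."""
    return math.exp(-t * P_NM / tau)
```

At t = 0 the expression equals 1, and it decays exponentially as t grows relative to tau.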
- class transmission_models.classes.MCMC(model, P_rewire=0.3333333333333333, P_add_remove=0.3333333333333333, P_t_shift=0.3333333333333333, P_add=0.5, P_rewire_add=0.5, P_offspring_add=0.5, P_to_offspring=0.5)[source]
Bases:
object
Markov Chain Monte Carlo sampler for transmission tree inference.
This class implements MCMC sampling algorithms for transmission network inference using various proposal mechanisms.
- Parameters:
model (didelot_unsampled) – The transmission tree model to sample from.
P_rewire (float, optional) – The probability of rewiring a transmission tree. Default is 1/3.
P_add_remove (float, optional) – The probability of adding or removing an unsampled host in the transmission tree. Default is 1/3.
P_t_shift (float, optional) – The probability of shifting the infection time of the host in the transmission tree. Default is 1/3.
P_add (float, optional) – The probability of adding a new host to the transmission tree once an add/remove move has been proposed. Default is 0.5.
P_rewire_add (float, optional) – The probability of rewiring the new unsampled host once an addition has been proposed. Default is 0.5.
P_offspring_add (float, optional) – The probability that the new unsampled host is an offspring once an addition with rewiring has been proposed. Default is 0.5.
P_to_offspring (float, optional) – The probability of moving to offspring model during rewiring. Default is 0.5.
- Variables:
model (didelot_unsampled) – The transmission model being sampled.
P_rewire (float) – Probability of rewiring moves.
P_add_remove (float) – Probability of add/remove moves.
P_t_shift (float) – Probability of time shift moves.
P_add (float) – Probability of adding vs removing hosts.
P_rewire_add (float) – Probability of rewiring added hosts.
P_offspring_add (float) – Probability of offspring vs chain model for added hosts.
P_to_offspring (float) – Probability of moving to offspring model.
- __init__(model, P_rewire=0.3333333333333333, P_add_remove=0.3333333333333333, P_t_shift=0.3333333333333333, P_add=0.5, P_rewire_add=0.5, P_offspring_add=0.5, P_to_offspring=0.5)[source]
Initialize the MCMC sampler.
- Parameters:
model (didelot_unsampled) – The transmission tree model to sample from.
P_rewire (float, optional) – The probability of rewiring a transmission tree. Default is 1/3.
P_add_remove (float, optional) – The probability of adding or removing an unsampled host in the transmission tree. Default is 1/3.
P_t_shift (float, optional) – The probability of shifting the infection time of the host in the transmission tree. Default is 1/3.
P_add (float, optional) – The probability of adding a new host to the transmission tree once an add/remove move has been proposed. Default is 0.5.
P_rewire_add (float, optional) – The probability of rewiring the new unsampled host once an addition has been proposed. Default is 0.5.
P_offspring_add (float, optional) – The probability that the new unsampled host is an offspring once an addition with rewiring has been proposed. Default is 0.5.
P_to_offspring (float, optional) – The probability of moving to offspring model during rewiring. Default is 0.5.
- MCMC_iteration(verbose=False)[source]
Perform an MCMC iteration on the transmission tree model.
- Parameters:
verbose (bool, optional) – Whether to print the progress of the MCMC iteration. Default is False.
- Returns:
A tuple containing:
- move: str
The type of move proposed (‘rewire’, ‘add_remove’, or ‘time_shift’).
- gg: float
The ratio of proposal probabilities.
- pp: float
The ratio of posterior probabilities.
- P: float
The acceptance probability.
- accepted: bool
Whether the move was accepted.
- DL: float
The difference in log likelihood.
- Return type:
Notes
The function operates as follows:
Selects a move type at random.
Performs the move and computes acceptance probability.
Returns move details and acceptance status.
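The first step, selecting a move type at random, can be sketched with a single uniform draw split by the three move probabilities. This is a minimal sketch, assuming the draw is normalized by the sum of P_rewire, P_add_remove, and P_t_shift; the function name choose_move and the rng argument are hypothetical, and the library's own selection logic may differ.

```python
import random


def choose_move(P_rewire=1/3, P_add_remove=1/3, P_t_shift=1/3, rng=random):
    """Pick 'rewire', 'add_remove', or 'time_shift' by their probabilities."""
    # Scale the uniform draw so the three weights need not sum to exactly 1.
    u = rng.random() * (P_rewire + P_add_remove + P_t_shift)
    if u < P_rewire:
        return "rewire"
    if u < P_rewire + P_add_remove:
        return "add_remove"
    return "time_shift"
```

The returned string matches the move names listed in the Returns field, so it can be used to dispatch to the corresponding proposal step.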
MCMC Module
This module contains Markov Chain Monte Carlo sampling algorithms for transmission network inference.
Main Classes
MCMC : Main MCMC sampler class for transmission tree inference
The MCMC module provides methods for sampling from the posterior distribution of transmission trees using various proposal mechanisms including:
- Tree topology changes (rewiring)
- Adding/removing unsampled hosts
- Infection time updates
- transmission_models.classes.mcmc.random()
Return a random float x in the interval [0, 1).