api module

With the Distribution class, one can:

define a distribution with a fixed set of parameters,
estimate parameters given a sample (estimate, alias fit),
calculate the pdf, cdf, quantile (alias ppf) or generate random samples (alias rvs),
profile the distribution parameters through a simulation study (profile),
evaluate expectations on the distribution using quadrature (expectation, alias expect),
calculate the skewness and kurtosis of the distribution.

The estimate method outputs a subclass of Distribution called EstimatedDistribution, which inherits the methods of Distribution and adds coef, summary, vcov, bic, aic, loglik and plot methods.

The profile method outputs a subclass of Distribution called ProfiledDistribution, which inherits the methods of Distribution and adds summary, conversion to a pandas DataFrame (pandas) and plot methods for the Mean Squared Error (MSE) of the estimated parameters across different sample sizes. Additionally, adjusted pdf, cdf, quantile and random methods use the profiled parameters (selected by sim and size, or at random if none are provided) instead of the estimated parameters, allowing one to evaluate the distribution on parameters drawn from different sample sizes.

class api.Distribution(name='norm', mu=0, sigma=1, skew=0, shape=5, lamda=0)

Distribution Class

All distributions are parameterized in terms of mean and standard deviation (sigma).

Parameters:

name – the distribution name. Valid distributions are the Normal (‘norm’), Student (‘std’) and Generalized Error (‘ged’) distributions; the skewed variants of these based on the transformations in [Fernandez and Steel, 1998], which are the Skew Normal (‘snorm’), Skew Student (‘sstd’) and Skew Generalized Error (‘sged’) distributions; the reparameterized version of the [Johnson, 1949] SU distribution (‘jsu’); the Generalized Hyperbolic (‘sgh’) distribution of [Barndorff-Nielsen, 1977]; and the Generalized Hyperbolic Skew Student (‘sghst’) distribution of [Aas and Haff, 2006]
mu – the mean
sigma – the standard deviation
skew – the skew parameter
shape – the shape parameter
lamda – additional shape parameter for the Generalized Hyperbolic distribution
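The skewed variants above rely on the Fernandez and Steel transformation, which can be sketched as follows for a standard normal base density. This is only an illustration under stated assumptions: the function names are hypothetical, the density is not re-standardized to a given mean and standard deviation, and whether the library uses exactly this convention for `skew` (positive, with 1 meaning symmetric) is an assumption.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def fs_skew_pdf(z: float, skew: float) -> float:
    """Fernandez-Steel skewed density built from the standard normal.

    Hypothetical helper, not part of the documented API. The result is
    not re-standardized, so its mean and variance depend on `skew`.
    """
    if skew <= 0:
        raise ValueError("skew must be positive")
    # Normalizing constant 2 / (skew + 1/skew) keeps the total mass at 1.
    scale = 2.0 / (skew + 1.0 / skew)
    if z >= 0:
        # Right side is stretched by `skew`.
        return scale * normal_pdf(z / skew)
    # Left side is compressed by `skew`.
    return scale * normal_pdf(z * skew)
```

With `skew = 1` the function reduces to the symmetric base density; values above 1 put more mass on the right side.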

Cumulative Probability Function

Any distribution parameter left as None is read from the class object.

Parameters:

q – a vector of quantiles
mu – the mean
sigma – the standard deviation
skew – the skew parameter
shape – the shape parameter
lamda – the GH lamda parameter
lower_tail – if True (default), probabilities are P[X ≤ x]; otherwise, P[X > x]

Return type:

a numpy array
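The effect of lower_tail can be illustrated with a plain normal distribution. The sketch below is a stand-in written with the standard library only; `normal_cdf` is a hypothetical name and the code is not the library's implementation.

```python
import math


def normal_cdf(q: float, mu: float = 0.0, sigma: float = 1.0,
               lower_tail: bool = True) -> float:
    """Normal CDF with an optional upper-tail probability (illustrative)."""
    z = (q - mu) / (sigma * math.sqrt(2.0))
    if lower_tail:
        # P[X <= q]
        return 0.5 * math.erfc(-z)
    # P[X > q], computed directly rather than as 1 - P[X <= q]
    return 0.5 * math.erfc(z)
```

Computing the upper tail directly, instead of subtracting from one, avoids losing precision for probabilities far in the tail.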

Probability Density Function

Any distribution parameter left as None is read from the class object.

Parameters:

x – a vector of quantiles
mu – the mean
sigma – the standard deviation
skew – the skew parameter
shape – the shape parameter
lamda – the GH lamda parameter
log – whether to return the log density

Return type:

a numpy array
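The log argument matters mostly in the tails, where the density itself underflows. A minimal sketch with a plain normal density (hypothetical `normal_pdf` helper, standard library only, not the library's code):

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0,
               log: bool = False) -> float:
    """Normal density, optionally returned on the log scale (illustrative)."""
    z = (x - mu) / sigma
    # Work on the log scale first; exponentiate only when asked to.
    log_density = -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    return log_density if log else math.exp(log_density)
```

At 40 standard deviations from the mean the plain density underflows to zero in double precision, while the log density is still finite, which is why likelihoods are usually accumulated on the log scale.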

Quantile Function

This method is also accessible via the ppf alias.

Any distribution parameter left as None is read from the class object.

Parameters:

p – a vector of probabilities
mu – the mean
sigma – the standard deviation
skew – the skew parameter
shape – the shape parameter
lamda – the GH lamda parameter
lower_tail – if True (default), probabilities are P[X ≤ x]; otherwise, P[X > x]

Return type:

a numpy array

ppf(p: ArrayLike, mu: ArrayLike | None = None, sigma: ArrayLike | None = None, skew: ArrayLike | None = None, shape: ArrayLike | None = None, lamda: ArrayLike | None = None, lower_tail: bool = True) → NDArray[float64]: Alias method for quantile function
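For the quantile function, lower_tail determines whether p is read as P[X ≤ x] or P[X > x]. The sketch below shows that convention for a plain normal distribution using the standard library's `statistics.NormalDist`; `normal_quantile` is a hypothetical helper, not the library's implementation.

```python
from statistics import NormalDist


def normal_quantile(p: float, mu: float = 0.0, sigma: float = 1.0,
                    lower_tail: bool = True) -> float:
    """Normal quantile with an optional upper-tail convention (illustrative)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not lower_tail:
        # p is P[X > x], so convert it to the equivalent lower-tail probability.
        p = 1.0 - p
    return NormalDist(mu, sigma).inv_cdf(p)
```

Under this convention, the 0.975 lower-tail quantile and the 0.025 upper-tail quantile are the same point.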

Random Number Function

Any distribution parameter left as None is read from the class object.

Parameters:

n – the number of draws
mu – the mean
sigma – the standard deviation
skew – the skew parameter
shape – the shape parameter
lamda – the GH lamda parameter
seed – an optional value to initialize the random seed generator

Return type:

a numpy array

rvs(n: int = 1, mu: ArrayLike | None = None, sigma: ArrayLike | None = None, skew: ArrayLike | None = None, shape: ArrayLike | None = None, lamda: ArrayLike | None = None, seed: int | None = None) → NDArray[float64]: Alias method for random function
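The role of seed can be sketched with a plain normal generator. This stand-in uses the standard library's `random` module and a hypothetical `normal_random` helper; which generator the library uses internally is not stated here and is not assumed.

```python
import random


def normal_random(n: int = 1, mu: float = 0.0, sigma: float = 1.0,
                  seed=None) -> list:
    """Draw n normal variates; a fixed seed gives reproducible draws."""
    # A private generator avoids touching the global random state.
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]
```

Calling the function twice with the same seed returns the same sample, which is useful when a simulation needs to be repeated exactly.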

skewness(): Distribution Skewness

kurtosis(): Distribution Kurtosis

estimate(x: ArrayLike, fixed: Dict[str, float] | None = None, method: str = 'L-BFGS-B', tol: float = 1e-08, options: dict = {'disp': False, 'maxiter': 200}, type: str = 'AD') → EstimatedDistribution

Parameter Estimation

Given a vector x and optionally any fixed values, estimates the parameters of the distribution using the scipy minimize function and a pytorch-based gradient of the likelihood. The method returns an object of class EstimatedDistribution with additional methods for summary, vcov, coef etc. For the sghst and sgh distributions, only type = ‘FD’ is supported until such time as the modified Bessel function of the second kind is implemented in pytorch.

Parameters:

x – a vector representing a stationary series
fixed – an optional dictionary of name-value pairs which are fixed instead of estimated
method – the scipy algorithm to use for estimation
tol – termination tolerance
options – a dictionary of options to pass to the scipy minimize function
type – the type of differentiation to use for the gradient. Valid choices are ‘FD’ for finite differences and ‘AD’ for automatic differentiation

Return type:

an object of class EstimatedDistribution

fit(x: ArrayLike, fixed: Dict[str, float] | None = None, method: str = 'L-BFGS-B', tol: float = 1e-08, options: dict = {'disp': False, 'maxiter': 200}, type: str = 'AD') → EstimatedDistribution: Alias method for estimate
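The idea behind the fixed argument can be sketched for the normal distribution, where the maximum likelihood estimates have a closed form. This swaps the numerical optimizer described above for the closed-form solution purely for illustration; `estimate_normal` and its return layout are hypothetical and not the library's API.

```python
import math


def estimate_normal(x, fixed=None):
    """Closed-form normal MLE that honours a `fixed` dictionary.

    Illustrative only: parameters named in `fixed` are held at the given
    value, the remaining ones are estimated by maximum likelihood.
    """
    fixed = fixed or {}
    n = len(x)
    # Use the fixed value when supplied, otherwise the ML estimate.
    mu = fixed.get("mu", sum(x) / n)
    sigma = fixed.get("sigma", math.sqrt(sum((v - mu) ** 2 for v in x) / n))
    # Log-likelihood evaluated at the final parameter values.
    loglik = (
        -0.5 * n * math.log(2.0 * math.pi)
        - n * math.log(sigma)
        - sum((v - mu) ** 2 for v in x) / (2.0 * sigma ** 2)
    )
    return {"mu": mu, "sigma": sigma, "loglik": loglik}
```

Fixing a parameter away from its unrestricted estimate can only lower the maximized log-likelihood, which is what makes comparisons between restricted and unrestricted fits meaningful.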

profile(sim: int = 100, size: list = [100, 200, 400, 800, 1000, 1500, 2000, 4000], method='Nelder-Mead', num_workers: int | None = None) → ProfiledDistribution

Profile Distribution

Given a distribution, estimates the parameters for a range of sample sizes and simulations. The method returns an object of class ProfiledDistribution with the estimated parameters and the root mean squared error (RMSE) of the estimates. The RMSE is calculated as the square root of the mean squared error (MSE) of the estimates, where the MSE for each sample size is the sum of the squared differences between the estimated and true parameters across simulations, divided by the number of simulations.

Parameters:

sim – the number of simulations
size – the sample sizes
num_workers – the number of workers to use for parallel processing

Return type:

an object of class ProfiledDistribution
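The RMSE calculation described above can be sketched for a single parameter. The example below profiles the sample mean of a normal distribution using the standard library only; `rmse` and `profile_mean` are hypothetical names, and the library's own routine estimates all distribution parameters with an optimizer rather than a sample mean.

```python
import math
import random


def rmse(estimates, true_value):
    """Root mean squared error of estimates around the true value."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse)


def profile_mean(mu=0.0, sigma=1.0, sim=100, sizes=(100, 400, 1600), seed=42):
    """For each sample size, simulate `sim` samples and score the mean estimate."""
    rng = random.Random(seed)
    result = {}
    for size in sizes:
        estimates = []
        for _ in range(sim):
            sample = [rng.gauss(mu, sigma) for _ in range(size)]
            estimates.append(sum(sample) / size)  # estimate of mu
        result[size] = rmse(estimates, mu)
    return result
```

For the sample mean, the RMSE should shrink roughly in proportion to one over the square root of the sample size, which is the kind of pattern a profile is meant to reveal.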

expectation(fun_str: str = 'np.abs(x)', type: str = 'd', lower: float = -inf, upper: float = inf)

Expectation of a custom function over either the pdf, cdf or quantile

Given a custom numpy function on x, evaluates and returns the expectation based on the given bounds and distribution parameters, using numerical quadrature.

Parameters:

fun_str – a valid string representing a function of x for which the expectation will be calculated given the distribution.
type – valid choices are d, p and q representing the pdf, cdf and quantile functions.
lower – the lower bound for the integral
upper – the upper bound for the integral

Return type:

the expectation

expect(fun_str: str = 'np.abs(x)', type: str = 'd', lower: float = -inf, upper: float = inf): Alias method for expectation
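An expectation over the density with integration bounds can be sketched with a simple quadrature rule. The example below makes several assumptions for illustration: it uses a normal density, takes a Python callable instead of a function string, applies composite Simpson's rule, and truncates infinite bounds at ten standard deviations from the mean. None of these choices is claimed to match the library's internals.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def expectation(fun, mu=0.0, sigma=1.0, lower=-math.inf, upper=math.inf, n=2000):
    """Integrate fun(x) * pdf(x) between the bounds with Simpson's rule."""
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    # Truncate infinite bounds; the mass beyond 10 sigma is negligible.
    lo = max(lower, mu - 10.0 * sigma)
    hi = min(upper, mu + 10.0 * sigma)
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n

    def integrand(x):
        return fun(x) * normal_pdf(x, mu, sigma)

    total = integrand(lo) + integrand(hi)
    for i in range(1, n):
        # Simpson weights: 4 on odd nodes, 2 on even interior nodes.
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    return total * h / 3.0
```

For example, `expectation(abs)` approximates E|X| for a standard normal, and passing `lower=0.0` with `fun=lambda x: x` gives a partial expectation over the positive half line.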

class api.EstimatedDistribution(*, name: str = 'norm', mu: float = 0, sigma: float = 1, skew: float = 0.9, shape: float = 5, lamda: float = 0, parameters: NDArray[float64], hessian: NDArray[float64], scores: NDArray[float64], scaler: NDArray[float64], index: NDArray[float64], loglikelihood: float, x: NDArray[float64], no_obs: int, solution: Any)

Estimated Distribution Class

Generated when calling the estimate method on a Distribution object.

coef() → Dict: Distribution Parameters

loglik(): Log Likelihood

aic(): Akaike’s Information Criterion

bic(): Bayesian Information Criterion
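Both criteria follow from the log-likelihood using the standard textbook definitions, sketched below. The helper names mirror the methods above but are standalone functions written for illustration; whether the library counts only the estimated (non-fixed) parameters in `n_params` is an assumption.

```python
import math


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 * loglik."""
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * loglik."""
    return n_params * math.log(n_obs) - 2.0 * loglik
```

Lower values indicate a better trade-off between fit and complexity, and BIC penalizes extra parameters more heavily than AIC once the sample has more than about seven observations.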

vcov(type: str = 'H') → ArrayLike

Variance-Covariance Matrix of Parameter Estimates

Parameters:

type – the vcov type. Valid choices are H (Hessian), OPG (outer product of gradients) and QMLE (Quasi Maximum Likelihood)

Return type:

a numpy array
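The three covariance types correspond to the standard estimators: the inverse Hessian of the negative log-likelihood (H), the inverse of the outer product of the per-observation scores (OPG), and the sandwich of the two (QMLE). The sketch below shows them for a two-parameter model in plain Python; the function names and input layout are hypothetical, and the library's scaling conventions may differ.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def vcov(hessian, scores, type="H"):
    """Covariance of the estimates.

    hessian: 2x2 Hessian of the negative log-likelihood at the optimum.
    scores:  one row of per-observation gradients per observation.
    """
    bread = inv2(hessian)
    if type == "H":
        return bread
    # Outer product of gradients: sum over observations of s_i s_i'.
    meat = [[sum(s[i] * s[j] for s in scores) for j in range(2)]
            for i in range(2)]
    if type == "OPG":
        return inv2(meat)
    if type == "QMLE":
        # Sandwich estimator: H^-1 (G'G) H^-1.
        return matmul(matmul(bread, meat), bread)
    raise ValueError("type must be 'H', 'OPG' or 'QMLE'")
```

When the model is correctly specified, the Hessian and the outer product of gradients estimate the same quantity, so all three results should be close; large differences hint at misspecification.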

summary(type: str = 'H', decimals: int = 2, numalign='decimal', tablefmt: str = 'pretty')

Parameter Estimation Summary

Provides a printout summary of the parameter estimates.

Parameters:

type – the vcov type. Valid choices are H (Hessian), OPG (outer product of gradients) and QMLE (Quasi Maximum Likelihood)
decimals – number of decimals to print
numalign – number alignment for package tabulate
tablefmt – table format for package tabulate

Return type:

a console printout of the summary

plot(type='density') → ggplot

Estimated Distribution PDF Plot

Parameters:

type – the type of plot. Valid choices are ‘density’ and ‘qq’

Return type:

a ggplot

class api.ProfiledDistribution(*, name: str = 'norm', mu: float = 0, sigma: float = 1, skew: float = 0.9, shape: float = 5, lamda: float = 0, dist: Dict[int, ndarray], rmse: Dict[int, ndarray], sim: int, size: list)

Profiled Distribution Class

Generated when calling the profile method on a Distribution object.

summary(numalign='center', floatfmt='.2f', tablefmt: str = 'psql')

Profile Distribution Summary

Provides a printout summary of the profile distribution results.

Parameters:

numalign – number alignment for package tabulate
floatfmt – float format for package tabulate
tablefmt – table format for package tabulate

Return type:

a console printout of the summary

pandas(type: str = 'wide') → DataFrame

Profiled Distribution to Pandas DataFrame

Parameters:

type – the type of formatted output. Valid choices are ‘wide’ and ‘long’

Return type:

a pandas DataFrame

plot(parameter: str = 'mu') → ggplot

Profiled Distribution MSE Box Plot

Parameters:

parameter – the parameter to plot. Valid choices are ‘mu’, ‘sigma’, ‘skew’, ‘shape’ and ‘lambda’

Return type:

a ggplot

cdf(q: ArrayLike, sim: int | None = None, size: int | None = None, lower_tail: bool = True) → NDArray[float64]

Overridden method to calculate a modified Cumulative Probability Function specific to ProfiledDistribution.

The distribution parameters are selected randomly from the profile distribution if they are not specified, else can be selected based on their index (sim and size).

Parameters:

q – a vector of quantiles
sim – the simulation number
size – the sample size
lower_tail – if True (default), probabilities are P[X ≤ x]; otherwise, P[X > x]

Return type:

a numpy array
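The selection rule described above (use sim and size when given, otherwise pick at random) can be sketched as follows. The storage layout is an assumption made for illustration: a dictionary keyed by sample size, holding one row of estimated parameters per simulation, with sim treated as a zero-based row index. `select_parameters` is a hypothetical helper, not part of the documented API.

```python
import random


def select_parameters(dist, sim=None, size=None, rng=None):
    """Pick one row of profiled parameters.

    dist: {sample_size: [parameter_row_for_sim_0, parameter_row_for_sim_1, ...]}
    """
    rng = rng or random.Random()
    if size is None:
        # No sample size requested: choose one of the profiled sizes at random.
        size = rng.choice(sorted(dist))
    rows = dist[size]
    if sim is None:
        # No simulation requested: choose one of the simulation rows at random.
        sim = rng.randrange(len(rows))
    return rows[sim]
```

Passing both indices gives a reproducible evaluation on a known parameter set, while leaving them out samples from the estimation uncertainty captured by the profile.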

pdf(x: ArrayLike, sim: int | None = None, size: int | None = None, log: bool = False) → NDArray[float64]

Overridden method to calculate a modified Probability Density Function specific to ProfiledDistribution.

The distribution parameters are selected randomly from the profile distribution if they are not specified, else can be selected based on their index (sim and size).

Parameters:

x – a vector of quantiles
sim – the simulation number
size – the sample size
log – whether to return the log density

Return type:

a numpy array

quantile(p: ArrayLike, sim: int | None = None, size: int | None = None, lower_tail: bool = True) → NDArray[float64]

Overridden method to calculate a modified Quantile Function specific to ProfiledDistribution.

The distribution parameters are selected randomly from the profile distribution if they are not specified, else can be selected based on their index (sim and size).

Parameters:

p – a vector of probabilities
sim – the simulation number
size – the sample size
lower_tail – if True (default), probabilities are P[X ≤ x]; otherwise, P[X > x]

Return type:

a numpy array

ppf(p: ArrayLike, sim: int | None = None, size: int | None = None, lower_tail: bool = True) → NDArray[float64]: Alias method for quantile function

random(n: int = 1, sim: int | None = None, size: int | None = None, seed: int | None = None) → NDArray[float64]

Overridden method to generate a modified random sample specific to ProfiledDistribution.

The distribution parameters are selected randomly from the profile distribution if they are not specified, else can be selected based on their index (sim and size).

Parameters:

n – the number of draws
sim – the simulation number
size – the sample size
seed – an optional value to initialize the random seed generator

Return type:

a numpy array

rvs(n: int = 1, sim: int | None = None, size: int | None = None, seed: int | None = None) → NDArray[float64]: Alias method for random function