api module
With the Distribution class, one can:
define a distribution with a fixed set of parameters,
estimate parameters given a sample (estimate, alias fit),
calculate the pdf, cdf, quantile (alias ppf) or generate random samples (alias rvs),
profile the distribution parameters through a simulation study (profile),
evaluate expectations on the distribution using quadrature (expectation, alias expect),
calculate the skewness and kurtosis of the distribution.
The estimate method outputs a subclass of Distribution called EstimatedDistribution, which inherits the methods of Distribution and adds summary, vcov, bic, aic, loglik and plot methods.
The profile method outputs a subclass of Distribution called ProfiledDistribution, which inherits the methods of Distribution and adds summary, conversion to a pandas dataframe, and plot methods for the Mean Squared Error (MSE) of the estimated parameters across different sample sizes. Additionally, adjusted versions of the pdf, cdf, quantile and random methods are available. These use the profiled parameters instead of the estimated parameters (randomly selected if none are specified), allowing one to evaluate the distribution on the profiled parameters based on random draws from different sample sizes.
- class api.Distribution(name='norm', mu=0, sigma=1, skew=0, shape=5, lamda=0)[source]
Distribution Class
All distributions are parameterized in terms of mean and standard deviation (sigma).
- Parameters:
name – the distribution name. Valid distributions are the Normal (‘norm’), Student (‘std’) and Generalized Error (‘ged’) distributions; the skewed variants of these based on the transformations in [Fernandez and Steel, 1998], which are the Skew Normal (‘snorm’), Skew Student (‘sstd’) and Skew Generalized Error (‘sged’) distributions; the reparameterized version of the [Johnson, 1949] SU distribution (‘jsu’); the Generalized Hyperbolic (‘sgh’) distribution of [Barndorff-Nielsen, 1977]; and the Generalized Hyperbolic Skew Student (‘sghst’) distribution of [Aas and Haff, 2006].
mu – the mean
sigma – the standard deviation
skew – the skew parameter
shape – the shape parameter
lamda – additional shape parameter for the Generalized Hyperbolic distribution
- cdf(q: ArrayLike, mu: ArrayLike | None = None, sigma: ArrayLike | None = None, skew: ArrayLike | None = None, shape: ArrayLike | None = None, lamda: ArrayLike | None = None, lower_tail: bool = True) NDArray[float64][source]
Cumulative Probability Function
Any distribution parameter left as None is read from the class object.
- Parameters:
q – a vector of quantiles
mu – the mean
sigma – the standard deviation
skew – the skew parameter
shape – the shape parameter
lamda – the GH lamda parameter
lower_tail – if True (default), probabilities are P[X ≤ x]; otherwise, P[X > x]
- Return type:
a numpy array
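The effect of lower_tail can be illustrated independently of this package. The following is a minimal sketch that uses the standard-library statistics.NormalDist as a stand-in for a Normal distribution; the helper normal_cdf is hypothetical and is not part of the api module.

```python
from statistics import NormalDist


def normal_cdf(q, mu=0.0, sigma=1.0, lower_tail=True):
    """Hypothetical stand-in: P[X <= q] if lower_tail, else P[X > q]."""
    dist = NormalDist(mu, sigma)
    lower = [dist.cdf(v) for v in q]
    # The upper tail is the complement of the lower tail.
    return lower if lower_tail else [1.0 - p for p in lower]


q = [-1.0, 0.0, 1.0]
lower = normal_cdf(q)
upper = normal_cdf(q, lower_tail=False)
```

For every quantile the two results sum to one, which is a quick sanity check when switching between tails.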
- pdf(x: ArrayLike, mu: ArrayLike | None = None, sigma: ArrayLike | None = None, skew: ArrayLike | None = None, shape: ArrayLike | None = None, lamda: ArrayLike | None = None, log: bool = False) NDArray[float64][source]
Probability Density Function
Any distribution parameter left as None is read from the class object.
- Parameters:
x – a vector of quantiles
mu – the mean
sigma – the standard deviation
skew – the skew parameter
shape – the shape parameter
lamda – the GH lamda parameter
log – whether to return the log density
- Return type:
a numpy array
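One reason a log switch is useful is that densities far in the tails underflow to zero while their logarithms remain finite. The sketch below shows this for a Normal density using only the standard library; normal_pdf is a hypothetical helper, not the method documented above.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0, log=False):
    """Hypothetical stand-in for a density with a `log` switch."""
    # Compute on the log scale first, then exponentiate only if requested.
    log_dens = [
        -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((v - mu) / sigma) ** 2
        for v in x
    ]
    return log_dens if log else [math.exp(d) for d in log_dens]


x = [-2.0, 0.0, 40.0]
dens = normal_pdf(x)                # the last value underflows to 0.0
log_dens = normal_pdf(x, log=True)  # the last value stays finite
```

Summing log densities, rather than multiplying densities, is the usual way to build a log-likelihood for estimation.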
- quantile(p: ArrayLike, mu: ArrayLike | None = None, sigma: ArrayLike | None = None, skew: ArrayLike | None = None, shape: ArrayLike | None = None, lamda: ArrayLike | None = None, lower_tail: bool = True) NDArray[float64][source]
Quantile Function
This method is also accessible via the ppf alias.
Any distribution parameter left as None is read from the class object.
- Parameters:
p – a vector of probabilities
mu – the mean
sigma – the standard deviation
skew – the skew parameter
shape – the shape parameter
lamda – the GH lamda parameter
lower_tail – if True (default), probabilities are P[X ≤ x]; otherwise, P[X > x]
- Return type:
a numpy array
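The quantile function is the inverse of the cdf, so feeding quantiles back into the cdf should recover the original probabilities. A minimal sketch of that round trip, again using statistics.NormalDist as a stand-in; normal_quantile is a hypothetical helper and its handling of lower_tail is an assumption about the intended semantics.

```python
from statistics import NormalDist


def normal_quantile(p, mu=0.0, sigma=1.0, lower_tail=True):
    """Hypothetical stand-in: invert P[X <= x] or, if not lower_tail, P[X > x]."""
    dist = NormalDist(mu, sigma)
    # An upper-tail probability p corresponds to a lower-tail probability 1 - p.
    probs = p if lower_tail else [1.0 - v for v in p]
    return [dist.inv_cdf(v) for v in probs]


p = [0.025, 0.5, 0.975]
x_lower = normal_quantile(p)
x_upper = normal_quantile(p, lower_tail=False)
round_trip = [NormalDist().cdf(v) for v in x_lower]
```

Here x_upper is x_lower in reverse order, because the Normal distribution is symmetric about its mean.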
- ppf(p: ArrayLike, mu: ArrayLike | None = None, sigma: ArrayLike | None = None, skew: ArrayLike | None = None, shape: ArrayLike | None = None, lamda: ArrayLike | None = None, lower_tail: bool = True) NDArray[float64][source]
Alias method for quantile function
- random(n: int = 1, mu: ArrayLike | None = None, sigma: ArrayLike | None = None, skew: ArrayLike | None = None, shape: ArrayLike | None = None, lamda: ArrayLike | None = None, seed: int | None = None) NDArray[float64][source]
Random Number Function
Any distribution parameter left as None is read from the class object.
- Parameters:
n – the number of draws
mu – the mean
sigma – the standard deviation
skew – the skew parameter
shape – the shape parameter
lamda – the GH lamda parameter
seed – an optional value to initialize the random seed generator
- Return type:
a numpy array
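A seed argument is typically used to make draws reproducible. The sketch below shows that behaviour with the standard-library random module; normal_random is a hypothetical helper, and how the documented method seeds its own generator is not specified here.

```python
import random


def normal_random(n=1, mu=0.0, sigma=1.0, seed=None):
    """Hypothetical stand-in: n Normal draws from a privately seeded generator."""
    rng = random.Random(seed)  # seed=None falls back to system entropy
    return [rng.gauss(mu, sigma) for _ in range(n)]


a = normal_random(5, seed=42)
b = normal_random(5, seed=42)  # same seed, same draws
c = normal_random(5, seed=7)   # different seed, different draws
```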
- rvs(n: int = 1, mu: ArrayLike | None = None, sigma: ArrayLike | None = None, skew: ArrayLike | None = None, shape: ArrayLike | None = None, lamda: ArrayLike | None = None, seed: int | None = None) NDArray[float64][source]
Alias method for random function
- estimate(x: ArrayLike, fixed: Dict[str, float] | None = None, method: str = 'L-BFGS-B', tol: float = 1e-08, options: dict = {'disp': False, 'maxiter': 200}, type: str = 'AD') EstimatedDistribution[source]
Parameter Estimation
Given a vector x and optionally any fixed values, estimates the parameters of the distribution using the scipy minimize function and a PyTorch-based gradient of the likelihood. The method returns an object of class EstimatedDistribution with additional methods for summary, vcov, coef etc. For the sghst and sgh distributions, only type = ‘FD’ is supported until such time as the modified Bessel function of the second kind is implemented in PyTorch.
- Parameters:
x – a vector representing a stationary series
fixed – an optional dictionary of name-value pairs which are fixed instead of estimated
method – the scipy algorithm to use for estimation
tol – termination tolerance
options – a dictionary of options to pass to the scipy minimize function
type – the type of numerical differentiation to use. Valid choices are ‘FD’ for finite differences and ‘AD’ for automatic differentiation
- Return type:
an object of class EstimatedDistribution
- fit(x: ArrayLike, fixed: Dict[str, float] | None = None, method: str = 'L-BFGS-B', tol: float = 1e-08, options: dict = {'disp': False, 'maxiter': 200}, type: str = 'AD') EstimatedDistribution[source]
Alias method for estimate
- profile(sim: int = 100, size: list = [100, 200, 400, 800, 1000, 1500, 2000, 4000], method='Nelder-Mead', num_workers: int | None = None) ProfiledDistribution[source]
Profile Distribution
Given a distribution, estimates the parameters for a range of sample sizes and simulations. The method returns an object of class ProfiledDistribution with the estimated parameters and the root mean squared error (RMSE) of the estimates. The RMSE is calculated as the square root of the mean squared error (MSE) of the estimates, where the MSE is the sum of the squared difference between the estimated and true parameters for each simulation and sample size combination, divided by the number of simulations.
- Parameters:
sim – the number of simulations
size – the sample sizes
num_workers – the number of workers to use for parallel processing
- Return type:
an object of class ProfiledDistribution
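The RMSE calculation described above can be sketched for the simplest case, a Normal distribution whose maximum likelihood estimates are available in closed form. This is an illustration only: profile_normal and its arguments are hypothetical, and the documented method estimates parameters with a numerical optimizer rather than closed-form formulas.

```python
import math
import random
from statistics import fmean


def profile_normal(mu=0.0, sigma=1.0, sim=200, sizes=(100, 400, 1600), seed=1):
    """Hypothetical sketch: RMSE of the (mu, sigma) estimates per sample size."""
    rng = random.Random(seed)
    rmse = {}
    for size in sizes:
        sq_err_mu = 0.0
        sq_err_sigma = 0.0
        for _ in range(sim):
            sample = [rng.gauss(mu, sigma) for _ in range(size)]
            # Closed-form maximum likelihood estimates for the Normal case.
            mu_hat = fmean(sample)
            sigma_hat = math.sqrt(sum((v - mu_hat) ** 2 for v in sample) / size)
            sq_err_mu += (mu_hat - mu) ** 2
            sq_err_sigma += (sigma_hat - sigma) ** 2
        # MSE = summed squared error / number of simulations; RMSE = sqrt(MSE).
        rmse[size] = (math.sqrt(sq_err_mu / sim), math.sqrt(sq_err_sigma / sim))
    return rmse


rmse = profile_normal()
```

The RMSE of both parameters should shrink as the sample size grows, roughly in proportion to one over the square root of the size.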
- expectation(fun_str: str = 'np.abs(x)', type: str = 'd', lower: float = -inf, upper: float = inf)[source]
Expectation of a custom function over either the pdf, cdf or quantile
Given a custom numpy function on x, evaluates and returns the expectation based on the given bounds and distribution parameters, using numerical quadrature.
- Parameters:
fun_str – a valid string representing a function of x for which the expectation will be calculated given the distribution.
type – valid choices are d, p and q representing the pdf, cdf and quantile functions.
lower – the lower bound for the integral
upper – the upper bound for the integral
- Return type:
the expectation
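For the density case (type d), the expectation is the integral of fun(x) multiplied by the pdf. A minimal sketch of that calculation using composite Simpson quadrature and a standard Normal stand-in follows. It takes a callable instead of a string, truncates the infinite default bounds to ±10, and the name expectation_sketch is hypothetical; the quadrature routine used by the documented method is not specified here.

```python
import math
from statistics import NormalDist


def expectation_sketch(fun, dist, lower=-10.0, upper=10.0, n=2000):
    """Hypothetical sketch: E[fun(X)] as the integral of fun(x) * pdf(x)."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of intervals
    h = (upper - lower) / n
    total = 0.0
    for i in range(n + 1):
        x = lower + i * h
        # Simpson weights: 1 at the ends, 4 at odd nodes, 2 at even interior nodes.
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * fun(x) * dist.pdf(x)
    return total * h / 3.0


std_normal = NormalDist(0.0, 1.0)
mean_abs = expectation_sketch(abs, std_normal)                     # E|X|
second_moment = expectation_sketch(lambda x: x * x, std_normal)    # E[X^2]
```

For a standard Normal, E|X| equals sqrt(2/π) and E[X^2] equals 1, which gives two known values to check the quadrature against.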
- class api.EstimatedDistribution(*, name: str = 'norm', mu: float = 0, sigma: float = 1, skew: float = 0.9, shape: float = 5, lamda: float = 0, parameters: NDArray[float64], hessian: NDArray[float64], scores: NDArray[float64], scaler: NDArray[float64], index: NDArray[float64], loglikelihood: float, x: NDArray[float64], no_obs: int, solution: Any)[source]
Estimated Distribution Class
Generated when calling the estimate method on a Distribution object.
- vcov(type: str = 'H') ArrayLike[source]
Variance-Covariance Matrix of Parameter Estimates
- Parameters:
type – the vcov type. Valid choices are H (Hessian), OPG (outer product of gradients) and QMLE (Quasi Maximum Likelihood)
- Return type:
a numpy array (the variance-covariance matrix)
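The three vcov types follow the conventional definitions: the inverse Hessian, the inverse outer product of gradients, and the sandwich form combining the two. The sketch below works them out for a single parameter, the mean of a Normal distribution with known sigma, so every quantity is a scalar. The function vcov_normal_mean is hypothetical, and the scaling used by the documented method may differ.

```python
import random
from statistics import fmean


def vcov_normal_mean(x, sigma=1.0):
    """Hypothetical sketch: H, OPG and QMLE variances of the mean estimate."""
    n = len(x)
    mu_hat = fmean(x)
    # Per-observation score of the log-likelihood with respect to mu.
    scores = [(v - mu_hat) / sigma**2 for v in x]
    hessian = n / sigma**2            # Hessian of the negative log-likelihood
    opg = sum(s * s for s in scores)  # outer product of gradients (scalar here)
    return {
        "H": 1.0 / hessian,
        "OPG": 1.0 / opg,
        "QMLE": opg / hessian**2,     # sandwich: H^-1 * OPG * H^-1
    }


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
v = vcov_normal_mean(x)
```

When the model is correctly specified, as in this simulated example, the three values are close to one another; they diverge when the assumed distribution does not match the data.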
- summary(type: str = 'H', decimals: int = 2, numalign='decimal', tablefmt: str = 'pretty')[source]
Parameter Estimation Summary
Provides a printout summary of the parameter estimates.
- Parameters:
type – the vcov type. Valid choices are H (Hessian), OPG (outer product of gradients) and QMLE (Quasi Maximum Likelihood)
decimals – number of decimals to print
numalign – number alignment for package tabulate
tablefmt – table format for package tabulate
- Return type:
a console printout of the summary
- class api.ProfiledDistribution(*, name: str = 'norm', mu: float = 0, sigma: float = 1, skew: float = 0.9, shape: float = 5, lamda: float = 0, dist: Dict[int, ndarray], rmse: Dict[int, ndarray], sim: int, size: list)[source]
Profiled Distribution Class
Generated when calling the profile method on a Distribution object.
- summary(numalign='center', floatfmt='.2f', tablefmt: str = 'psql')[source]
Profile Distribution Summary
Provides a printout summary of the profile distribution results.
- Parameters:
numalign – number alignment for package tabulate
floatfmt – float format for package tabulate
tablefmt – table format for package tabulate
- Return type:
a console printout of the summary
- pandas(type: str = 'wide') DataFrame[source]
Profiled Distribution to Pandas DataFrame
- Parameters:
type – the type of formatted output. Valid choices are ‘wide’ and ‘long’
- Return type:
a pandas DataFrame
- plot(parameter: str = 'mu') ggplot[source]
Profiled Distribution MSE Box Plot
- Parameters:
parameter – the parameter to plot. Valid choices are ‘mu’, ‘sigma’, ‘skew’, ‘shape’ and ‘lambda’
- Return type:
a ggplot
- cdf(q: ArrayLike, sim: int | None = None, size: int | None = None, lower_tail: bool = True) NDArray[float64][source]
Overridden method to calculate a modified Cumulative Probability Function specific to ProfiledDistribution.
The distribution parameters are selected randomly from the profile distribution if they are not specified, else can be selected based on their index (sim and size).
- Parameters:
q – a vector of quantiles
sim – the simulation number
size – the sample size
lower_tail – if True (default), probabilities are P[X ≤ x]; otherwise, P[X > x]
- Return type:
a numpy array
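The parameter selection described above (a random pick when sim and size are omitted, an indexed lookup otherwise) can be sketched as follows. Everything here is hypothetical: the profile dictionary, the zero-based sim index, and the helpers select_params and profiled_cdf are illustrative assumptions and do not reflect the internal storage of ProfiledDistribution.

```python
import random
from statistics import NormalDist

# Hypothetical profile: sample size -> (mu, sigma) estimates, one per simulation.
profile = {
    100: [(0.12, 0.95), (-0.08, 1.07), (0.03, 1.01)],
    400: [(0.02, 0.99), (-0.04, 1.02), (0.01, 1.00)],
}


def select_params(profile, sim=None, size=None, rng=None):
    """Pick one parameter set: by index if given, at random otherwise."""
    rng = rng or random.Random()
    if size is None:
        size = rng.choice(sorted(profile))
    if sim is None:
        sim = rng.randrange(len(profile[size]))
    return profile[size][sim]


def profiled_cdf(q, sim=None, size=None, lower_tail=True, rng=None):
    """Evaluate a Normal cdf at the selected profiled parameters."""
    mu, sigma = select_params(profile, sim, size, rng)
    dist = NormalDist(mu, sigma)
    lower = [dist.cdf(v) for v in q]
    return lower if lower_tail else [1.0 - p for p in lower]


fixed = profiled_cdf([0.0], sim=1, size=100)       # indexed lookup
drawn = profiled_cdf([0.0], rng=random.Random(3))  # random selection
```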
- pdf(x: ArrayLike, sim: int | None = None, size: int | None = None, log: bool = False) NDArray[float64][source]
Overridden method to calculate a modified Probability Density Function specific to ProfiledDistribution.
The distribution parameters are selected randomly from the profile distribution if they are not specified, else can be selected based on their index (sim and size).
- Parameters:
x – a vector of quantiles
sim – the simulation number
size – the sample size
log – whether to return the log density
- Return type:
a numpy array
- quantile(p: ArrayLike, sim: int | None = None, size: int | None = None, lower_tail: bool = True) NDArray[float64][source]
Overridden method to calculate a modified Quantile Function specific to ProfiledDistribution.
The distribution parameters are selected randomly from the profile distribution if they are not specified, else can be selected based on their index (sim and size).
- Parameters:
p – a vector of probabilities
sim – the simulation number
size – the sample size
lower_tail – if True (default), probabilities are P[X ≤ x]; otherwise, P[X > x]
- Return type:
a numpy array
- ppf(p: ArrayLike, sim: int | None = None, size: int | None = None, lower_tail: bool = True) NDArray[float64][source]
Alias method for quantile function
- random(n: int = 1, sim: int | None = None, size: int | None = None, seed: int | None = None) NDArray[float64][source]
Overridden method to generate a modified random sample specific to ProfiledDistribution.
The distribution parameters are selected randomly from the profile distribution if they are not specified, else can be selected based on their index (sim and size).
- Parameters:
n – the number of draws
sim – the simulation number
size – the sample size
seed – an optional value to initialize the random seed generator
- Return type:
a numpy array