
DataMineit Tackles Big Data using SAS®
Why wait over 21.5 hours for Proc SurveySelect when DataMineit bootstraps in under 80 seconds?* (download .pdf brochure)
NEW! Even faster, proprietary versions of OPDY and OPDN, the fastest SAS® algorithms published in peer-reviewed statistics journals for conducting bootstraps, permutation tests, and sampling with and without replacement (download publications by J.D. Opdyke, Senior Managing Director, DataMineit, LLC, at http://www.DataMineIt.com/DMI_publications.htm).
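The published OPDY and OPDN SAS® code, and the faster proprietary versions, are not reproduced here. As a rough, hypothetical illustration of how sampling without replacement can be done in one sequential pass with linear runtime, the Python sketch below implements a textbook technique, Knuth's selection sampling (Algorithm S). It is not claimed to be how OPDN works, and the function name `selection_sample` is an assumption.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def selection_sample(records: Iterable[T], population_size: int, n: int,
                     rng: random.Random) -> List[T]:
    """One-pass simple random sampling WITHOUT replacement (Knuth's Algorithm S).

    Record t (0-based) is kept with probability
    (n - already_selected) / (population_size - t), which makes every
    size-n subset equally likely. Runtime is linear in population_size,
    and only the n selected records are held in memory.
    """
    if not 0 <= n <= population_size:
        raise ValueError("need 0 <= n <= population_size")
    selected: List[T] = []
    for t, record in enumerate(records):
        if len(selected) == n:
            break  # sample complete; the rest of the stream need not be read
        remaining = population_size - t
        # Keep this record with probability (n - len(selected)) / remaining.
        if rng.random() * remaining < n - len(selected):
            selected.append(record)
    return selected
```

Because records are consumed as a stream, `records` could be a generator reading a large file row by row. The main constraint of this design is that the caller must know `population_size` in advance.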
NEW RESULTS: SAS-based OPDY and OPDN algorithms are OVER 5 ORDERS OF MAGNITUDE FASTER THAN STATA and OVER ONE ORDER OF MAGNITUDE FASTER THAN MATLAB. Contact J.D. Opdyke at JDOpdyke@DataMineit.com for additional details.
• FAST: Orders of Magnitude Faster than SAS® Procs. OPDY_Boot_FT1 and OPDN_Perm_FT1 are modular, compiled SAS® macros that run exactly as OPDY and OPDN do (but even faster). On large datasets, the only setting in which speed and scalability matter, OPDY_Boot_FT1 executes bootstraps over 990x faster than the relevant built-in SAS® procedure (Proc SurveySelect). Similarly, OPDN_Perm_FT1 executes permutation tests over 530x faster than Proc SurveySelect, over 400x faster than Proc NPAR1WAY (which crashes on datasets/strata less than a tenth the size of those OPDN_Perm_FT1 can process), and over 5,970x faster than Proc Multtest (that's over 7 days vs. under 2 minutes).
• AFFORDABLE: Only Base SAS® Is Required
• SCALABLE: Linear Runtime. Both OPDY_Boot_FT1 and OPDN_Perm_FT1 are truly scalable: their time complexity is linear in the size of the dataset, which is not the case for the relevant SAS® procedures.
• ROBUST: Theoretically Unlimited Dataset Size. The storage complexity (memory only, no I/O) of the algorithms is linear in the number of records in the largest stratum, not in the size of the dataset, so the algorithms can handle datasets of theoretically unlimited size with any number of strata. The SAS® procs either crash or become prohibitively slow as dataset/strata sizes increase.
• GENERALIZABLE: Multivariate Regression. Both algorithms are highly generalizable. DataMineIt can modify OPDN_Perm_FT1 to conduct permutation tests using any sample statistic, and for multivariate regression, DataMineIt has modified versions of OPDY_Boot_FT1 available for performing bootstraps on econometric models.
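The ROBUST point about memory can be illustrated with a minimal sketch, assuming records arrive sorted by stratum: a stratified bootstrap that buffers only one stratum at a time, so peak memory for the raw data tracks the largest stratum rather than the full dataset. This is a generic Python illustration, not OPDY_Boot_FT1; the function name, the (stratum, value) row layout, and the choice of the mean as the bootstrapped statistic are all hypothetical.

```python
import random
import statistics
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, List, Tuple


def stratified_bootstrap_means(
    rows: Iterable[Tuple[str, float]],
    n_boot: int,
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Bootstrap the mean of each stratum (hypothetical sketch).

    Assumes rows are sorted (grouped) by stratum key. Only one stratum's
    values are materialized at a time, so memory for the raw data is
    proportional to the largest stratum. Work is linear in
    (total rows x n_boot).
    """
    rng = random.Random(seed)
    results: Dict[str, List[float]] = {}
    for stratum, group in groupby(rows, key=itemgetter(0)):
        values = [v for _, v in group]  # the only per-stratum data buffer
        k = len(values)
        # Each bootstrap resample draws k values WITH replacement.
        results[stratum] = [
            statistics.fmean(rng.choices(values, k=k)) for _ in range(n_boot)
        ]
    return results
```

Because `rows` is only iterated once, it can be a generator over a sorted file; if a stratum's rows were not contiguous, `groupby` would split it, which is why the sorted-input assumption matters.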
CONTACT: Please contact J.D. Opdyke, Senior Managing Director, DataMineit, LLC, at JDOpdyke@DataMineit.com.
Finance/Market Risk Management Statistical Software
Sharpe Ratios are ubiquitous in financial analysis, and funds are continuously ranked by their Sharpe Ratios the world over. Yet these rankings are never accompanied by p-values or confidence intervals indicating the likelihood that observed differences between two funds' ratios are caused by true differences in performance rather than by random sampling error. Being able to state, with 95% or 99% statistical confidence, that one fund's Sharpe Ratio is larger than another's would be highly valuable in any risk-adjusted performance assessment, whether via fund rankings or any of a myriad of other approaches. Previous tests comparing Sharpe Ratios either were complex and computationally intensive or relied on restrictive and highly unrealistic assumptions about the financial returns data being analyzed. The statistical tests presented in the Excel spreadsheets and SAS programs below relax these constraints, and are the first to provide such tests, fully automated, on easily usable and universal platforms. Financial analysts can thus determine, with statistical significance, whether one fund's risk-adjusted performance truly is larger than another's.
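For readers who want to see what a confidence interval for a Sharpe Ratio difference involves, here is a minimal Python sketch of one generic approach: a paired percentile bootstrap of SR(A) - SR(B). This is not the test in Opdyke (2006) or in the SAS programs and spreadsheets offered here. It assumes returns are i.i.d. over time, which is exactly the kind of simplifying assumption the paragraph above cautions against, and all function names are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def sharpe(excess_returns: Sequence[float]) -> float:
    """Sample Sharpe Ratio: mean excess return divided by its standard deviation."""
    return statistics.fmean(excess_returns) / statistics.stdev(excess_returns)


def bootstrap_sharpe_diff(
    fund_a: Sequence[float],
    fund_b: Sequence[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, Tuple[float, float]]:
    """Observed SR(a) - SR(b) and a (1 - alpha) percentile bootstrap CI for it."""
    if len(fund_a) != len(fund_b):
        raise ValueError("both series must cover the same periods")
    rng = random.Random(seed)
    t = len(fund_a)
    observed = sharpe(fund_a) - sharpe(fund_b)
    diffs: List[float] = []
    for _ in range(n_boot):
        # Resample whole periods so the funds' cross-correlation is preserved.
        idx = [rng.randrange(t) for _ in range(t)]
        try:
            diffs.append(sharpe([fund_a[i] for i in idx])
                         - sharpe([fund_b[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample with zero volatility; skip it
    diffs.sort()
    lower = diffs[int(alpha / 2 * len(diffs))]
    upper = diffs[int((1 - alpha / 2) * len(diffs)) - 1]
    return observed, (lower, upper)
```

If the interval excludes zero, the two funds' Sharpe Ratios differ significantly at level `alpha`, but only under the i.i.d. assumption stated above; serially dependent returns would call for a different resampling scheme.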
Sharpe Ratios:
• Opdyke, J.D. (2006), "Comparing Sharpe Ratios: So where are the p-values?" (preprint)
• SAS Program (email for one-time password)
• p-values from Sharpe Ratio comparisons and Mutual Fund Rankings (.pdf results)
• Excel Workbook (.xls, 1.4MB; email for one-time password): p-values from Sharpe Ratio comparisons and Mutual Fund Rankings
• JSM 2006 PowerPoint Presentation
Permutation Test Statistical Software (download .pdf summary)
Permutation tests are increasingly the statistical test of choice when using data to answer business and research questions across an extremely wide range of circumstances: literally wherever data samples are used to address hypotheses. This is because permutation tests require minimal assumptions about the data being examined, yet often have statistical power equal to, and sometimes even greater than, that of their parametric counterparts, which require stronger and sometimes untenable data assumptions. And unlike many parametric and other nonparametric tests, the results of permutation tests (the p-values) are unbiased. Several statistical software vendors offer products with permutation test capabilities, but these are limited: none can perform permutation tests within reasonable timeframes when samples are not very small and many tests are required. Under these conditions, these products have prohibitive runtimes (if they run at all), because the steps required to carry out a permutation test are computationally intensive.
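To make the computational burden concrete, the sketch below is a minimal Monte Carlo two-sample permutation test on the difference in means, written in Python with hypothetical names. Each of the `n_perm` random relabelings recomputes the statistic over all pooled observations, so the cost of a single test grows with n_perm × (n_x + n_y), and that cost is multiplied again by the number of tests being run.

```python
import random
import statistics
from typing import Sequence


def perm_test_mean_diff(x: Sequence[float], y: Sequence[float],
                        n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided Monte Carlo permutation p-value for mean(x) - mean(y)."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)
    observed = abs(statistics.fmean(x) - statistics.fmean(y))
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # one random relabeling of the pooled data
        diff = statistics.fmean(pooled[:n_x]) - statistics.fmean(pooled[n_x:])
        if abs(diff) >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    # The add-one correction keeps the Monte Carlo estimate a valid p-value.
    return (extreme + 1) / (n_perm + 1)
```

For two clearly separated samples such as `[10, 11, 12, 13, 14]` and `[1, 2, 3, 4, 5]`, the estimate lands near the exact value of 2/252, since only 2 of the 252 possible splits are as extreme as the observed one.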
DataMineIt's solution to the computational demands of permutation tests is PermuteIt™, statistical software that performs fast two-sample permutation tests when one sample is large or both are moderately sized and many permutation tests must be performed (e.g., in most multiple comparisons situations). PermuteIt™ has been benchmarked against the available commercial alternatives (see table below or .pdf), and under these conditions its runtimes often are more than an order of magnitude faster. When thousands of tests are performed, this can make the difference between meeting deadlines and missing them: with slower software, an hour's runtime easily can become ten, twenty, or thirty hours. This disparity becomes even more magnified when, as is the rule rather than the exception, analyses or reports need to be rerun due to the receipt of revised data, the reprocessing of the input datasets, or any of the countless issues that arise when working with large volumes of data.
But PermuteIt™ not only provides the speed that makes the appropriate application of permutation tests possible where other software fails; it also provides increased precision in the estimated p-values. PermuteIt™ uses a combination of algorithms that, wherever possible, provide exact p-values based on full enumeration. When exact inference is not possible, PermuteIt™ can, at the user's request, efficiently attain variance reduction by increasing the number of permutation samples drawn whenever the confidence interval around the estimated p-value contains the predetermined critical p-value of the test. This yields a larger number of unambiguous test results in less time by avoiding wasteful sampling. Some of the unique and powerful features of PermuteIt™ include:
• the availability to the user of a wide range of test statistics for performing permutation tests on continuous, count, and binary data, including: pooled-variance t-test; separate-variance Behrens-Fisher t-test and joint tests for scale and location coefficients using nonparametric combination methodology; permutation scale test; Brownie et al. "modified" t-test; skew-adjusted "modified" t-test exact inference; Cochran-Armitage test; exact inference; Poisson normal-approximate test; Fisher's exact test; Freeman-Tukey double arcsine test
• extremely fast exact inference (no confidence intervals, just exact p-values) for most count data and high-frequency continuous data, often several orders of magnitude faster than the most widely available commercial software (see table below or .pdf)
• the availability to the user of a wide range of multiple testing procedures, including: Bonferroni, Sidak, Step-down Bonferroni, Step-down Sidak, Step-down Bonferroni and Step-down Sidak for discrete distributions, Hochberg Step-up, FDR, Dunnett's one-step (for MCC under ANOVA assumptions), Single-step Permutation, Step-down Permutation, Single-step and Step-down Permutation for discrete distributions, and permutation-style adjustment of permutation p-values
• efficient variance reduction under conventional Monte Carlo via self-adjusting permutation sampling when confidence intervals contain the predetermined critical value of the test
• fast, efficient, and automatic generation of all pairwise comparisons
• shortest confidence intervals under conventional Monte Carlo via a new sampling optimization technique (see Opdyke, Journal of Modern Applied Statistical Methods, Vol. 2, No. 1, May 2003, and related conference presentations (.pps))
• fast permutation-style p-value adjustments for multiple comparisons (the code is designed to provide an additional speed premium for these resampling-based multiple comparisons adjustments; see table below or .pdf)
• simultaneous permutation testing and permutation-style p-value adjustment, although for relatively few tests at a time (this capability is not even provided as a pre-programmed option by any other software currently on the market)
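The combination of exact enumeration and self-adjusting Monte Carlo sampling described above can be sketched roughly as follows. This is a hypothetical Python illustration of the general idea, not PermuteIt™ code: the enumeration cutoff `max_exact`, the batch size, the normal-approximation 95% interval, and the stopping rule are all assumptions chosen for clarity.

```python
import math
import random
from itertools import combinations
from typing import Sequence, Tuple


def abs_mean_diff(total: float, n: int, sub_sum: float, k: int) -> float:
    """|mean of a size-k subset - mean of the other n-k values|, from sums."""
    return abs(sub_sum / k - (total - sub_sum) / (n - k))


def perm_pvalue(x: Sequence[float], y: Sequence[float], alpha: float = 0.05,
                max_exact: int = 50_000, batch: int = 1_000,
                max_perm: int = 100_000, seed: int = 0) -> Tuple[float, str]:
    """Two-sided permutation p-value for a difference in means (hypothetical).

    Uses full enumeration when the number of splits is small enough;
    otherwise draws Monte Carlo permutations in batches, stopping once an
    approximate 95% confidence interval for the p-value excludes alpha.
    """
    pooled = list(x) + list(y)
    n, k = len(pooled), len(x)
    total = sum(pooled)
    observed = abs_mean_diff(total, n, sum(x), k)
    tol = 1e-9 * max(1.0, observed)  # treat floating-point near-ties as ties

    n_splits = math.comb(n, k)
    if n_splits <= max_exact:
        # Exact inference: visit every way of assigning k pooled values to x.
        hits = sum(
            abs_mean_diff(total, n, sum(pooled[i] for i in idx), k) >= observed - tol
            for idx in combinations(range(n), k)
        )
        return hits / n_splits, "exact"

    rng = random.Random(seed)
    hits = drawn = 0
    p = 1.0
    while drawn < max_perm:
        for _ in range(batch):
            sub_sum = sum(rng.sample(pooled, k))  # random subset of positions
            hits += abs_mean_diff(total, n, sub_sum, k) >= observed - tol
        drawn += batch
        p = hits / drawn
        half_width = 1.96 * math.sqrt(p * (1.0 - p) / drawn)
        if not (p - half_width <= alpha <= p + half_width):
            break  # the decision at alpha is unambiguous; stop sampling
    return p, f"monte carlo ({drawn} permutations)"
```

Enumeration is exact but its cost grows with C(n, k), so it is practical only when the number of possible splits is manageable. The sampling branch keeps drawing batches only while the interval around the estimated p-value still straddles `alpha`, which is where additional permutations actually change the test decision. A Wald interval is degenerate when the estimate is 0 or 1, so a more careful version would likely use an exact binomial interval instead.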
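Several of the closed-form multiple testing procedures named in the feature list can be written compactly in terms of adjusted p-values. The Python sketch below covers Bonferroni, Sidak, step-down Bonferroni (Holm), step-down Sidak, Hochberg step-up, and FDR, which is assumed here to mean the Benjamini-Hochberg procedure. The discrete-distribution variants, Dunnett's test, and the permutation-style adjustments are not shown, and the function and method names are hypothetical.

```python
from typing import List, Sequence


def adjust_pvalues(p: Sequence[float], method: str) -> List[float]:
    """Return multiplicity-adjusted p-values in the original order."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])  # indices, smallest p first
    adj = [0.0] * m

    if method == "bonferroni":
        return [min(1.0, m * pi) for pi in p]
    if method == "sidak":
        return [1.0 - (1.0 - pi) ** m for pi in p]
    if method in ("holm", "stepdown_sidak"):
        # Step-down: walk from the smallest p-value up, enforcing a running max.
        running = 0.0
        for rank, i in enumerate(order):
            remaining = m - rank  # hypotheses from this rank upward
            if method == "holm":
                step = min(1.0, remaining * p[i])
            else:
                step = 1.0 - (1.0 - p[i]) ** remaining
            running = max(running, step)
            adj[i] = running
        return adj
    if method in ("hochberg", "bh"):
        # Step-up: walk from the largest p-value down, enforcing a running min.
        running = 1.0
        for rank in range(m - 1, -1, -1):
            i = order[rank]
            if method == "hochberg":
                step = (m - rank) * p[i]
            else:  # Benjamini-Hochberg false discovery rate
                step = m * p[i] / (rank + 1)
            running = min(running, step)
            adj[i] = running
        return adj
    raise ValueError(f"unknown method: {method}")
```

Each adjusted p-value can be compared directly with the family-wise level (or, for `"bh"`, the false discovery level); the running max or min enforces the monotonicity that step-down and step-up procedures require.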
DataMineIt has designed, benchmarked, and thoroughly tested the premier permutation test software on the statistical software market for moderate sample sizes and many tests. To learn more about how PermuteIt™ can be used in your enterprise, and to obtain a demo version, please contact its author, J.D. Opdyke, Senior Managing Director, DataMineit, LLC, at JDOpdyke@DataMineit.com. Please include your name, contact information (email address, phone number, etc.), and background (company, title, etc.).
