[
  {
    "path": "ChangeLog",
    "content": "Since April 2014, this changelog includes only those changes that break existing code or are especially notable. For a full list of changes, see the Git history at \nhttps://github.com/b-k/Apophenia/commits/master\n\n\n[Key, pre-March 2014: \n--Addition or improvement\n**Change that could require recoding existing code.\n!!Big.\n]\n\n    October 2014\n** apop_model_stack --> apop_model_cross\n\n    August 2014\n** default for apop_data_pack is .all_pages='y' (was 'n').\n** remove apop_plot_lattice, apop_plot_triangle, apop_plot_line_and_scatter, apop_plot_qq.\nFind them at https://github.com/b-k/apophenia/wiki/gnuplot_snippets .\n\n\tMarch 2014\n--Command-line tools print help should a user add a --help option.\n\n\tFebruary 2014\n!!OpenMP for threading. All calls to apop_map and friends, apop_model_draws, and others auto-thread.\n!!apop_rng_get_thread to get a thread-specific RNG, so you can thread random processes.\n\n\tJanuary 2014\n**View macro reform:\n\tApop_cols       Apop_rows       A contiguous set of columns or rows as an apop_data set (with names)\n\tApop_col\t    Apop_row        One column or row as an apop_data set\n\tApop_col_t      Apop_row_t      One column or row as an apop_data set, retrieved by row/col name\n\tApop_col_v      Apop_row_v      One column or row as a gsl_vector\n\tApop_col_tv     Apop_row_tv     One column or row as a gsl_vector, retrieved by row/col name\n\tApop_matrix_col Apop_matrix_row One column or row as a gsl_matrix\n**MLE methods are now strings instead of all-caps enums.\n**All blank elements of a data->text grid point to the same NUL string.\n--Add apop_model_metropolis; revise apop_update accordingly.\n--apop_draw uses metropolis to draw from any model with a log likelihood/p and where data size>1.\n**Replace all instances of output_file with output_name (GNU sed -i 's/output_file/output_name/g' *c)\n--Consolidated headers\n**apop.h no longer #includes time.h or unistd.h\n**apop_draw returns zero on 
success; nonzero on failure.\n**Removed the BIC-by-cells from estimation output. Added AIC_c. OLS now reports the ICs along with R^2.\n\n\tDecember 2013\n--append and replace options for apop_text_to_db\n--apop_probit bug fix\n**apop_plot_histograms now uses gnuplot's impulses, not boxes by default---handles missing zero bins better.\n**MLE path trace lists the probs/loglikelihoods in the vector of the apop_data set it produces. path is apop_data**, from apop_data*.\n**apop_data_transpose has an .inplace option, which is 'y' by default. Add .inplace='n' to existing uses.\n--siman checks constraints for the starting point.\n--Mixture models overhauled.\n--cleaned up the command-line utilities\n**removed apop_lookup.\n\n\tNovember 2013\n--rewrite apop_data_sort to allow sorting by multiple columns or text or names\n--apop_pmf now has a CDF method.\n--fixed up K-S tests.\n--removed the Swig Python wrapper from this package.\n**replaced char apop_opts.db_nan[101] with char *apop_opts.nan_string. More descriptive, easier to use.\n**apop_name_find does plain case-insensitive search; no regexes.\n\n\tOctober 2013\n**all built-in models (apop_ols, apop_dirichlet, ...) are now apop_model* (ptr-to-struct), from apop_model (plain struct).\n**apop_estimate and apop_copy take in an apop_model* instead of plain apop_model.\n**printing no longer part of the apop_model struct; uses a vtable.\n\n\tSeptember 2013\n**change vbase, m1base, m2base ==> vsize, msize1, msize2\n**Estimate returns void (was apop_model*)\n--vtable mechanism improvements\n**Remove score, predict, and parameter_model from the apop_model object; use the vtable mechanism.\n**Upgrade model p, ll, cdf, constraint to return long double (was double)\n**consolidate vector_var and vector_weighted_var. same with cov, mean, weighted_skew, and weighted_pop. 
Users just have to replace apop_vector_weighted_var with apop_vector_var.\n**removed deprecated.h entirely.\n--apop_data_add_named_elmts puts new data in the vector, not the matrix, because it is intended for a list of scalars (==a vector). If you use apop_data_get(infodata, .rowname=\"statistic name\") then you'll be able to retrieve the element either way.\n**removed apop_line_to_data and apop_line_to_matrix. Use apop_data_fill and apop_data_falloc.\n\n\tAugust 2013\n--apop_map(_sum) properly threads data-row mappings. .inplace='v' to return NULL.\n**Remove apop_settings_alloc, apop_settings_group_alloc\n**Change Apop_row to return an apop_data set, not a vector (for which use Apop_matrix_row).\n**Apop_settings_set sets model->error='s' on error, instead of returning.\n**Add a .want_path='y' setting to your apop_mle_settings group, and I'll put a list of the points tried by the optimizer in an apop_data set named path (in the settings group); see documentation for details. Remove the former trace_path mechanism.\n**removed apop_(vector|matrix)_increment. 
Use, e.g., *gsl_vector_ptr(v, 7) += 3; or (*gsl_vector_ptr(v, 7))++.\n**Some mu and sigmas => μ and σ\n\n\tJune 2013\n--replaced makefile in base directory with ./configure.\n--version number now equals build date.\n**name->title is a ptr; name->column => name->col\n**removed apop_strip_dots; it's up to the user to give reasonable names for the db.\n\n\tMay 2013\n--jacobian transformations\n--Apop_model_copy_set to copy a model and add a settings group at once\n--mixture models\n--data-data composition\n--added apop_model_draws\n!!vtables, allowing for more functions with special cases for certain model(s) outside of the model object itself.\n\n\tApril 2013\n--plugged some memory leaks\n--default tolerance for MLE is much finer (1e-5).\n--Added apop_text_fill\n**Finally removed support for gsl_histograms, including the apop_histogram model. This cut has been promised for about four years now. Use the apop_pmf instead.\n**apop_data_to_bins no longer modifies the input data in place. It now makes a copy and modifies the copy.\n**Removed apop_crosstab_to_pmf. There's a version at https://github.com/b-k/Apophenia/wiki/Crosstab-to-PMF for matrices.\n**Removed apop_vector_to_array. If your array has stride 1, use your_array->data; else write a for loop to copy out the data.\n**Removed apop_array_to_matrix.\n\n\tMarch 2013\n**apop_text_paste now prints the pasted string at verbosity level 3 (formerly 2).\n\n\tFebruary 2013\n--Starting_point in Bayesian updating no longer does anything, but it was never significant to begin with. Added some verbosity options to apop_update.\n\n\tJanuary 2013\n--Logit regression much smarter about picking a starting point.\n--Defaults for simulated annealing try 1600x fewer points. 
Prior settings were overkill.\n--configure.ac checks for native asprintf and uses it if it is present\n**Removed apop_db_merge and apop_db_merge_table. Get them from http://modelingwithdata.org/arch/00000141.html .\n**Removed apop_matrix_fill; use apop_data_fill\n**Removed apop_array_to_data.\n**Removed apop_matrix_correlation; use apop_data_correlation\n**Deprecated apop_rank_compress; use apop_data_to_dummies(..., .keep_first='y').\n--apop_model_stack\n\n\tDecember 2012\n--Added an apop_pmf_settings group, eliminating a couple of hacks (e.g., see October 2012).\n--faster read-in of text files\n--Exponential model uses data in both the vector and matrix parts of the apop_data input\n--transformation to generate mixtures (i.e., linear combinations) of models.\n**apop_data_transpose now transposes the text element as well as the matrix (by default).\n\tUse apop_data_transpose(your_matrix,  .transpose_text='n') to replicate the previous behavior.\n--removed Autoconf pkg-config macros, because Autoconf no longer needs the help.\n--writing apop_data to DB uses prepared queries where possible => much faster.\n**It is now up to you to put apop_query(\"begin\"); / apop_query(\"commit\"); wrappers around\n\twriting of tables to the database.\n\n\tNovember 2012\n--apop_vector_unique_elements, apop_data_to_dummies, and apop_data_to_factors handle NaNs\n\tbetter; put them at the end of the sort order.\n!!Finally added an .error element to apop_data and apop_model structs, thus simplifying\n\terror-checking.\n--Apop_stopif macro, rendering the Apop_assert family largely obsolete (so if you're using\n\tthem in your own work, consider them deprecated...).\n--Where the assert macros used to abort() on errors, they now send signal(SIGTRAP), making\n\tdebugging a little easier. Most host systems force an abort on SIGTRAP anyway.\n\n\tOctober 2012\n**Removed apop_strcmp. 
If you still need it, this macro is basically equivalent:\n    #define apop_strcmp(a, b) (((a)&&(b) && !strcmp((a), (b))) || (!(a) && !(b)))\n--clean up of parameter-fixing model transformation.\n--split off multiple imputation variance code.\n**Removed apop_vector_grid_distance. Use apop_vector_distance(v1, v2, .metric='M');\n--If apop_pmf.dsize==0, apop_pmf.draw returns a row number, not the data in that row. This will change shortly.\n--Logit draw function, akin to apop_ols.draw. Both will change shortly.\n--apop_data_to_factors now auto-allocates a matrix if need be (because it always auto-allocated a vector). \n\n\tSeptember 2012\n--\\0 in text files counts as white space\n--fixed counting bug for text files with ,<end-of-line> sequences.\n\n\tJuly 2012\n--some MLE cleanup\n--fixes to apop_rake\n--apop_logit.score fixed\n\n\tJune 2012\n--The sample kurtosis calculation is still more precise.\n\n\tMay 2012\n--Autotools improvements. Use the standard 'make check' instead of the ad hoc 'make test'.\n!!Set apop_opts.stop_on_warning='n' to never abort() on any type of error. E.g., GUIs that\n\tshould never halt will use this. Default is still to halt on errors, because that's\n\tmost useful for interactively developing numeric analyses.\n--Use apop_opts.log_file=fopen(\"yourlog\", \"w\") to divert the warnings/errors from stderr into yourlog.\n--Some formerly void functions now return an int, to return an error code. E.g.,\n\tapop_opts.stop_on_warning='n';\n\tif (apop_data_set(data, row, col)) printf(\"Error! 
Nothing was set.\\n\");\n--Fixed a memory leak in simulated annealing.\n--Apop_data_row and apop_data_set_row handle row names\n\n\tMarch 2012\n--bug fix in apop_text_to_data when input file has no names.\n--probit dlog likelihood isn't implemented for N>2; now acknowledging this.\n--added apop_data_get_factor_names\n--apop_(vector|matrix)_(map|apply) now accepts NULL input.\n\n\tJanuary 2012\n**Removed apop_matrix_var_m, which nobody was using.\n--bug fix in apop_vector_distance for Ln norms where n is odd.\n--reading data from text files rewritten; much more robust. \n   **A space is no longer a default delimiter.  Use apop_opts.input_delimiters=\"| ,\\t\" to restore old default.\n   **apop_opts.db_nan is no longer a regex; I just do a case-insensitive comparison.\n\n\tDecember 2011\n--apop_model.textsize is now a size_t instead of an int.\n--apop_update accepts likelihoods with no pre-set parameters \n\n\tNovember 2011\n--moved to GitHub; some changes to structure and documentation to accommodate. \n**apop_ols.predict didn't do the OLS shuffle if the input has no vector; this was anomalous.\n\n\tOctober 2011\n--apop_text_paste added.\n**apop_multinomial and apop_binomial overhauled. No longer accepting Bernoulli draws as input.\n--standardization: make docs => make doc\n\n\tSeptember 2011\n--`query turned up a blank table' warning turns up when apop_opts.verbose >=2. 
(used to be >=1)\n--apop_t_test and apop_paired_t_test are quieter---no intermediate results until apop_opts.verbose >=2.\n**apop_opts.db_name_column now has a blank default (instead of the SQL-specific and potentially surprise-inducing \"rowname\").\n\n\tAugust 2011\n--Bootstrap/Jackknife are better with text\n**apop_data_memcpy used to reallocate memory for the text and names elements; use apop_data_copy if you want allocation done for you.\n\n\tJuly 2011\n--Apop_data_rm_rows now accepts a test function as well as a fixed list of rows to drop.\n\n\tJune 2011\n--apop_crosstab_to_db handles missing labels and NaNs better.\n**apop_matrix_to_db removed (as promised a few years ago). Use apop_matrix_print(yourdata, \"tabname\", .output_type='d').\n**F-test defaults now match ANOVA tradition.\n--documentation script doesn't use GNU extensions to awk; should now be POSIX-standard.\n\n\tMay 2011\n**apop_map returns a data set with an allocated/filled vector when not called with .inplace='y'.  Before, it had been making a full copy, which is idiosyncratic.\n--apop_rake accepts a weights column.\n--apop_anova uses variadic arguments for a marginally nicer interface (and better argument checking).\n**apop_data_to_dummies tries to give nicer labels. You may have to recode things if you relied on the old labels.\n\n\tMarch 2011\n--Header files have been merged. A few long files is as easy to grep as a multitude of\n\tnearly one-line files. If you #include <apop.h> instead of the individual headers,\n\tthen this shouldn't affect you. Due to redundancy, compilation with gcc takes 3% longer.\n\n\t0.99 February 2011\n!!The apop_PMF model has more support:\n\t--New supporting functions: apop_data_pmf_compress, apop_model_to_pmf\n\t--functions that took apop_histogram models now take apop_pmfs as well:\n\t\tapop_test_chi_squared_goodness_of_fit, apop_test_kolmogorov\n\t--Consider the apop_histogram to be deprecated. 
Only two associated functions were removed; see below.\n\t**apop_histogram_plot is removed. Replace with:\n\t    fprintf(apop_opts.output_pipe, \"plot '-' using 1:3 with boxes\\n\");\n\t\tapop_model_print(hist);\n\t\tfprintf(apop_opts.output_pipe, \"e\\n\");\n\t**apop_histogram_print was a bad idea to begin with, because it basically replicates\n\t    gsl_histogram_fprintf. Use apop_model_print(your_histogram), which calls gsl_histogram_fprintf, \n\t\tor call that function directly. The only difference: the GSL function prints \n\t\t[start of bin] [end of bin] [value]\n\t\tand apop_histogram print showed\n\t\t[start of bin] [value]\n\n\tDecember 2010\n**apop_maximum_likelihood no longer calls apop_prep. If you want that, use apop_estimate.\n--apop_text_to_db lets users specify types and keys.\n--deleted some obsolete/deprecated items: apop_error, apop_multinomial_settings\n--apop_data_split retains text when splitting by rows; still ignores it when splitting by columns.\n\n\tNovember 2010\n--apop_listwise_delete uses apop_opts.db_nan to check for missing data in the text part of the input data.\n--apop_data_split handles names\n\n    September 2010\n**Multinomial distribution sets N to be the length of the row (a single observation)\n    rather than the size of the full data set. Added apop_multinomial.parameter_model method\n    for testing purposes.\n\n    August 2010\n** What was apop_assert => apop_assert_c; what was apop_assert_s => apop_assert. Their\n    arguments are slightly different, and the thing that was asserted no longer prints along\n    with the message you chose.\n--verbosity defaults to 1. Queries print at verbosity >=2.\n--apop_data_to_db writes the weights.\n--Iterative proportional fitting, aka raking. \n\n**apop_text_add now frees the contents of cell in the text grid that you are about to\n    overwrite, thus preventing memory leaks without effort from the user. 
If your existing\n    code has other pointers to the string in that text cell, you'll have to replace the now\n    string-freeing apop_text_add with asprintf(&(your_dataset->text[row][col]), \"your string\").\n\n    July 2010\n** apop_regex now gets all matches when you pull substrings. Each row of the text grid\n    is a match, and if you have multiple substrings, each match's substrings will be\n    along the columns.  May require recoding because the substrings used to be along\n    the rows; just switch the indices.\n\n    June 2010\n** Removed the apop_rank settings group, and thus all the code related to it. It was just\n    the wrong place to do this. Added apop_data_rank_expand to convert rank data to\n    what the various models typically expect. This is another step for some users and\n    could be a problem if the counts get into the billions, but it still makes more sense\n    than rewriting every model twice.\n\n    May 2010\n**apop_data_prune_columns_base now takes in a list of strings terminated by a NULL, not by\na zero-length string.\n--apop_data_get_row lets you pull a view from a data set. [this was briefly the apop_data_row]\n    ==>apop_data_set_rows, apop_data_rm_rows\n        ==> apop_data_listwise_delete is fifty lines shorter.\n--apop_parts_wanted_settings: fixes some infinite loops (est needs parameter models ->\n    p.m. bootstraps for variances -> bootstrap runs estimate -> repeat) and allows\n    just-the-parameters estimation when you want it.\n--cleaned up build system, including an added RPM spec file\n\n    April 2010\n--apop_t_distribution now has three parameters: mean, std dev, df. 
That is, it is based on un-normalized data.\n**apop_random_int and apop_random_double removed for not being particularly useful.\n\n    March 2010\n**The apop_predict special case for when all data is non-missing was a bit too special,\n    and has been eliminated---you now have to specify the first column as NaN yourself.\n    E.g., Apop_col(data, -1, to_nan); gsl_vector_set_all(to_nan, GSL_NAN);\n    This will make things more predictable, and save you if(!has_nans)... else... kind of statements.\n**Removed the prepared element of the apop_model.\n**apop_model_prep ==> apop_prep for consistency with other apop_model dispatch functions. apop_model_prep left for now as an alias in deprecated.h\n!!apop_parameter_model: a method for getting the distribution of a parameter.\n    **Moved OLS-family test stats (pval, qval, whatever) to a page of your_estimate->info. It won't be there for long either.\n--settings macros let you use lowercase, thus entirely ignoring that they're macros.\n    **apop_settings_rm_group function, which you were probably not using, changed to apop_settings_remove_group; apop_settings_get_group function => apop_settings_get_grp. Having a macro and a function that differed by a question of case was a bad idea to begin with.\n\n    February 2010\n!!Overhauling the output from estimations; pardon our dust. \n--Added CDF method to the apop_model, including apop_cdf dispatch method and default via random draws.\n**Definitively removed the residual, covariance, and llikelihood elements from the\n    apop_model struct. The first two will be pages appended to the data and parameters,\n    respectively, and the last will be in the Info page appended to the parameters.\n**Renamed apop_ls_settings (least squares) to apop_lm_settings (linear model); \"s/apop_ls/apop_lm/g\" should work.\n**Sundry lists of scalars, like the R^2 table and the estimation routine's info table put the data in column zero, not column -1. 
In the next bullet point you'll see how this simplifies retrieval.\n**Added an info element to the apop_model--> more shuffling of auxiliary info. \n    --Find results like the log likelihood or AIC via, e.g., apop_data_get(your_model->info, .rowname=\"log likelihood\");\n  **Find the predicted/residuals via apop_data_get_page(your_model->info, \"Predicted\");\n        This means that the input data set is read-only again. \n    --Find the parameter covariances via apop_data_get_page(your_model->parameters, \"Covariance\");\n    **apop_estimate_coefficient_of_determination takes in the model again. Just replace est->parameters in your argument with est. apop_ols calls this fn automatically now [apop_data_get(your_model->info, .rowname=\"R sq\")], so you probably aren't even calling it anymore.\n**apop_data_add_named_elmt now writes to the zeroth element of the matrix, not the vector.\nSo instead of apop_data_get(data, .rowname=\"R squared\", -1), just go with apop_data_get(data, .rowname=\"R squared\"). This affects many of the elements of the info-type matrices.\n--apop_data_pack, apop_data_unpack, apop_ml_impute, apop_map offer an option to use all pages. \n\n    0.23 January 2010\n**expected_value element of the model renamed predict; made coherent across models.\n!!apop_data set now has a ->more pointer to an additional apop_data set, e.g., for data + covariances or predictions + confidence intervals.\n--apop_ml_imputation renamed to apop_ml_impute. \"#define apop_ml_imputation apop_ml_impute\" retains noun-form name, but consider it deprecated.\n!!apop_estimate now copies, preps, then estimates. Estimate method of apop_model struct can thus assume the copy/prep step has been done;\nprobably should not do these itself. 
As a side-effect, apop_maximum_likelihood's second argument is now an apop_model* (used to be apop_model).\n--apop_regex and apop_strcmp, for easier searching through your info pages.\n--minor rewording of COPYING2.\n**Because the Predicted table is now part of the parameter set, not the model,\n    apop_estimate_coefficient_of_determination now takes in the parameter set, not the model. Just replace est in your argument with est->parameters.\n**apop_multinomial_probit folded into apop_probit, where it should've been all along.\n    Regex for the fix: \"s/apop_multinomial_probit/apop_probit/g\"\n\n    December 2009\n--apop_strcmp\n--apop_loess model: 3,500 lines of code from the netlib archive, lovingly restored.\n\n    November 2009\n--apop_rm_columns bug fixed by Birger Baksaas.\n--apop_text_to_db attaches numeric affinity to sqlite3 columns, making numeric comparisons easier.\n--apop_histogram_model_reset's first argument is now \"base\" instead of \"template\", as a concession to C++ users.\n\n\tSeptember 2009\n--Many minor changes, mostly regarding adding optional arguments.\n--Dirichlet model\n--Output functions now take a consistent set of specs regarding where they will write. You no longer have to use the global apop_opts settings if you don't want to.\n\n\tAugust 2009\n--apop_map and apop_map_sum. 
Reworks the apop_(map|apply) system to be more flexible but a little more complex.\n--apop_(data|matrix|vector)_fill is now more robust---no more int vs float issues.\n**Removed apop_count_cols\n--Default univariate RNG, if you don't have one: Adaptive rejection Markov chain sampling\n**.use_covar and other such settings now take 'y' or 'n', not 0 or 1.\n--new macro Apop_settings_set = Apop_settings_add, but makes more human sense\n--numeric covariance, formerly maligned, now works.\n\n\tJuly 2009\n--multivariate gamma, log-gamma.\n--t, F, chi^2, Wishart distributions, for description [and Bayesian\nupdating]\n--apop_matrix_to_positive_semidefinite and apop_matrix_is_positive_semidefinite \n--bug fixes\n--Apop_model_add_group replaces Apop_settings_add_group, and is much easier to work with.\n\n\tJune 2009\n--More variadicized functions\n\t--notably, apop_estimate is much more useful\n--apop_opts.version.\n--apop_(vector|matrix|data)_stack have an inplace option, making stacking inside a for loop easier.\n--apop_test convenience function\n--more autoconf macros ==> some compilation hacks now done right\n\n\tMay 2009\n--mysql functions slightly cleaned up\n--apop_opts.db_user and apop_opts.db_pass for mysql.\n!!Functions that take lots of basically optional inputs, like apop_text_to_db, now use some designated initializer magic to let the user rearrange or omit inputs.\n**apop_dot also now allows designated initializers, which breaks\n(only) calls of the form apop_dot(a_vector, a_matrix, 't'). Replace with\napop_dot(a_vector, a_matrix, 0, 't') or apop_dot(a_vector, a_matrix,\n.form2='t')\n--With optional inputs, some functions now handle RNGs for the lazy user ==>added apop_opts.rng_seed\n--apop_vector_distance is much more versatile\n**Removed apop_matrix_summarize. Too much like apop_data_summarize. 
Just\nreplace every instance of apop_matrix_summarize(m) with apop_data_summarize(apop_matrix_to_data(m)).\n\n\n    April 2009\n--sample moments are now mega-accurate---possibly the most unbiased estimators in code today.\n!!Python interface via swig\n\n\tMarch 2009\n--apop_matrix_realloc, apop_vector_realloc\n--sqlite queries no longer rely on a temp table ==> faster\n--fixed bugs in apop_table_exists making queries fail in Cygwin.\n\n\tJan/Feb 2009\n--Added more tests; some cleanup in test.c\n--Binomial distribution looks in both the data set's vector and matrix\n\n\tDecember 2008\n--When writing x=infinity to a db table, I now write 'inf' to the db, instead of breaking.  SQLite has no standard here.\n\n\tOctober 2008\n--bug fixes to new apop_data_show\n--bug fixes to apop_bernoulli.p\n--apop_update tweaks\n\n\tSeptember 2008\n--Documentation overhaul\n--apop_data_show is much more screen-friendly. Keep using\napop_data_print for more machine-readable and less fixed-width output.\n**apop_plot_histogram now takes in a vector, bin count, and name of output. 
This is what it did in the first half of the year.\n  The current version of apop_plot_histogram, which acts on a histogram model, is renamed to apop_histogram_plot.\n\n\tAugust 2008\n--Constraints in ML work better.\n**Overhaul of some discrete choice models\n\t--Added tests for the probit and logit.\n\t--fixed a bug revealed by the tests\n\t**the first choice has a fixed value of zero.\n\t**You'll probably need to call Apop_category_settings_add before estimating \n\t\tyour model, unless the outcome choice variable is the 0th column of the matrix.\n\t--more to come, e.g., multinomial probit will be merged with ordered probit.\n--Adding a settings group of a given type when that group already exists used to induce an error; now the old type is replaced with a clean default.\n--bug fix for apop_test_fisher_exact on non-square matrices\n--apop_settings_add and company do more work in functions and less in macros.\n--removed the settings_ct element of the apop_model; using a sentinel at the end of the array instead.\n--Slightly improved reading of text files.\n--Bootstrap/jackknife act on models with parameters in both matrix and vector form.\n\n\n    July 2008\n--Guts of apop_plot_histogram now use the apop_histogram model. \n**  Also, it no longer normalizes the histogram to integrate to one by\n  default. 
You need to explicitly request this via\n  apop_histogram_normalize\n--apop_plot finally deleted.\n--apop_histogram_plot deleted; use apop_plot_histogram.\n--Added apop_vector_skew_sample, apop_vector_kurtosis_sample.\n\n\n\tMay 2008\n--apop_settings_rm_group added.\n--mysql interface has the beginnings of support for multiple\n\tsemicolon-separated queries in one call.\n--apop_histogram_refill_with_model ==> apop_histogram_model_reset;\n\tapop_histogram_refill_with_vector ==> apop_histogram_vector_reset.\n\n\tApril 2008\n--apop_dot handles names.\n--apop_t_test now behaves correctly when one vector is of length 1.\n--some improvements/fixes when dealing with mySQL.\n--apop_sv_decomp renamed to apop_matrix_pca. Minor changes so that it correctly works as such.\n--apop_text_to_(db|data) handles column names like it used to, which\n   works better. Also a few other fixes for odd situations.\n\n\tMarch 2008\n--Various improvements in reading in from text.\n\t-- a, \"b, c\", d will now correctly read in as three elements: a; then \"b, c\"; then d\n\t-- a,,b,c reads as a, NAN, b, c.\n\n\tFebruary 2008\n--Some of the header references didn't work for a fresh install.\n--bug fixes, esp. with apop_test_kolmogorov\n--added convenience fn apop_data_transpose\n\n\tJanuary 2008\n--Apop_assert, which streamlines the use of apop_error (thus shrinking the code base by 2%).\n--apop_OLS now has a log likelihood (also shuffled some of the code around)\n--bug fix in apop_binomial.p.\n**More name reform: apop_correlation_matrix --> apop_matrix_correlation; apop_data_correlation_matrix --> apop_data_correlation; apop_covariance_matrix --> apop_matrix_covariance; apop_data_covariance_matrix --> apop_data_covariance.\n--apop_count_(rows|cols)_in_text are now static functions and removed from the documentation.\n--Removed apop_random_beta, which had been set as deprecated earlier.  
Use apop_beta_from_mean_var.\n**Removed apop_vector_isnan---just use apop_vector_map_sum(your_vector, isnan)\n**Removed apop_vector_finite---just use apop_vector_bounded(your_vector, INFINITY)\n--For your convenience, added Apop_settings_alloc() macro.\n\t**apop_histogram_params ==> apop_histogram_settings\n\t**apop_kernel_density_params ==> apop_kernel_density_settings\n\t[The settings/params distinction is in some ways arbitrary anyway.]\n--bug fix in apop_mle.c: wasn't copying output parameters to the estimated model in some cases.\n--As part of name reform, all function names are being switched to\nlower-case throughout, so apop_ANOVA ==> apop_anova. Am keeping the old\nforms via macros. Notice also the non-yelling macro capitalizations above, such as Apop_assert.\n!!**Revised the settings for the apop_model. model_settings and\nmethod_settings are out, replaced by a much more organized single list\nof settings.\n--added the apop_lookup command-line program.\n**Renamed the apop_multinomial_logit model the apop_logit, because the\nbinary logit is a special case that requires no special handling.\n**Reversed the signs on the probit coefficients, to better conform to\nthe norm.\n\n\n\tDecember 2007\n--added apop_vector_moving_average\n--apop_model now has prep and print methods.\n**apop_p, apop_log_likelihood, apop_score now take a pointer to an apop_model, not the model itself.\n--apop_multinomial_probit model\n--When a data set has matrix and vector, apop_dot accepts a 'v' to use the vector.\n--apop_plot_lattice produces a more attractive (and standard-form) plot.\n--apop_data_print works much better now.\n**apop_model no longer requires your data input to be const. 
It probably\nwill be const, but it's not the interface's place to dictate that.\n**apop_data_unpack no longer allocates a new data set, but writes to an input data set assumed to be of the right size.\n--added apop_ANOVA to produce one- or two-way ANOVA tables from the database.\n**apop_test_ANOVA renamed to apop_test_ANOVA_independence to create a little more cognitive distance.\n--apop_data_text_to_factors\n--APOP_COL_T and APOP_ROW_T macros, to pull a column or row by name\n--apop_beta_from_mean_var produces a beta-distributed model with the right (alpha, beta) parameters. \n**Thus, apop_random_beta is now marked as deprecated.\n**apop_x_prime_sigma_x removed, on grounds of being silly. If you want it back, see model/apop_multivariate_normal.c, where it is now a static function.\n**apop_qq_plot --> apop_plot_qq\n--Multinomial logit model (and the probit now has names).\n**Revised the bin-syncing methods. apop_vectors_to_histogram and apop_model_to_histogram are now out; apop_histogram_refill_with_vector and apop_histogram_refill_with_model are in. \n**Also removed apop_model_test_goodness_of_fit as redundant. Just produce a histogram, use the above refill functions, and send your two histograms to apop_histograms_test_goodness_of_fit. If you do this often, you can write a convenience function to do that as quickly as I could.\n**apop_vector_replace and apop_matrix_replace are redundant---just use apop_(vector|matrix)_apply.\n--The covariance matrix is now produced via the derivative of the score function at the parameter. I follow Efron and Hinkley in using the estimated information matrix---the value of the information matrix at the estimated value of the score---not the expected information matrix that is the integral over all possible data.\n\n\tNovember 2007\n--added apop_model_copy_set_string to get a copy of a model whose\nmodel_settings is just a string.\n**Thanks to this, folded all of the _rank versions of models into their\nbase models. 
Set model_settings to \"r\" to use the rank version.\n--The default MLE method is now the Nelder-Mead Simplex algorithm,\ninstead of the Fletcher-Reeves conjugate gradient. This is more\nconservative.\n--apop_(vector|matrix|matrix_all)_map_sum to get the sum of a function\napplied to a vector. E.g., find count of NaNs with apop_vector_map_sum(v, isnan);\n--apop_logit bug fix.\n\n\tOctober 2007\n\n--apop_estimate now defaults to using MLEs, meaning you don't have to explicitly specify an estimate method for MLE models. \n--apop_crosstab_to_db reads both the matrix and text elements of the input apop_data set. \n--apop_system convenience function, to make C feel more like a scripting language. \n--Added some SQLite functions for mySQL compatibility: var_samp, var_pop, stddev_samp, stddev_pop, std. \n--Probit patched to not NaN for very unlikely parameter/data combinations. \n**apop_estimate_restart takes two models, rather than one model and some haphazard settings. \n--apop_plot_query no longer forces you to use the -d and -q switches to specify the database and query. \n**The two places that use regular expressions: apop_opts.db_nan and the search for a name via apop_data_get/set... use case-insensitive EREs. Before I'd been using BREs, which nobody likes. \n--<ctrl-c> stops the MLE searches, prints output, and continues the program. Especially useful for simulated annealing. [GDB tip: use the command: signal SIGINT ] \n--apop_mle_fix_params debugging: gradients work now. \n--apop_test_ANOVA added, to test the null hypothesis that all cells of a crosstab are equally likely. \n**apop_multivariate_normal_prob removed. Use the apop_multivariate_normal model and its .log_likelihood, .p, .draw, et cetera. 
\n**sed -i -e \"s/apop_OLS_params/apop_ls_settings/g\" -e \"s/apop_mle_params/apop_mle_settings/g\" *.c *.h\n\n\tSeptember 2007\n--The optimization methods now have an enumerated type.\n**apop_opts.mle_trace_path is now the trace_path element of the apop_mle_params struct. Also, it works much better.\n--apop_histogram_normalize function \n--improvements to apop_kernel_density and apop_histogram_print\n\n\tAugust 2007\n--removed apop_model_template. Just copy one of the existing models. \n--apop_data_ptr_ti, apop_data_ptr_ii, apop_data_ptr_it \n--And you can never have too many bug fixes\n\n\tJuly 2007\n--apop_binomial model takes two types of input now: a two-column form with hit count and miss count, and a list of binary hits or misses. \n--apop_lognormal model \n--bug fixes on Information matrix calculation.\n\n\tJune 2007\n[subversion ate this part; sorry.]\n\n\tMay 2007\n--apop_query_to_mixed_data\n**apop_produce_dummies now makes dummy variables from both data and\ntext. This means that there's another parameter you need to set to 'd'\nor 't' to indicate what you want dummified.\n!!**Merged the apop_params and apop_model structures, leaving everything\nin just one struct. That's about all the merging left.\n--apop_text_alloc and apop_text_add to make text manipulation a little\neasier.\n--apop_matrix_apply_all and apop_matrix_map_all operate on all items in\na matrix.\n**small tweak to apop_vector_normalize interface.\n--apop_matrix_inverse and apop_matrix_determinant, because the\napop_det_and_inv interface is sort of ugly.\n--APOP_SUBMATRIX macro\n--MLEs put the expected score in the ->more element of the returned\napop_model. If people find this useful, we can maybe put a\nproper expected_score element in the model.\n--More consts in function headers. You can decide whether this\nis actually useful.\n\n\t0.19 April 2007\n!!**Eliminated the apop_estimate and apop_ep structures, replacing them\nwith the apop_params structure. 
The apop_params + apop_model pair form a\nclosure representing a parametrized model. Expect the uses element to go\naway soon; after that, things should be stable. Parameters for individual\nmethods now have their own space; try apop_ml_params_alloc and\napop_OLS_params_alloc, for example.\nIf you are just doing things like\napop_estimate_show(apop_OLS.estimate(data,NULL));\nthen don't worry, but if you are doing a lot with the input parameters,\nthen have a look at Chapter 6 of _Modeling with Data_.\n--apop_histogram model\n**Gradually rewriting the histogram functions from before to make use of\nthat model. E.g., eliminated apop_vector_to_cmf.\n**apop_line_to_data fixed to use both vector and matrix terms. Now\nrequires arguments: (indata, vsize, m1size, m2size).\n--MLE now approximates the Information matrix using data gathered during\nthe MLE search. This is wrong but cheap; right but expensive procedures\nforthcoming. [Hint: Simulated annealing gathers more info.]\n--apop_(data|matrix|vector)_fill functions, which are a touch fragile,\nbut very useful when used with care.\n**apop_data_(get|set)_(tn|nt) changed to (ti|it), because n could stand\nfor name or number, while i stands for index, and is often used for integers.\n--apop_names now have a title element, so you can give your data\nstructures a title.\n**apop_params_alloc takes an apop_model, not an &apop_model. It's more\nnatural that way.\n**Finally eradicated every last vestige of inventories: apop_params no\nlonger has a .uses element. Instead, apop_specific_model_params may have\na want_cov, want_expected_value, want_whatever element if the element is\noptional. And really, the parameters themselves should never be\noptional. 
What was I thinking.\n**apop_model_fix_params now sets up and returns an apop_mle_params object,\nthus resolving the problem that the MLE params needs a model input,\nand the model_fix_params model needed an MLE params input.\n\n\t0.18 March 2007\n**apop_text_to_db now assumes column names unless you specify -nc.\n--If you set parameter_ct==0 in your model definition, the MLEs will\nassign parameter_ct == the number of columns in your data set.\n--Missing data functions: apop_listwise_delete and apop_ml_imputation\n**The constraint element of the apop_model now takes a void* parameter,\nlike it should have all this time.\n**apop_jackknife (1) renamed to apop_jackknife_cov and (2) now actually\nworks.\n**Entirely eliminated the apop_inventory structure. Its sole utility is\ninside the apop_ep struct.\n**Changed the RNG interface for the sake of allowing multidimensional\ndraws. [Not that I have any functions that do that right now.]\n--Bayes-oriented MCMC algorithm: apop_mcmc_update\n!!Bayesian model generator: apop_update\n**apop_model parameters is now an apop_model. See the documentation of\nthe model for all the changes.\n**apop_data_alloc now takes three arguments: vsize, msize1, msize2. To\nupdate, just put a 0, at the head of the arg list.\n!!An absolutely fabulous apop_linear_constraint function.\n--Produce a model with some parameters fixed via apop_model_fix_params.\n--apop_beta model\n\n\tFebruary 2007\n\n**Apop_sv_decomposition has a slightly nicer interface.\n**data->categories was too much to type, and too specific. The apop_data\nstruct now has a data->text element, and textsize[0] and [1]. 
The\ncategories element is linked to this, but is now deprecated.\n\n\tJanuary 2007\n**Root finding hooked into the max likelihood fns.\n\n0.17\tDecember 2006\n**Apop_model struct has lost the fdf object, which was annoying, and now\nhas the p function.\n--mySQL support.\n**apop_query_to_chars now returns an apop_data structure, so you don't\nhave to go back and gather column names and dimensions.\n**apop_name_get (and the apop_get_tt family) now use regular expressions\ninstead of SQL's LIKE operator. This is _much_ faster.\n--the apop_distribution model.\n\n\t\tNovember 2006\n--The preeminently useful APOP_COL and APOP_ROW\n--apop_data_calloc\n--apop_vector_(apply|map) debugged.\n**apop_estimation_params is just too darn long; reduced to apop_ep.\n\n\n\t\tOctober 2006\n--apop_text_to_db now reads from STDIN.\n**deleted apop_query_db; use apop_query.\n--Kolmogorov-Smirnov test.\n--apop_t_test returns GSL_NAN when given a one-element vector instead of\nhanging.\n--no more soft links in the tgz file==>may work better on Windows machines.\n--apop_(vector|matrix)_(map|apply) will apply a function to every\nrow of a matrix or every element of a vector. The map functions return a\ngsl_vector.\n--bug fix in apop_test_goodness_of_fit.\n\n\t\tSeptember 2006\n\n**Removed apop command-line server thing. It was interesting, but that's\nthe best that could be said of it.\n--Added functions for weighted data: weighted least squares, weighted moments.\n--apop_vector_percentiles now allows for averaging instead of rounding.\n\n\t\tAugust 2006\n**apop_log_likelihood and friends now demand that data be apop_data*,\nrather than void*. Too many things broke when users gave non-apop_data\ndata.\n--bug fixes\n\n\t\tJuly 2006\n--Many more checks for NULL ==> more robust code and easier debugging.\n--Bug fixes.\n**apop_data_split, and apop_data_stack has been revised to handle the\nidea that a vector is the -1st element of the matrix. 
I.e., check your\ncode if you're trying to merge matrices without merging the vectors.\n--Lattice plots.\n--Convenience t-tests from inside model estimations are fixed.\n--apop_query_to_vector. \n--apop_opts.output_type == 'p' to print to apop_opts.output_pipe\n**apop_..._print and apop_..._show now work out whether elements are\nintegers (if (val == (int) val)...), and print accordingly. This means\nthat apop_..._print_int and apop_..._show_int are basically obsolete,\nand have been removed.\n--Apop_OLS now allows weights.\n--Test library now includes a few NIST certified tests.\n\n\t\tJune 2006\n--added preprocessor cruft to let the library work for C++\n--Jackknife revised\n\n\t\tMay 2006\n**The apop_model no longer includes an inventory. I leave it to the\nestimate function to do its own allocation.\n\n\t\tApril 2006\n**apop_matrix_normalize and apop_vector_normalize had different\nnumbers for the same normalizations. Was that ever dumb. Also, I've\nswitched to chars instead of ints to signify this stuff, for better\nmnemonics without resorting to the\nAPOP_ENUM_YOU_HAVE_TO_LOOK_UP_EVERY_TIME_BECAUSE_ITS_SO_LONG sort of\nthing. If you were using apop_matrix_normalize(data, 0) before, you\nneed to change that to using apop_matrix_normalize(data, 'm'). Thus,\napop_vector_normalize now has one more normalization, for a total of\nfour for both.\n\n--Added apop_rng_alloc convenience fn.\n\n--Added apop_strip_dots to keep inputs to the database healthy.\n\n--apop_name_find uses LIKE instead of strcmp.\n\n--a fn to calculate the generalized harmonic.\n\n--A whole section on histograms and goodness-of-fit tests.\n\n--apop_data_set fns to go with the apop_data_get fns.\n\n--apop_data now includes a vector type\n\t--apop_estimate.parameters is now an apop_data type.\n\t--apop_estimate.names is thus obsolete.\n\n\t\tMarch 2006\n**apop_inventory is now a subset of apop_estimation_params. 
Implications:\n\t--added apop_estimation_params_alloc() to ensure that inventory is set right.\n\t--the model.estimate(data, inv, params) method is now model.estimate(data, params)\n\t  model.estimate(data, NULL) still does what the user expects it to.\n  This makes structural sense, but will lightly break any existing code.\n  fix: change \n   apop_inventory *inv = apop_inventory_set(1);\n   model.estimate(data, inv, NULL);\n\n   to\n   apop_estimation_params *ep = apop_estimation_params_alloc();\n   model.estimate(data, ep);\n\n   and in any apop_estimates, change any use of est->uses to\n   est.estimation_params.uses.\n\n**Next apop_estimate reform: y_values and residuals combined into one\napop_data table with actual, predicted, residual columns.\n\t--obviated the need for a 'dependent' element in apop_names; removed that.\n\tIf you need the name, it's now your_est->dependent->names->colnames[0].\n\n**your_estimate->covariance is now an apop_data set instead of a gsl_matrix.\n\n**the data element of the apop_matrix structure is now named matrix. So\ninstead of data_set->data, use data_set->matrix, and instead of\nestimate->data->data->data, you can use estimate->data->matrix->data.\n\n--The command-line utility has been revisited, and can do a few more\nthings, like OLS.\n\n--Simulated annealing\n\t--added convenience fns apop_vector_distance(pt1,pt2) and\n\t\tapop_vector_grid_distance(pt1,pt2)\n\n**Apop_data_memcpy no longer malloc()s for you, for comparability with\nthe world's other memcpy fns. If you want mallocing, use apop_data_copy.\n\n--apop_test_fisher_exact(). Cut 'n' pasted from R, who cut 'n' pasted it\nfrom somebody else. 
Despite being the same code, it runs fifty (50)\ntimes faster from Apophenia.\n\n\t\tFebruary 2006\n--sort-of-adaptive MLE: use apop_estimate_restart to execute a new MLE\nsearch beginning where the last one ended, perhaps using a new method or\nrescaled parameters.\n\t--This needed convenience functions to check for divergence, thus\n\t\tadded apop_vector_finite, apop_vector_bounded, apop_vector_isnan.\n**apop_db_to_crosstab now returns an apop_data set instead of a gsl_matrix.\nAlso, it finally works with column headers that aren't numeric.\n**stats like apop_mean are now apop_vector_mean, following the proper\n\tpkg-noun-verb naming scheme. \n--Textbook is much improved.\n--apop_vector_to_pdf convenience fn.\n\n--Some of the fns that used to be of the form\n\tapop_get_something(input, &output);\nare now of the more natural\n\tout\t= apop_get_something(input);\nThis includes apop_array_to_vector and apop_array_to_matrix\n\n--bootstrapping works, and works with apop_models.\n--apop_poisson model\n\n0.15\tJanuary 2006\nAdded an apop_opts structure for options. Allowed the following changes:\n\t--apop_verbose  is now apop_opts.verbose. Try this on your existing code:\n\t\tperl -pi.bak -e 's/apop_verbose/apop_opts.verbose/g' *.c *.h\n\t--the output functions now output into three formats: on screen, to file, to db;\n\t\tsee chapter five of the manual.\n\t\t\n--F tests \n--R squared.\n\n0.14\tDecember 2005\n\nThe apop_data structure, which is just a shell for a gsl_matrix and an\napop_name. Was just sick of sending names following around my tables. \nLets us keep both numerical and categorical data in one place; kind of\nlike R's data frame.\n\n--Added linear model objects: OLS, GLS. This means that what had been\nthe apop_OLS function is now the apop_estimate_OLS function, and where it\nused to take in a gsl_matrix and an apop_name, now it takes an apop_data\nstructure and a NULL. 
So you'll have to modify your code accordingly.\n\n--A function to generate dummy variables, useful in conjunction with\n--Functions to stack matrices and apop_data sets. Even a\napop_partitioned_OLS function, that will only practically work for small data sets.\n\n--pow(.,.) in the database. I can't believe I dealt with SQL this long w/o it.\n\n0.13\tDecember 2005\n\n--The apop_model object. This was a big deal that deserves more than\njust one line; see the manuals.\n\n0.12 \tmid November 2005\n\n--Bar charts (assuming you've got Gnuplot > 4.1)\n--percentiles (in case you haven't got it)\n**redid MLE system so you can pick among the many options now available.  As a part of this:\n\t--better handling of constraints.\n\t--numerical gradients.\n\t--numerical Hessians.\n\n0.12 \tearly November 2005, post-hiatus\n\n--You now have three maximum likelihood estimators to choose from: the\nGSL's no-gradient, the GSL's with-gradient, and Mr. WN's autocalculated\ngradient. \n\n--If you haven't seen it before, the apop_distribution structure is\nincreasingly well-supported. It allows the user to specify the features\nof the Max. Likelihood model in a consistent manner which facilitates\nthings like comparing two models.\n\n--I'd still suggest taking the Waring and Yule distributions with care;\neverything else seems to check out.\n\n\n0.12 \tSeptember 2005\n**The distributions are now objects, which just provides a neat way\nof grouping together the half-dozen functions which are associated with\nany one distribution.\n\n0.11\tSeptember 2005\n--command-line server is much improved. I actually do work with it.\n--Documentation is now via doxygen.\n--asst bug fixes.\n--Have started to take plotting (via gnuplot) seriously\n--a limited test suite. Try: make test .\n\n0.10\tAugust 2005\n--This version includes a server to park itself in memory and receive data\nprocessing requests. 
The intent is that one can then do analysis from\nthe command line or a Perl/Python/Whatever script. The client/server\nworks in the sense of handling a handful of requests without\nsegfaulting, but remains in proof-of-concept stage. \n--Added apop_merge_db for joining databases, both via C and command line \n--Run t tests from the cmd line or the database.\n\n0.09\tJuly 2005\n--Flattened the relatively complex vasprintf subsystem from GNU, so if\nyou've been having trouble compiling on non-GNU systems, try again.\nAdded two little command-line programs. Also, added more little\nfunctions which aren't very interesting, like t-tests; maybe you'll\nstumble upon them.\n\n0.08\tMay 2005\n--OLS/GLS/MLE now properly support the apop_estimate structure \n--Column names\n\n0.07\tApril 2005\n--uses the apop_estimate structure to return heaps of data from regressions & MLEs\n--uses the apop_model structure\n\n0.06\n--var(x), skew(x), kurtosis(x) added to SQL understood by Apophenia.\n\n0.05\n--added a little crosstab utility\n--queries now accept printf-type arguments. \n\t==>GNU vasprintf was added.\n\t\t==>updated to work with autoconf 1.7\n"
  },
  {
    "path": "README",
    "content": "Apophenia is an open statistical library for working with data sets and statistical or simulation models. It provides functions on the same level as those of the typical stats package (such as OLS, probit, or singular value decomposition) but gives the user more flexibility to be creative in model-building. Being in C, it is often an order of magnitude faster when searching for optima or running MCMC chains. The core functions are written in C, but experience has shown them to be easy to bind to Python/Julia/Perl/Ruby/&c.\n\nhttp://apophenia.info/gentle.html provides an overview of the basics of using the library. If you want to know more about the package, see the web site, http://apophenia.info, or have a look at the textbook from Princeton University Press that coevolved with Apophenia, downloadable from http://modelingwithdata.org .\n\n\nThe quick summary for installation:\n\n∙ The library depends on the GNU Scientific Library and SQLite3. If you are using a system with a package manager of some sort, there is certainly a package for them. Be sure to include both the main package and the lib-, -dev, or -devel package. 
Sample package manager calls:\n\n    sudo apt-get install make gcc libgsl0-dev libsqlite3-dev \nor \n    sudo yum install make gcc gsl-devel libsqlite3x-devel\nor \n    sudo pacman -S make gcc gsl sqlite \n\n∙ The prebuilt package, which has only basic prerequisites (no Autotools or m4), can be downloaded from another Git branch:\n\n    #Download the zip file, via wget or your preferred downloading method:\n    wget https://github.com/b-k/Apophenia/archive/pkg.zip\n\n    #unzip and build\n    unzip pkg.zip\n    cd Apophenia-pkg\n    ./configure\n    make\n    sudo make install\n\nOr check out the branch via git:\n\n    git clone https://github.com/b-k/Apophenia.git\n    cd Apophenia\n    git checkout pkg\n    ./configure\n    make\n    sudo make install\n\n∙ This master branch of the git repository requires Autotools, so it can build the\npackage. Try (apt-get || yum install) autoconf automake libtool. If you have Autotools installed, then from this branch you can run:\n\n    ./configure\n    cd apophenia-1.0\n    make \n    sudo make install\n\n∙ Find detailed setup instructions and some troubleshooting notes at\nhttp://apophenia.info/setup.html .\n\n\nThanks for your interest. I do hope that Apophenia helps you learn more from your data.\n\n--BK\n\nPS: Lawyers, please note that a file named COPYING in the install/ directory describes how this package is licensed under GPLv2.\n"
  },
  {
    "path": "apop.m4.h",
    "content": "/** \\file  */\n/* Copyright (c) 2005--2014 by Ben Klemens.  Licensed under the GPLv2; see COPYING. */\n\n/* Here are the headers for all of apophenia's functions, typedefs, static variables and\nmacros. All of these begin with the apop_ (or Apop_ or APOP_) prefix.\n\nThere used to be a series of sub-headers, but they never provided any serious\nbenefit. Please use your text editor's word-search feature to find any elements you\nmay be looking for. About a third of the file is comments and doxygen documentation,\nso syntax highlighting that distinguishes code from comments will also help to make\nthis more navigable.*/\n\n/** \\defgroup all_public Public functions, structs, and types\n\\addtogroup all_public\n@{\n*/\n\n#pragma once\n#ifdef\t__cplusplus\nextern \"C\" {\n#endif\n\n/** \\cond doxy_ignore */\n#ifndef _GNU_SOURCE\n#define  _GNU_SOURCE //for asprintf\n#endif\n\n#include <assert.h>\n#include <signal.h> //raise(SIGTRAP)\n#include <string.h>\n#include <gsl/gsl_rng.h>\n#include <gsl/gsl_matrix.h>\n\n\n            //////Optional arguments\n\n/* A means of providing more script-like means of sending arguments to a function.\n\nThese macros are intended as internal. If you are interested in using this mechanism\nin out-of-Apophenia work, grep docs/documentation.h for optionaldetails to find notes\non how these are used (Doxygen doesn't use that page),\n*/\n#define apop_varad_head(type, name) type variadic_##name(variadic_type_##name varad_in)\n\n#define apop_varad_declare(type, name, ...) \\\n    typedef struct {                        \\\n                __VA_ARGS__ ;               \\\n            } variadic_type_##name;         \\\n    apop_varad_head(type, name);\n\n#define apop_varad_var(name, value) name = varad_in.name ? varad_in.name : (value);\n#define apop_varad_link(name,...) 
variadic_##name((variadic_type_##name) {__VA_ARGS__})\n\n/** \\endcond */ //End of Doxygen ignore.\n\n\n            //////The types and functions that act on them\n\n/** This structure holds the names of the components of the \\ref apop_data set. You may never have to worry about it directly, because most operations on \\ref apop_data sets will take care of the names for you.\n*/\ntypedef struct{\n    char *title;\n\tchar * vector;\n\tchar ** col;\n\tchar ** row;\n\tchar ** text;\n\tint colct, rowct, textct;\n    unsigned long *colhash, *rowhash, *texthash;\n} apop_name;\n\n/** The \\ref apop_data structure represents a data set. See \\ref dataoverview.*/\ntypedef struct apop_data{\n    gsl_vector  *vector;\n    gsl_matrix  *matrix;\n    apop_name   *names;\n    char        ***text;\n    size_t      textsize[2];\n    gsl_vector  *weights;\n    struct apop_data   *more;\n    char        error;\n} apop_data;\n\n/* Settings groups. For internal use only; see apop_settings.c and \n   settings.h for related machinery. */\ntypedef struct {\n    char name[101];\n    unsigned long name_hash;\n    void *setting_group;\n    void *copy;\n    void *free;\n} apop_settings_type;\n\n/** A statistical model. See \\ref modelsec for details. */\ntypedef struct apop_model apop_model;\n\n/** The elements of the \\ref apop_model type, representing a statistical model. See \\ref\n modelsec and \\ref modeldetails for use and details.  
*/\nstruct apop_model{\n    char name[101]; \n    int vsize, msize1, msize2, dsize;\n    apop_data *data;\n    apop_data *parameters;\n    apop_data *info;\n    void (*estimate)(apop_data * data, apop_model *params); \n    long double (*p)(apop_data *d, apop_model *params);\n    long double (*log_likelihood)(apop_data *d, apop_model *params);\n    long double (*cdf)(apop_data *d, apop_model *params);\n    long double (*constraint)(apop_data *data, apop_model *params);\n    int (*draw)(double *out, gsl_rng* r, apop_model *params);\n    void (*prep)(apop_data *data, apop_model *params);\n    apop_settings_type *settings;\n    void *more;\n    size_t more_size;\n    char error;\n};\n\n/** The global options. */\ntypedef struct{\n    int verbose; /**< Set this to zero for silent mode, one for errors and warnings. default = 0. */\n    char stop_on_warning; /**< See \\ref debugging . */\n    char output_delimiter[100]; /**< The separator between elements of output tables. The default is \"\\t\", but \n                                for LaTeX, use \"&\\t\", or use \"|\" to get pipe-delimited output. */\n    char input_delimiters[100]; /**< Deprecated. Please use per-function inputs to \\ref apop_text_to_db and \\ref apop_text_to_data. Default = \"|,\\t\" */\n    char *db_name_column; /**< If not NULL or <tt>\"\"</tt>, the name of the column in your tables that holds row names.*/\n    char *nan_string; /**< The string used to indicate NaN. Default: <tt>\"NaN\"</tt>. Comparisons are case-insensitive.*/\n    char db_engine; /**< If this is 'm', use mySQL, else use SQLite. */\n    char db_user[101]; /**< Username for database login. Max 100 chars.  */\n    char db_pass[101]; /**< Password for database login. Max 100 chars.  */\n    FILE *log_file;  /**< The file handle for the log. 
Defaults to \\c stderr, but change it with, e.g.,\n                           <tt>apop_opts.log_file = fopen(\"outlog\", \"w\");</tt> */\n\n#define Autoconf_no_atomics @Autoconf_no_atomics@\n\n    #if __STDC_VERSION__ > 201100L && !defined(__STDC_NO_ATOMICS__) && Autoconf_no_atomics==0\n        _Atomic(int) rng_seed;\n    #else\n        int rng_seed;\n    #endif\n    float version;\n} apop_opts_type;\n\nextern apop_opts_type apop_opts;\n\napop_name * apop_name_alloc(void);\nint apop_name_add(apop_name * n, char const *add_me, char type);\nvoid  apop_name_free(apop_name * free_me);\nvoid  apop_name_print(apop_name * n);\nApop_var_declare( void  apop_name_stack(apop_name * n1, apop_name *nadd, char type1, char typeadd) )\napop_name * apop_name_copy(apop_name *in);\nint  apop_name_find(const apop_name *n, const char *findme, const char type);\n\nvoid apop_data_add_names_base(apop_data *d, const char type, char const ** names);\n\n/** Add a list of names to a data set.\n\n\\li Use this with a list of names that you type in yourself, like\n\\code\napop_data_add_names(mydata, 'c', \"age\", \"sex\", \"height\");\n\\endcode\nNotice the lack of curly braces around the list.\n\n\\li You may have an array of names, probably autogenerated, that you would like to\nadd. In this case, make certain that the last element of the array is \\c NULL, and\ncall the base function:\n\\code\nchar const *colnames[] = {\"age\", \"sex\", \"height\", NULL};\napop_data_add_names_base(mydata, 'c', colnames);\n\\endcode\nBut if you forget the \\c NULL marker, this has good odds of segfaulting. You may prefer to use a \\c for loop that inserts each name in turn using \\ref apop_name_add.\n\n\\see \\ref apop_name_add, although \\ref apop_data_add_names will be more useful in most cases. \n*/\n#define apop_data_add_names(dataset, type, ...) 
apop_data_add_names_base((dataset), (type), (char const*[]) {__VA_ARGS__, NULL}) \n\n\n/** Free an \\ref apop_data structure.\n \n\\li As with \\c free(), it is safe to send in a \\c NULL pointer (in which case the function does nothing).\n\\li If the \\c more pointer is not \\c NULL, I will free the pointed-to data set first.\nIf you don't want to free data sets down the chain, set <tt>more=NULL</tt> before calling this.\n\\li This is actually a macro (that calls \\ref apop_data_free_base). It\nsets \\c freeme to \\c NULL when it's done, because there's nothing safe you can do with the\nfreed location, and you can later safely test conditions like <tt>if (data) ...</tt>.\n*/\n#define apop_data_free(freeme) (apop_data_free_base(freeme) ? 0 : ((freeme)= NULL))\n\nchar        apop_data_free_base(apop_data *freeme);\nApop_var_declare( apop_data * apop_data_alloc(const size_t size1, const size_t size2, const int size3) )\nApop_var_declare( apop_data * apop_data_calloc(const size_t size1, const size_t size2, const int size3) )\nApop_var_declare( apop_data * apop_data_stack(apop_data *m1, apop_data * m2, char posn, char inplace) )\napop_data ** apop_data_split(apop_data *in, int splitpoint, char r_or_c);\napop_data * apop_data_copy(const apop_data *in);\nvoid        apop_data_rm_columns(apop_data *d, int *drop);\nvoid apop_data_memcpy(apop_data *out, const apop_data *in);\nApop_var_declare( double * apop_data_ptr(apop_data *data, int row, int col, const char *rowname, const char *colname, const char *page) )\nApop_var_declare( double apop_data_get(const apop_data *data, size_t row, int  col, const char *rowname, const char *colname, const char *page) )\nApop_var_declare( int apop_data_set(apop_data *data, size_t row, int col, const double val, const char *rowname, const char * colname, const char *page) )\nvoid apop_data_add_named_elmt(apop_data *d, char *name, double val);\nint apop_text_set(apop_data *in, const size_t row, const size_t col, const char *fmt, 
...);\napop_data * apop_text_alloc(apop_data *in, const size_t row, const size_t col);\nvoid apop_text_free(char ***freeme, int rows, int cols);\nApop_var_declare( apop_data * apop_data_transpose(apop_data *in, char transpose_text, char inplace) )\ngsl_matrix * apop_matrix_realloc(gsl_matrix *m, size_t newheight, size_t newwidth);\ngsl_vector * apop_vector_realloc(gsl_vector *v, size_t newheight);\n\n#define apop_data_prune_columns(in, ...) apop_data_prune_columns_base((in), (char *[]) {__VA_ARGS__, NULL})\napop_data* apop_data_prune_columns_base(apop_data *d, char **colnames);\n\nApop_var_declare( apop_data * apop_data_get_page(const apop_data * data, const char * title, const char match) )\napop_data * apop_data_add_page(apop_data * dataset, apop_data *newpage,const char *title);\nApop_var_declare( apop_data* apop_data_rm_page(apop_data * data, const char *title, const char free_p) )\nApop_var_declare( apop_data * apop_data_rm_rows(apop_data *in, int *drop, int (*do_drop)(apop_data* ! 
void*), void* drop_parameter) )\n\n//in apop_asst.c:\nApop_var_declare( apop_data * apop_model_draws(apop_model *model, int count, apop_data *draws) )\n\n\n/* Convenience functions to convert among vectors (gsl_vector), matrices (gsl_matrix), \n  arrays (double **), and database tables */\n\n//From vector\ngsl_vector *apop_vector_copy(const gsl_vector *in);\nApop_var_declare( gsl_matrix * apop_vector_to_matrix(const gsl_vector *in, char row_col) )\n\n//From matrix\ngsl_matrix *apop_matrix_copy(const gsl_matrix *in);\nApop_var_declare( apop_data *apop_db_to_crosstab(char const*tabname, char const*row, char const*col, char const*data, char is_aggregate) )\n\n//From array\nApop_var_declare( gsl_vector * apop_array_to_vector(double *in, int size) )\n/** \\cond doxy_ignore */   //Deprecated\n#define apop_text_add apop_text_set\n#define apop_line_to_vector apop_array_to_vector\n/** \\endcond */\n\n//From text\nApop_var_declare( apop_data * apop_text_to_data(char const *text_file, int has_row_names, int has_col_names, int const *field_ends, char const *delimiters) )\nApop_var_declare( int apop_text_to_db(char const *text_file, char *tabname, int has_row_names, int has_col_names, char **field_names, int const *field_ends, apop_data *field_params, char *table_params, char const *delimiters, char if_table_exists) )\n\n//rank data\napop_data *apop_data_rank_expand (apop_data *in);\nApop_var_declare( apop_data *apop_data_rank_compress (apop_data *in, int min_bins) )\n\n//From crosstabs\nvoid apop_crosstab_to_db(apop_data *in, char *tabname, char *row_col_name, \n\t\t\t\t\t\tchar *col_col_name, char *data_col_name);\n\n//packing data into a vector\nApop_var_declare( gsl_vector * apop_data_pack(const apop_data *in, gsl_vector *out, char more_pages, char use_info_pages) )\nApop_var_declare( void apop_data_unpack(const gsl_vector *in, apop_data *d, char use_info_pages) )\n\n#define apop_vector_fill(avfin, ...) 
apop_vector_fill_base((avfin), (double []) {__VA_ARGS__})\n#define apop_data_fill(adfin, ...) apop_data_fill_base((adfin), (double []) {__VA_ARGS__})\n#define apop_text_fill(dataset, ...)   apop_text_fill_base((dataset), (char* []) {__VA_ARGS__, NULL})\n#define apop_data_falloc(sizes, ...) apop_data_fill(apop_data_alloc sizes, __VA_ARGS__)\n    \napop_data *apop_data_fill_base(apop_data *in, double []);\ngsl_vector *apop_vector_fill_base(gsl_vector *in, double []);\napop_data *apop_text_fill_base(apop_data *data, char* text[]);\n\n            //// Models and model support functions\n\nextern apop_model *apop_beta;\nextern apop_model *apop_bernoulli;\nextern apop_model *apop_binomial;\nextern apop_model *apop_chi_squared;\nextern apop_model *apop_dirichlet;\nextern apop_model *apop_exponential;\nextern apop_model *apop_f_distribution;\nextern apop_model *apop_gamma;\nextern apop_model *apop_improper_uniform;\nextern apop_model *apop_iv;\nextern apop_model *apop_kernel_density;\nextern apop_model *apop_loess;\nextern apop_model *apop_logit;\nextern apop_model *apop_lognormal;\nextern apop_model *apop_multinomial;\nextern apop_model *apop_multivariate_normal;\nextern apop_model *apop_normal;\nextern apop_model *apop_ols;\nextern apop_model *apop_pmf;\nextern apop_model *apop_poisson;\nextern apop_model *apop_probit;\nextern apop_model *apop_t_distribution;\nextern apop_model *apop_uniform;\n//extern apop_model *apop_wishart;\nextern apop_model *apop_wls;\nextern apop_model *apop_yule;\nextern apop_model *apop_zipf;\n\n//model transformations\nextern apop_model *apop_coordinate_transform;\nextern apop_model *apop_composition;\nextern apop_model *apop_dconstrain;\nextern apop_model *apop_mixture;\nextern apop_model *apop_cross;\n\n/** Alias for the \\ref apop_normal distribution, qv. 
*/\n#define apop_gaussian apop_normal\n#define apop_OLS apop_ols\n#define apop_PMF apop_pmf\n#define apop_F_distribution apop_f_distribution\n#define apop_IV apop_iv\n\n\nvoid apop_model_free (apop_model * free_me);\nApop_var_declare( void apop_model_print (apop_model * model, FILE *output_pipe) )\nvoid apop_model_show (apop_model * print_me); //deprecated\napop_model * apop_model_copy(apop_model *in); //in apop_model.c\napop_model * apop_model_clear(apop_data * data, apop_model *model);\n\napop_model * apop_estimate(apop_data *d, apop_model *m);\nvoid apop_score(apop_data *d, gsl_vector *out, apop_model *m);\ndouble apop_log_likelihood(apop_data *d, apop_model *m);\ndouble apop_p(apop_data *d, apop_model *m);\ndouble apop_cdf(apop_data *d, apop_model *m);\nint apop_draw(double *out, gsl_rng *r, apop_model *m);\nvoid apop_prep(apop_data *d, apop_model *m);\napop_model *apop_parameter_model(apop_data *d, apop_model *m);\napop_data * apop_predict(apop_data *d, apop_model *m);\n\napop_model *apop_beta_from_mean_var(double m, double v); //in apop_beta.c\n\n#define apop_model_set_parameters(in, ...) apop_model_set_parameters_base((in), (double []) {__VA_ARGS__})\napop_model *apop_model_set_parameters_base(apop_model *in, double ap[]);\n\n//apop_mixture.c\n/** Produce a model as a linear combination of other models. See the documentation for the \\ref apop_mixture model. \n\n\\param ... A list of models, either all parameterized or all unparameterized. See\nexamples in the \\ref apop_mixture documentation.\n*/\n#define apop_model_mixture(...) apop_model_mixture_base((apop_model *[]){__VA_ARGS__, NULL})\napop_model *apop_model_mixture_base(apop_model **inlist);\n\n//transform/apop_cross.c.\n\n/** Generate a model consisting of the cross product of several independent models. 
The output \\ref apop_model\nis a copy of \\ref apop_cross; see that model's documentation for details.\n\n\\li If you input only one model, return a copy of that model; print a warning iff <tt>apop_opts.verbose >= 2</tt>.\n\n\\exception error=='n' First model input is \\c NULL.\n\nExamples:\n\n\\include cross_models.c\n*/\n#define apop_model_cross(...) apop_model_cross_base((apop_model *[]){__VA_ARGS__, NULL})\napop_model *apop_model_cross_base(apop_model *mlist[]);\n\n        ////More functions\n\n    //The variadic versions, with lots of options to input extra parameters to the\n    //function being mapped/applied\nApop_var_declare( apop_data * apop_map(apop_data *in, double (*fn_d)(double), double (*fn_v)(gsl_vector*),\n                double (*fn_r)(apop_data *), double (*fn_dp)(double! void *), double (*fn_vp)(gsl_vector*! void *),\n                double (*fn_rp)(apop_data *! void *), double (*fn_dpi)(double! void *! int),\n                double (*fn_vpi)(gsl_vector*! void *! int), double (*fn_rpi)(apop_data*! void *! int),\n                double (*fn_di)(double! int), double (*fn_vi)(gsl_vector*! int), double (*fn_ri)(apop_data*! int),\n                void *param, int inplace, char part, int all_pages) )\nApop_var_declare( double apop_map_sum(apop_data *in, double (*fn_d)(double), double (*fn_v)(gsl_vector*),\n                double (*fn_r)(apop_data *), double (*fn_dp)(double! void *), double (*fn_vp)(gsl_vector*! void *),\n                double (*fn_rp)(apop_data *! void *), double (*fn_dpi)(double! void *! int),\n                double (*fn_vpi)(gsl_vector*! void *! int), double (*fn_rpi)(apop_data*! void *! int),\n                double (*fn_di)(double! int), double (*fn_vi)(gsl_vector*! int), double (*fn_ri)(apop_data*! 
int),\n                void *param, char part, int all_pages) )\n\n    //the specific-to-a-type versions, quicker and easier when appropriate.\ngsl_vector *apop_matrix_map(const gsl_matrix *m, double (*fn)(gsl_vector*));\ngsl_vector *apop_vector_map(const gsl_vector *v, double (*fn)(double));\nvoid apop_matrix_apply(gsl_matrix *m, void (*fn)(gsl_vector*));\nvoid apop_vector_apply(gsl_vector *v, void (*fn)(double*));\ngsl_matrix * apop_matrix_map_all(const gsl_matrix *in, double (*fn)(double));\nvoid apop_matrix_apply_all(gsl_matrix *in, void (*fn)(double *));\n\ndouble apop_vector_map_sum(const gsl_vector *in, double(*fn)(double));\ndouble apop_matrix_map_sum(const gsl_matrix *in, double (*fn)(gsl_vector*));\ndouble apop_matrix_map_all_sum(const gsl_matrix *in, double (*fn)(double));\n\n\n        // Some output routines\nApop_var_declare( void apop_matrix_print(const gsl_matrix *data, char const *output_name, FILE *output_pipe, char output_type, char output_append) )\nApop_var_declare( void apop_vector_print(gsl_vector *data, char const *output_name, FILE *output_pipe, char output_type, char output_append) )\nApop_var_declare( void apop_data_print(const apop_data *data, char const *output_name, FILE *output_pipe, char output_type, char output_append) )\n\nvoid apop_matrix_show(const gsl_matrix *data);\nvoid apop_vector_show(const gsl_vector *data);\nvoid apop_data_show(const apop_data *data);\n\n\n        //statistics\nApop_var_declare( double apop_vector_mean(gsl_vector const *v, gsl_vector const *weights))\nApop_var_declare( double apop_vector_var(gsl_vector const *v, gsl_vector const *weights))\nApop_var_declare( double apop_vector_skew_pop(gsl_vector const *v, gsl_vector const *weights))\nApop_var_declare( double apop_vector_kurtosis_pop(gsl_vector const *v, gsl_vector const *weights))\nApop_var_declare( double apop_vector_cov(gsl_vector const *v1, gsl_vector const *v2,\n                                         gsl_vector const *weights))\n\nApop_var_declare( 
double apop_vector_distance(const gsl_vector *ina, const gsl_vector *inb, const char metric, const double norm) )\n\nApop_var_declare( void apop_vector_normalize(gsl_vector *in, gsl_vector **out, const char normalization_type) )\n\napop_data * apop_data_covariance(const apop_data *in);\napop_data * apop_data_correlation(const apop_data *in);\nlong double apop_vector_entropy(gsl_vector *in);\nlong double apop_matrix_sum(const gsl_matrix *m);\ndouble apop_matrix_mean(const gsl_matrix *data);\nvoid apop_matrix_mean_and_var(const gsl_matrix *data, double *mean, double *var);\napop_data * apop_data_summarize(apop_data *data);\nApop_var_declare( double * apop_vector_percentiles(gsl_vector *data, char rounding)  )\n\napop_data *apop_test_fisher_exact(apop_data *intab); //in apop_fisher.c\n\n//from apop_t_f_chi.c:\nApop_var_declare( int apop_matrix_is_positive_semidefinite(gsl_matrix *m, char semi) )\ndouble apop_matrix_to_positive_semidefinite(gsl_matrix *m);\nlong double apop_multivariate_gamma(double a, int p);\nlong double apop_multivariate_lngamma(double a, int p);\n\n//apop_tests.c\napop_data *\tapop_t_test(gsl_vector *a, gsl_vector *b);\napop_data *\tapop_paired_t_test(gsl_vector *a, gsl_vector *b);\nApop_var_declare( apop_data* apop_anova(char *table, char *data, char *grouping1, char *grouping2) )\n#define apop_ANOVA apop_anova\nApop_var_declare( apop_data * apop_f_test (apop_model *est, apop_data *contrast) )\n#define apop_F_test apop_f_test\n\n//from the regression code:\n#define apop_estimate_r_squared(in) apop_estimate_coefficient_of_determination(in)\n\napop_data * apop_text_unique_elements(const apop_data *d, size_t col);\ngsl_vector * apop_vector_unique_elements(const gsl_vector *v);\nApop_var_declare( apop_data * apop_data_to_factors(apop_data *data, char intype, int incol, int outcol) )\nApop_var_declare( apop_data * apop_data_get_factor_names(apop_data *data, int col, char type) )\n\nApop_var_declare( apop_data * apop_data_to_dummies(apop_data *d, int 
col, char type, int keep_first, char append, char remove) )\n\nApop_var_declare( long double apop_model_entropy(apop_model *in, int draws) )\nApop_var_declare( long double apop_kl_divergence(apop_model *from, apop_model *to, int draw_ct, gsl_rng *rng) )\n\napop_data *apop_estimate_coefficient_of_determination (apop_model *);\nvoid apop_estimate_parameter_tests (apop_model *est);\n\n//Bootstrapping & RNG\napop_data * apop_jackknife_cov(apop_data *data, apop_model *model);\nApop_var_declare( apop_data * apop_bootstrap_cov(apop_data *data, apop_model *model, gsl_rng* rng, int iterations, char keep_boots, char ignore_nans, apop_data **boot_store) )\ngsl_rng *apop_rng_alloc(int seed);\ndouble apop_rng_GHgB3(gsl_rng * r, double* a); //in apop_asst.c\n\n#define apop_rng_get_thread(thread_in) apop_rng_get_thread_base(#thread_in[0]=='\\0' ? -1: (thread_in+0))\ngsl_rng *apop_rng_get_thread_base(int thread);\n\nint apop_arms_draw (double *out, gsl_rng *r, apop_model *m);\n\n\n    // maximum likelihood estimation related functions\n\nApop_var_declare( gsl_vector * apop_numerical_gradient(apop_data * data, apop_model* model, double delta) )\nApop_var_declare( apop_data * apop_model_hessian(apop_data * data, apop_model *model, double delta) )\nApop_var_declare( apop_data * apop_model_numerical_covariance(apop_data * data, apop_model *model, double delta) )\n\nvoid apop_maximum_likelihood(apop_data * data, apop_model *dist);\n\nApop_var_declare( apop_model * apop_estimate_restart (apop_model *e, apop_model *copy, char * starting_pt, double boundary) )\n\n//in apop_linear_constraint.c\nApop_var_declare( long double  apop_linear_constraint(gsl_vector *beta, apop_data * constraint, double margin) )\n\n//in apop_model_fix_params.c\napop_model * apop_model_fix_params(apop_model *model_in);\napop_model * apop_model_fix_params_get_base(apop_model *model_in);\n\n\n\n            //////vtables\n/** \\cond doxy_ignore */\n\n/* This declares the vtable macros for each procedure that uses the 
mechanism.\n\n--We want to have type-checking on the functions put into the vtables. Type checking\nhappens only with functions, not macros, so we need a type_check function for every\nvtable.\n\n--Only once in your codebase, you'll need to #define Declare_type_checking_fns to\nactually define the type checking function. Everywhere else, the function is merely\ndeclared.\n\n--All other uses point to having a macro, such as using __VA_ARGS__ to allow any sort\nof inputs to the hash.\n\n--We want to have such a macro for every vtable. That means that we need a macro\nto write macros. We can't do that with C macros, so this file uses m4 macros to\ngenerate C macros.\n\n--After the m4 definition of make_vtab_fns, each new vtable requires a typedef, a hash\ndefinition, and a call to make_vtab_fns to do the rest.\n*/\nm4_define(make_vtab_fns, <|m4_dnl\n#ifdef Declare_type_checking_fns\nvoid $1_type_check($1_type in){ };\n#else\nvoid $1_type_check($1_type in);\n#endif\n#define $1_vtable_add(fn, ...) $1_type_check(fn), apop_vtable_add(\"$1\", fn, $1_hash(__VA_ARGS__))\n#define $1_vtable_get(...) apop_vtable_get(\"$1\", $1_hash(__VA_ARGS__))\n#define $1_vtable_drop(...) apop_vtable_drop(\"$1\", $1_hash(__VA_ARGS__))m4_dnl\n|>)\n\nint apop_vtable_add(char const *tabname, void *fn_in, unsigned long hash);\nvoid *apop_vtable_get(char const *tabname, unsigned long hash);\nint apop_vtable_drop(char const *tabname, unsigned long hash);\n\ntypedef apop_model *(*apop_update_type)(apop_data *, apop_model* , apop_model*);\n#define apop_update_hash(m1, m2) (          \\\n           ((m1)->log_likelihood ? (size_t)(m1)->log_likelihood : \\\n            (m1)->p              ? (size_t)(m1)->p*33 : \\\n            (m1)->draw           ? (size_t)(m1)->draw*33*27 \\\n                                 : 33*27*19) \\\n          +((m2)->log_likelihood ? (size_t)(m2)->log_likelihood : \\\n            (m2)->p              ? (size_t)(m2)->p*33 : \\\n            (m2)->draw           ? 
(size_t)(m2)->draw*33*27 \\\n                                 : 33*27*19 \\\n           ) * 37)\nmake_vtab_fns(apop_update)\n\ntypedef long double (*apop_entropy_type)(apop_model *model);\n#define apop_entropy_hash(m1) ((size_t)(m1)->log_likelihood + 33 * (size_t)((m1)->p) + 27*(size_t)((m1)->draw))\nmake_vtab_fns(apop_entropy)\n\ntypedef void (*apop_score_type)(apop_data *d, gsl_vector *gradient, apop_model *params);\n#define apop_score_hash(m1) ((size_t)((m1)->log_likelihood ? (m1)->log_likelihood : (m1)->p))\nmake_vtab_fns(apop_score)\n\ntypedef apop_model* (*apop_parameter_model_type)(apop_data *, apop_model *);\n#define apop_parameter_model_hash(m1) ((size_t)((m1)->log_likelihood ? (m1)->log_likelihood : (m1)->p)*33 + (m1)->estimate ? (size_t)(m1)->estimate: 27)\nmake_vtab_fns(apop_parameter_model)\n\ntypedef apop_data * (*apop_predict_type)(apop_data *d, apop_model *params);\n#define apop_predict_hash(m1) ((size_t)((m1)->log_likelihood ? (m1)->log_likelihood : (m1)->p)*33 + (m1)->estimate ? (size_t)(m1)->estimate: 27)\nmake_vtab_fns(apop_predict)\n\ntypedef void (*apop_model_print_type)(apop_model *params, FILE *out);\n#define apop_model_print_hash(m1) ((m1)->log_likelihood ? (size_t)(m1)->log_likelihood : \\\n            (m1)->p ? (size_t)(m1)->p*33 : \\\n            (m1)->estimate ? (size_t)(m1)->estimate*33*33 : \\\n            (m1)->draw ? (size_t)(m1)->draw*33*27  : \\\n            (m1)->cdf ? (size_t)(m1)->cdf*27*27  \\\n            : 27)\nmake_vtab_fns(apop_model_print)\n\n/** \\endcond */ //End of Doxygen ignore.\n\n\n        //////Asst\n\nlong double apop_generalized_harmonic(int N, double s) __attribute__ ((__pure__));\n\napop_data * apop_test_anova_independence(apop_data *d);\n#define apop_test_ANOVA_independence(d) apop_test_anova_independence(d)\n\nApop_var_declare( int apop_regex(const char *string, const char* regex, apop_data **substrings, const char use_case) )\n\nint apop_system(const char *fmt, ...) 
__attribute__ ((format (printf,1,2)));\n\n//Histograms and PMFs\ngsl_vector * apop_vector_moving_average(gsl_vector *, size_t);\napop_data * apop_histograms_test_goodness_of_fit(apop_model *h0, apop_model *h1);\napop_data * apop_test_kolmogorov(apop_model *m1, apop_model *m2);\napop_data *apop_data_pmf_compress(apop_data *in);\nApop_var_declare( apop_data * apop_data_to_bins(apop_data const *indata, apop_data const *binspec, int bin_count, char close_top_bin) )\nApop_var_declare( apop_model * apop_model_to_pmf(apop_model *model, apop_data *binspec, long int draws, int bin_count) )\n\n//text conveniences\nApop_var_declare( char* apop_text_paste(apop_data const*strings, char *between, char *before, char *after, char *between_cols, int (*prune)(apop_data* ! int ! int ! void*), void* prune_parameter) )\n/** Notify the user of errors, warnings, or debug info.\n\nWrites to \\ref apop_opts.log_file, which is a \\c FILE handle. The default is \\c stderr,\nbut use \\c fopen to attach to a file.\n\n \\param verbosity   At what verbosity level should the user be warned? E.g., if verbosity==2, then print iff apop_opts.verbose >= 2.\n \\param ... The message to write to the log (presuming the verbosity level is high\nenough). This can be a printf-style format with following arguments,\ne.g., <tt>Apop_notify(0, \"Beta is currently %g\", beta)</tt>.\n*/\n#define Apop_notify(verbosity, ...) 
{\\\n    if (apop_opts.verbose != -1 && apop_opts.verbose >= verbosity) {  \\\n        if (!apop_opts.log_file) apop_opts.log_file = stderr; \\\n        fprintf(apop_opts.log_file, \"%s: \", __func__); fprintf(apop_opts.log_file, __VA_ARGS__); fprintf(apop_opts.log_file, \"\\n\");   \\\n        fflush(apop_opts.log_file); \\\n} }\n\n/** \\cond doxy_ignore */\n#define Apop_maybe_abort(level) \\\n            {if ((apop_opts.verbose >= level && apop_opts.stop_on_warning == 'v') \\\n                 || (apop_opts.stop_on_warning=='w') ) \\\n                raise(SIGTRAP);}\n/** \\endcond */\n\n/** Execute an action and print a message to the current \\c FILE handle held by <tt>apop_opts.log_file</tt> (default: \\c stderr).\n \n\\param test The expression that, if true, triggers the action.\n\\param onfail If the assertion fails, do this. E.g., <tt>out->error='x'; return GSL_NAN</tt>. Notice that it is OK to include several lines of semicolon-separated code here, but if you have a lot to do, the most readable option may be <tt>goto outro</tt>, plus an appropriately-labeled section at the end of your function.\n\\param level Print the warning message only if \\ref apop_opts_type \"apop_opts.verbose\" is greater than or equal to this. Zero usually works, but for minor infractions use one, or for more verbose debugging output use 2.\n\\param ... The error message in printf form, plus any arguments to be inserted into the printf string. 
I'll provide the function name and a carriage return.\n\nSome examples:\n\n\\code\n//the typical case, stopping function execution:\nApop_stopif(isnan(x), return NAN, 0, \"x is NAN; failing\");\n\n//Mark a flag, go to a cleanup step\nApop_stopif(x < 0, needs_cleanup=1; goto cleanup, 0, \"x is %g; cleaning up and exiting.\", x);\n\n//Print a diagnostic iff <tt>apop_opts.verbose>=1</tt> and continue\nApop_stopif(x < 0,  , 1, \"warning: x is %g.\", x);\n\\endcode\n\n\\li If \\c apop_opts.stop_on_warning is nonzero and not <tt>'v'</tt>, then a failed test halts via \\c abort(), even if the <tt>apop_opts.verbose</tt> level is set so that the warning message doesn't print to screen. Use this when running via debugger.\n\\li If \\c apop_opts.stop_on_warning is <tt>'v'</tt>, then a failed test halts via \\c abort() iff the verbosity level is high enough to print the error.\n*/\n#define Apop_stopif(test, onfail, level, ...) do {\\\n     if (test) {  \\\n        Apop_notify(level,  __VA_ARGS__);   \\\n        Apop_maybe_abort(level)  \\\n        onfail;  \\\n    } } while(0)\n\n#define apop_errorlevel -5\n\n/** \\cond doxy_ignore */\n//For use in stopif, to return a blank apop_data set with an error attached.\n#define apop_return_data_error(E) {apop_data *out=apop_data_alloc(); out->error='E'; return out;}\n\n/* The Apop_stopif macro is currently favored, but there's a long history of prior\n   error-handling setups. Consider all of the Assert... macros below to be deprecated.\n*/\n#define Apop_assert_c(test, returnval, level, ...) \\\n    Apop_stopif(!(test), return returnval, level, __VA_ARGS__)\n\n#define Apop_assert(test, ...) Apop_assert_c((test), 0, apop_errorlevel, __VA_ARGS__)\n\n//For things that return void. Transitional and deprecated at birth.\n#define Apop_assert_n(test, ...) Apop_assert_c((test),  , apop_errorlevel, __VA_ARGS__)\n#define Apop_assert_negone(test, ...) 
Apop_assert_c((test), -1, apop_errorlevel, __VA_ARGS__)\n/** \\endcond */ //End of Doxygen ignore.\n\n//Missing data\nApop_var_declare( apop_data * apop_data_listwise_delete(apop_data *d, char inplace) )\napop_model * apop_ml_impute(apop_data *d, apop_model* meanvar);\n\nApop_var_declare(apop_model *apop_model_metropolis(apop_data *d, gsl_rng* rng, apop_model *m))\nApop_var_declare( apop_model * apop_update(apop_data *data, apop_model *prior, apop_model *likelihood, gsl_rng *rng) )\n\nApop_var_declare( double apop_test(double statistic, char *distribution, double p1, double p2, char tail) )\n\n//apop_sort.c\nApop_var_declare( apop_data *apop_data_sort(apop_data *data, apop_data *sort_order, char asc, char inplace, double *col_order))\n\n//raking\nApop_var_declare( apop_data * apop_rake(char const *margin_table, char * const*var_list, \n                    int var_ct, char * const *contrasts, int contrast_ct, \n                    char const *structural_zeros, int max_iterations, double tolerance, \n                    char const *count_col, char const *init_table, \n                    char const *init_count_col, double nudge) )\n\n\n#include <gsl/gsl_cdf.h>\n#include <gsl/gsl_blas.h>\n#include <gsl/gsl_sf_log.h>\n#include <gsl/gsl_sf_exp.h>\n#include <gsl/gsl_linalg.h>\n#include <gsl/gsl_sf_gamma.h>\n#include <gsl/gsl_sf_psi.h>\n#include <gsl/gsl_randist.h>\n#include <gsl/gsl_histogram.h>\n#include <gsl/gsl_statistics_double.h>\n\n\n    //Some linear algebra utilities\n\ndouble apop_det_and_inv(const gsl_matrix *in, gsl_matrix **out, int calc_det, int calc_inv);\nApop_var_declare( apop_data * apop_dot(const apop_data *d1, const apop_data *d2, char form1, char form2) )\nApop_var_declare( int         apop_vector_bounded(const gsl_vector *in, long double max) )\ngsl_matrix * apop_matrix_inverse(const gsl_matrix *in) ;\ndouble      apop_matrix_determinant(const gsl_matrix *in) ;\n//apop_data*  apop_sv_decomposition(gsl_matrix *data, int 
dimensions_we_want);\nApop_var_declare( apop_data *  apop_matrix_pca(gsl_matrix *data, int const dimensions_we_want) )\nApop_var_declare( gsl_vector * apop_vector_stack(gsl_vector *v1, gsl_vector const * v2, char inplace) )\nApop_var_declare( gsl_matrix * apop_matrix_stack(gsl_matrix *m1, gsl_matrix const * m2, char posn, char inplace) )\n\nvoid apop_vector_log(gsl_vector *v);\nvoid apop_vector_log10(gsl_vector *v);\nvoid apop_vector_exp(gsl_vector *v);\n\n                ////Subsetting macros\n\n/** \\cond doxy_ignore */\n/** These are all deprecated.*/\n#define APOP_SUBMATRIX(m, srow, scol, nrows, ncols, o) gsl_matrix apop_mm_##o = gsl_matrix_submatrix((m), (srow), (scol), (nrows),(ncols)).matrix;\\\ngsl_matrix * o = &( apop_mm_##o );                                                  // Use \\ref Apop_subm. \n#define Apop_submatrix APOP_SUBMATRIX\n\n#define Apop_col_v(m, col, v) gsl_vector apop_vv_##v = ((col) == -1) ? (gsl_vector){} : gsl_matrix_column((m)->matrix, (col)).vector;\\\ngsl_vector * v = ((col)==-1) ? (m)->vector : &( apop_vv_##v );                      // Use \\ref Apop_cv.\n\n#define Apop_row_v(m, row, v) Apop_matrix_row((m)->matrix, row, v)                  // Use \\ref Apop_rv.\n#define Apop_rows(d, rownum, len, outd) apop_data *outd = Apop_rs(d, rownum, len)   // Use \\ref Apop_rs.\n#define Apop_row(d, row, outd) Apop_rows(d, row, 1, outd)                           // Use \\ref Apop_r.\n#define Apop_cols(d, colnum, len, outd) apop_data *outd =  Apop_cs(d, colnum, len); // Use \\ref Apop_cs.\n/** \\endcond */ //End of Doxygen ignore.\n\n/** \\def Apop_row_tv(m, row_name, v)\n After this call, \\c v will hold a \\c gsl_vector view of an \\ref apop_data set \\c m. 
The view will consist only of the row with name \\c row_name.\n Unlike \\ref Apop_rv, the second argument is a row name, that I'll look up using \\ref apop_name_find, and the third is the name of the view to be generated.\n\\see Apop_rs, Apop_r, Apop_rv, Apop_row_t, Apop_mrv\n*/\n#define Apop_row_tv(m, row, v) gsl_vector apop_vv_##v = gsl_matrix_row((m)->matrix, apop_name_find((m)->names, row, 'r')).vector;\\\ngsl_vector * v = &( apop_vv_##v );\n\n/** \\def Apop_col_tv(m, col_name, v)\nAfter this call, \\c v will hold a \\c gsl_vector view of the \\ref apop_data set \\c m.\nThe view will consist only of the column with name \\c col_name.\nUnlike \\ref Apop_cv, the second argument is a column name, that I'll look up using \\ref apop_name_find, and the third is the name of the view to be generated.\n\\see Apop_cs, Apop_c, Apop_cv, Apop_col_t, Apop_mcv\n*/\n#define Apop_col_tv(m, col, v) gsl_vector apop_vv_##v = gsl_matrix_column((m)->matrix, apop_name_find((m)->names, col, 'c')).vector;\\\ngsl_vector * v = &( apop_vv_##v );\n\n/** \\def Apop_row_t(m, row_name, v)\n After this call, \\c v will hold an \\ref apop_data view of an \\ref apop_data set \\c m. The view will consist only of the row with name \\c row_name.\n Unlike \\ref Apop_r, the second argument is a row name, that I'll look up using \\ref apop_name_find, and the third is the name of the view to be generated.\n\\see Apop_rs, Apop_r, Apop_rv, Apop_row_tv, Apop_mrv\n*/\n#define Apop_row_t(d, rowname, outd) int apop_row_##outd = apop_name_find((d)->names, rowname, 'r'); Apop_rows(d, apop_row_##outd, 1, outd)\n\n/** \\def Apop_col_t(m, col_name, v)\n After this call, \\c v will hold a view of the \\ref apop_data set \\c m. 
The view will be an \\ref apop_data set consisting only of the column with name \\c col_name.\n Unlike \\ref Apop_c, the second argument is a column name, which I'll look up using \\ref apop_name_find, and the third is the name of the view to be generated.\n\\see Apop_cs, Apop_c, Apop_cv, Apop_col_tv, Apop_mcv\n*/\n#define Apop_col_t(d, colname, outd) int apop_col_##outd = apop_name_find((d)->names, colname, 'c'); Apop_cols(d, apop_col_##outd, 1, outd)\n\n// The above versions relied on gsl_views, which stick to C as of 1989 CE.\n// Better to just create the views via designated initializers.\n\n\n/** \\def Apop_subm(matrix_to_view, srow, scol, nrows, ncols)\nGenerate a view of a submatrix within a \\c gsl_matrix. Like \\ref Apop_r, et al., the view is an automatically-allocated variable that is lost once the program flow leaves the scope in which it is declared.\n\n\\param matrix_to_view The root matrix\n\\param srow the first row (in the root matrix) of the top of the submatrix\n\\param scol the first column (in the root matrix) of the left edge of the submatrix\n\\param nrows number of rows in the submatrix\n\\param ncols number of columns in the submatrix\n\\return An automatically-allocated view of type \\c gsl_matrix.\n*/\n#define Apop_subm(matrix_to_view, srow, scol, nrows, ncols)(                  \\\n        (!(matrix_to_view)                                                   \\\n            || (matrix_to_view)->size1 < (srow)+(nrows) || (srow) < 0        \\\n            || (matrix_to_view)->size2 < (scol)+(ncols) || (scol) < 0) ? 
NULL \\\n        : &(gsl_matrix){.size1=(nrows), .size2=(ncols),                         \\\n             .tda=(matrix_to_view)->tda,                                  \\\n             .data=gsl_matrix_ptr((matrix_to_view), (srow), (scol))}      \\\n        )\n\n/** Get a vector view of a single row of a \\ref gsl_matrix.\n\n\\param matrix_to_view A \\ref gsl_matrix.\n\\param row An integer giving the row to be viewed.\n\\return A \\c gsl_vector view of the given row. The view is automatically allocated,\n  and disappears as soon as the program leaves the scope in which it is declared.\n\nSee \\ref apop_vector_correlation for an example of use.\n\\see Apop_r, Apop_rv\n*/\n#define Apop_mrv(matrix_to_view, row) Apop_rv(&(apop_data){.matrix=matrix_to_view}, row)\n\n/** Get a vector view of a single column of a \\ref gsl_matrix.\n\n\\param matrix_to_view A \\ref gsl_matrix.\n\\param col An integer giving the column to be viewed.\n\\return A \\c gsl_vector view of the given column. The view is automatically allocated,\n  and disappears as soon as the program leaves the scope in which it is declared.\n\n\\code \n//Column indices are zero-based, so the second and third columns are 1 and 2.\ngsl_matrix *m = apop_query_to_data(\"select col1, col2, col3 from data\")->matrix;\nprintf(\"The correlation coefficient between columns two \"\n       \"and three is %g.\\n\", apop_vector_correlation(Apop_mcv(m, 1), Apop_mcv(m, 2)));\n\\endcode \n\n\\see Apop_r, Apop_cv\n*/\n#define Apop_mcv(matrix_to_view, col) Apop_cv(&(apop_data){.matrix=matrix_to_view}, col)\n\n/** \\def Apop_rv(d, row)\nA macro to generate a temporary one-row view of the matrix in an \\ref apop_data set \\c d, pulling out only\nrow \\c row. 
The view is a \\c gsl_vector.\n\n\\code\n//pull a single row\ngsl_vector *v = Apop_rv(your_data, 0);\n\n//or loop through all rows\nfor (int i=0; i< your_data->matrix->size1; i++)\n    printf(\"Σ_%i = %g\\n\", i, apop_vector_sum(Apop_rv(your_data, i)));\n\\endcode\n\nThe view is automatically allocated, and disappears as soon as the program leaves the scope in which it is declared.\n\\see Apop_r, Apop_rs, Apop_row_tv, Apop_row_t, Apop_mrv\n*/\n#define Apop_rv(data_to_view, row) (                                            \\\n        ((data_to_view) == NULL || (data_to_view)->matrix == NULL               \\\n            || (data_to_view)->matrix->size1 <= (row) || (row) < 0) ? NULL        \\\n        : &(gsl_vector){.size=(data_to_view)->matrix->size2,                    \\\n             .stride=1, .data=gsl_matrix_ptr((data_to_view)->matrix, (row), 0)} \\\n        )\n\n/** \\def Apop_cv(d, col)\nA macro to generate a temporary one-column view of the matrix in an \\ref apop_data\nset \\c d, pulling out only column \\c col. The view is a \\c gsl_vector.\n\nAs usual, column -1 is the vector element of the \\ref apop_data set.\n\n\\code\n//pull a single column\ngsl_vector *v = Apop_cv(your_data, 0);\n\n//or loop through all columns\nfor (int i=0; i< your_data->matrix->size2; i++)\n    printf(\"Σ_%i = %g\\n\", i, apop_vector_sum(Apop_cv(your_data, i)));\n\\endcode\n\nThe view is automatically allocated, and disappears as soon as the program leaves the\nscope in which it is declared.\n\n\\see Apop_cs, Apop_c, Apop_col_tv, Apop_col_t, Apop_mcv\n*/\n#define Apop_cv(data_to_view, col) (                                           \\\n          !(data_to_view) ? NULL                                               \\\n        : (col)==-1       ? (data_to_view)->vector                             \\\n        : (!(data_to_view)->matrix                                             \\\n            || (data_to_view)->matrix->size2 <= (col) || ((int)(col)) < -1) ? 
NULL    \\\n        : &(gsl_vector){.size=(data_to_view)->matrix->size1,                   \\\n             .stride=(data_to_view)->matrix->tda, .data=gsl_matrix_ptr((data_to_view)->matrix, 0, (col))} \\\n        )\n\n/** \\cond doxy_ignore */\n/* Not (yet) for public use. */\n#define Apop_subvector(v, start, len) (                                          \\\n        ((v) == NULL || (v)->size < ((start)+(len)) || (start) < 0) ? NULL      \\\n        : &(gsl_vector){.size=(len), .stride=(v)->stride, .data=(v)->data+(start*(v)->stride)})\n/** \\endcond */\n\n/** \\def Apop_rs(d, row, len)\nA macro to generate a temporary view of \\ref apop_data set \\c d pulling only certain rows, beginning at row \\c row\nand having height \\c len. \n\nThe view is automatically allocated, and disappears as soon as the program leaves the scope in which it is declared.\n\\see Apop_r, Apop_rv, Apop_row_tv, Apop_row_t, Apop_mrv\n*/\n#define Apop_rs(d, rownum, len)(                                                 \\\n        (!(d) || (rownum) < 0) ? NULL                                            \\\n        : &(apop_data){                                                          \\\n         .names= ( !((d)->names) ? NULL :                                        \\\n            &(apop_name){                                                        \\\n                .title = (d)->names->title,                                      \\\n                .vector = (d)->names->vector,                                    \\\n                .col = (d)->names->col,                                          \\\n                .row = ((d)->names->row && (d)->names->rowct > (rownum)) ? &((d)->names->row[rownum]) : NULL,  \\\n                .texthash = (d)->names->texthash,                                \\\n                .rowhash = ((d)->names->rowhash && (d)->names->rowct > (rownum)) ? 
&((d)->names->rowhash[rownum]) : NULL,  \\\n                .colhash = (d)->names->colhash,                                  \\\n                .text = (d)->names->text,                                        \\\n                .colct = (d)->names->colct,                                      \\\n                .rowct = (d)->names->row ? (GSL_MIN((len), GSL_MAX((d)->names->rowct - (int)(rownum), 0)))      \\\n                                          : 0,                                   \\\n                .textct = (d)->names->textct }),                                 \\\n        .vector= Apop_subvector(((d)->vector), (rownum), (len)),                 \\\n        .matrix = Apop_subm(((d)->matrix), (rownum), 0,  (len), (d)->matrix?(d)->matrix->size2:0),    \\\n        .weights =  Apop_subvector(((d)->weights), (rownum), (len)),             \\\n        .textsize[0]=(d)->textsize[0]> (rownum)+(len)-1 ? (len) : 0,                                   \\\n        .textsize[1]=(d)->textsize[1],                                           \\\n        .text = (d)->text ? &((d)->text[rownum]) : NULL,                         \\\n        })\n\n\n/** \\def Apop_cs(d, col, len)\nA macro to generate a temporary view of \\ref apop_data set \\c d including only certain columns, beginning at column \\c col and spanning \\c len columns.\n\nThe view is automatically allocated, and disappears as soon as the program leaves the scope in which it is declared.\n\\see Apop_c, Apop_cv, Apop_col_tv, Apop_col_t, Apop_mcv\n*/\n#define Apop_cs(d, colnum, len) ( \\\n            (!(d)||!(d)->matrix || (d)->matrix->size2 <= (colnum)+(len)-1)       \\\n             ? 
NULL                                                              \\\n             : &(apop_data){                                                     \\\n                .vector= NULL,                                                   \\\n                .weights= (d)->weights,                                          \\\n                .matrix = Apop_subm((d)->matrix, 0, colnum, (d)->matrix->size1, (len)),\\\n                .textsize[0] = 0,                                                \\\n                .textsize[1] = 0,                                                \\\n                .text = NULL,                                                    \\\n                .names= (d)->names ? &(apop_name){                                                         \\\n                    .title = (d)->names->title,                                      \\\n                    .vector = NULL,                                                  \\\n                    .row = (d)->names->row,                                          \\\n                    .col = ((d)->names->col && (d)->names->colct > colnum) ? &((d)->names->col[colnum]) : NULL,  \\\n                    .text = NULL,                                                    \\\n                    .texthash = NULL,                                                \\\n                    .rowhash = (d)->names->rowhash,                                  \\\n                    .colhash = ((d)->names->colhash && (d)->names->colct > (colnum)) ? &((d)->names->colhash[colnum]) : NULL,  \\\n                    .rowct = (d)->names->rowct,                                      \\\n                    .colct = (d)->names->col ? 
(GSL_MIN(len, GSL_MAX((d)->names->colct - colnum, 0)))      \\\n                                              : 0,                                   \\\n                    .textct = (d)->names->textct } : NULL \\\n            })\n\n/** \\def Apop_r(d, row)\nA macro to generate a temporary one-row view of \\ref apop_data set \\c d, pulling out only\nrow \\c row. The view is also an \\ref apop_data set, with names and other decorations.\n\\code\n//pull a single row\napop_data *v = Apop_r(your_data, 7);\n\n//or loop through a sequence of one-row data sets.\napop_model *std = apop_model_set_parameters(apop_normal, 0, 1);\nfor (int i=0; i< your_data->matrix->size1; i++)\n    printf(\"Std Normal CDF up to observation %i is %g\\n\",\n                       i, apop_cdf(Apop_r(your_data, i), std));\n\\endcode\n\nThe view is automatically allocated, and disappears as soon as the program leaves the\nscope in which it is declared.\n\\see Apop_rs, Apop_row_v, Apop_row_tv, Apop_row_t, Apop_mrv\n*/\n#define Apop_r(d, rownum) Apop_rs(d, rownum, 1)\n\n/** \\def Apop_c(d, col)\nA macro to generate a temporary one-column view of \\ref apop_data set \\c d, pulling out only\ncolumn \\c col. 
\nThe result is a pointer to this temporary\nview, which you can use as you would any \\ref apop_data set.\n\\see Apop_cs, Apop_cv, Apop_col_tv, Apop_col_t, Apop_mcv\n*/\n#define Apop_c(d, col) Apop_cs(d, col, 1)\n\n/** \\cond doxy_ignore */\n#define APOP_COL Apop_col\n#define apop_col Apop_col\n#define APOP_COL_T Apop_col_t\n#define apop_col_t Apop_col_t\n#define APOP_COL_TV Apop_col_tv\n#define apop_col_tv Apop_col_tv\n\n#define APOP_ROW Apop_row\n#define apop_row Apop_row\n#define APOP_COLS Apop_cols\n#define apop_cols Apop_cols\n#define APOP_COL_V Apop_col_v\n#define apop_col_v Apop_col_v\n#define APOP_ROW_V Apop_row_v\n#define apop_row_v Apop_row_v\n#define APOP_ROWS Apop_rows\n#define apop_rows Apop_rows\n#define Apop_data_row Apop_row   //deprecated\n#define APOP_ROW_T Apop_row_t\n#define apop_row_t Apop_row_t\n#define APOP_ROW_TV Apop_row_tv\n#define apop_row_tv Apop_row_tv\n\n/** Deprecated. Use Apop_mrv */\n#define Apop_matrix_row(m, row, v) gsl_vector apop_vv_##v = gsl_matrix_row((m), (row)).vector;\\\ngsl_vector * v = &( apop_vv_##v );\n\n/* Deprecated. 
Use Apop_mcv */\n#define Apop_matrix_col(m, col, v) gsl_vector apop_vv_##v = gsl_matrix_column((m), (col)).vector;\\\ngsl_vector * v = &( apop_vv_##v );\n\n#define APOP_MATRIX_ROW Apop_matrix_row \n#define apop_matrix_row Apop_matrix_row \n#define APOP_MATRIX_COL Apop_matrix_col \n#define apop_matrix_col Apop_matrix_col \n/** \\endcond */\n\n\nlong double apop_vector_sum(const gsl_vector *in);\ndouble apop_vector_var_m(const gsl_vector *in, const double mean);\nApop_var_declare( double apop_vector_correlation(const gsl_vector *ina, const gsl_vector *inb, const gsl_vector *weights) )\ndouble apop_vector_kurtosis(const gsl_vector *in);\ndouble apop_vector_skew(const gsl_vector *in);\n\n#define apop_sum apop_vector_sum\n#define apop_var apop_vector_var\n#define apop_mean apop_vector_mean\n\n        //////database utilities\n\nApop_var_declare( int apop_table_exists(char const *name, char remove) )\n\nint apop_db_open(char const *filename);\nApop_var_declare( int apop_db_close(char vacuum) )\n\nint apop_query(const char *q, ...) __attribute__ ((format (printf,1,2)));\napop_data * apop_query_to_text(const char * fmt, ...) __attribute__ ((format (printf,1,2)));\napop_data * apop_query_to_data(const char * fmt, ...) __attribute__ ((format (printf,1,2)));\napop_data * apop_query_to_mixed_data(const char *typelist, const char * fmt, ...) __attribute__ ((format (printf,2,3)));\ngsl_vector * apop_query_to_vector(const char * fmt, ...) __attribute__ ((format (printf,1,2)));\ndouble apop_query_to_float(const char * fmt, ...) 
__attribute__ ((format (printf,1,2)));\n\nint apop_data_to_db(const apop_data *set, const char *tabname, char);\n\n\n        //////Settings groups\n\n    //Part I: macros and fns for getting/setting settings groups and elements\n\n/** \\cond doxy_ignore */\nvoid * apop_settings_get_grp(apop_model *m, char *type, char fail);\nvoid apop_settings_remove_group(apop_model *m, char *delme);\nvoid apop_settings_copy_group(apop_model *outm, apop_model *inm, char *copyme);\nvoid *apop_settings_group_alloc(apop_model *model, char *type, void *free_fn, void *copy_fn, void *the_group);\napop_model *apop_settings_group_alloc_wm(apop_model *model, char *type, void *free_fn, void *copy_fn, void *the_group);\n/** \\endcond */ //End of Doxygen ignore.\n\n/** Retrieves a settings group from a model.  See \\ref Apop_settings_get\nto just pull a single item from within the settings group.\n\nThis macro returns NULL if a group of type \\c type_settings isn't found attached\nto model \\c m, so you can easily put it in a conditional like\n  \\code \n  if (!apop_settings_get_group(m, apop_lm)) ...\n  \\endcode\n\n\\param m An \\ref apop_model\n\\param type The name of the settings group you are retrieving, given as a bare token (the macro stringifies it) without the \\c _settings ending. E.g., for an \\ref apop_mle_settings group, use only \\c apop_mle.\n\\return A void pointer to the desired struct (or \\c NULL if not found).\n*/\n#define Apop_settings_get_group(m, type) apop_settings_get_grp(m, #type, 'c')\n\n/** Removes a settings group from a model's list. \n \n\\li  If the so-named group is not found, do nothing.\n*/\n#define Apop_settings_rm_group(m, type) apop_settings_remove_group(m, #type)\n\n/** Add a settings group. 
The first two arguments (the model you are\nattaching to and the settings group name) are mandatory, and then you\ncan use the \\ref designated syntax to specify default values (if any).\n\\return A pointer to the newly-prepped group.\n\nSee \\ref modelsettings, \\ref maxipage, or \\ref Apop_settings_set for examples.\n\n\\li If a settings group of the given type is already attached to the model, \nthe previous version is removed. Use \\ref Apop_settings_get_group to check whether a group\nof the given type is already attached to a model, and \\ref Apop_settings_set to modify\nan existing group.\n*/\n#define Apop_settings_add_group(model, type, ...)  \\\n    apop_settings_group_alloc(model, #type, type ## _settings_free, type ## _settings_copy, type ##_settings_init ((type ## _settings) {__VA_ARGS__}))\n\n/** Copy a model and add a settings group. Useful for models that require a settings group to function. See \\ref Apop_settings_add_group.\n\n\\return A pointer to the newly-prepped model.\n*/\n#define apop_model_copy_set(model, type, ...)  \\\n    apop_settings_group_alloc_wm(apop_model_copy(model), #type, type ## _settings_free, type ## _settings_copy, type ##_settings_init ((type ## _settings) {__VA_ARGS__}))\n\n\n/** This is the complement to \\ref apop_model_set_parameters, for those models that are\n set up by adding a settings group, rather than filling in a list of parameters.\n\nFor example, the \\ref apop_kernel_density model is built by adding a \\ref apop_kernel_density_settings group. From the example on the \\ref apop_kernel_density page:\n\n\\code\napop_model *k2 = apop_model_set_settings(apop_kernel_density,\n                    .base_data=d,\n                    .set_fn = set_uniform_edges,\n                    .kernel = apop_uniform);\n\\endcode\n\nThe name of the model and the settings group to be built must match, which is the case\nfor many model transformations, including \\ref apop_dconstrain and \\ref apop_cross. 
If the names do not match, use \\ref apop_model_copy_set.\n*/\n#define Apop_model_set_settings(model, ...)  \\\n    apop_settings_group_alloc_wm(apop_model_copy(model), #model, model ## _settings_free, model ## _settings_copy, model ##_settings_init ((model ## _settings) {__VA_ARGS__}))\n\n#define apop_model_set_settings Apop_model_set_settings\n\n/** Retrieves a setting from a model.  See \\ref Apop_settings_get_group to pull the entire group.\n\n\\param model An \\ref apop_model.\n\\param type A string giving the type of the settings group you are retrieving, without the \\c _settings ending. E.g., for an \\ref apop_mle_settings group, use \\c apop_mle.\n\\param setting The struct element you want to retrieve.\n*/\n#define Apop_settings_get(model, type, setting)  \\\n    (((type ## _settings *) apop_settings_get_grp(model, #type, 'f'))->setting)\n\n/** Modifies a single element of a settings group to the given value. \n\nFor example,\n\\code\n//set up a mixture of two Normals. This function initializes an apop_mixture_settings group\napop_model *mix = apop_model_mixture(apop_model_copy(apop_normal), apop_model_copy(apop_normal));\n\n//Add an apop_mle_settings group to specify the search strategy\nApop_settings_add_group(mix, apop_mle, .starting_pt=(double[]){.5, .5, 50, 5, 80, 5},\n                                           .step_size=3, .tolerance=1e-6);\n\n//The mix model now has apop_mle and apop_mixture settings groups attached. Modify them:\nApop_settings_set(mix, apop_mixture, find_weights, 'y');  //Search for optimal mixture weights\nApop_settings_set(mix, apop_mle, method, \"NM simplex\");   //Nelder-Mead simplex algorithm\napop_model *optimal_mix = apop_estimate(input_data, mix); //Everything is set up, so do the search.\n\\endcode\n\n\\li If <tt>model==NULL</tt>, fails silently. 
\n\\li If <tt>model!=NULL</tt> but the given settings group is not found attached to the model, set <tt>model->error='s'</tt>.\n*/\n#define Apop_settings_set(model, type, setting, data)   \\\n    do {                                                \\\n        if (!(model)) continue; /* silent fail. */      \\\n        type ## _settings *apop_tmp_settings = apop_settings_get_grp(model, #type, 'c');  \\\n        Apop_stopif(!apop_tmp_settings, (model)->error='s', 0, \"You're trying to modify a setting in \" \\\n                        #model \"'s setting group of type \" #type \" but that model doesn't have such a group.\"); \\\n    apop_tmp_settings->setting = (data);                \\\n    } while (0);\n\n/** \\cond doxy_ignore */\n#define Apop_settings_add Apop_settings_set\n#define APOP_SETTINGS_ADD Apop_settings_set\n#define apop_settings_set Apop_settings_set\n#define APOP_SETTINGS_GET Apop_settings_get\n#define apop_settings_get Apop_settings_get\n#define APOP_SETTINGS_ADD_GROUP Apop_settings_add_group\n#define apop_settings_add_group Apop_settings_add_group\n#define APOP_SETTINGS_GET_GROUP Apop_settings_get_group\n#define apop_settings_get_group Apop_settings_get_group\n#define APOP_SETTINGS_RM_GROUP Apop_settings_rm_group\n#define apop_settings_rm_group Apop_settings_rm_group\n#define Apop_model_copy_set apop_model_copy_set\n\n//deprecated:\n#define Apop_model_add_group Apop_settings_add_group\n\n/** \\endcond */ //End of Doxygen ignore.\n\n/** Put this in your header file to declare the init, copy, and\nfree functions for ysg_settings. Of course, these functions will also have to be defined\nin a .c file using \\ref Apop_settings_init, \\ref Apop_settings_copy, and \\ref Apop_settings_free. 
*/\n#define Apop_settings_declarations(ysg) \\\n   ysg##_settings * ysg##_settings_init(ysg##_settings); \\\n   void * ysg##_settings_copy(ysg##_settings *); \\\n   void ysg##_settings_free(ysg##_settings *);\n\n/** A convenience macro for declaring the initialization function for a new settings group.\nSee \\ref settingswriting for details and an example.\n*/\n#define Apop_settings_init(name, ...)   \\\n    name##_settings *name##_settings_init(name##_settings in) {       \\\n        name##_settings *out = malloc(sizeof(name##_settings));     \\\n        *out = in; \\\n        __VA_ARGS__;            \\\n        return out; \\\n    }\n\n/** \\cond doxy_ignore */\n#define Apop_varad_set(var, value) (out)->var = (in).var ? (in).var : (value);\n/** \\endcond */\n\n/** A convenience macro for declaring the copy function for a new settings group.\nSee \\ref settingswriting for details and an example.\n*/\n#define Apop_settings_copy(name, ...) \\\n    void * name##_settings_copy(name##_settings *in) {\\\n        name##_settings *out = malloc(sizeof(name##_settings)); \\\n        *out = *in; \\\n        __VA_ARGS__;    \\\n        return out;     \\\n    }\n\n/** A convenience macro for declaring the delete function for a new settings group.\nSee \\ref settingswriting for details and an example.\n*/\n#define Apop_settings_free(name, ...) \\\n    void name##_settings_free(name##_settings *in) {\\\n        __VA_ARGS__;    \\\n        free(in);  \\\n    }\n\n        //Part II: the details of extant settings groups.\n\n\n/** The settings for maximum likelihood estimation (including simulated annealing). */\ntypedef struct{\n    double      *starting_pt;   /**< An array of doubles (e.g., <tt>(double[]){2,4,6,8}</tt>) suggesting a starting point. \n                                  If NULL, use an all-ones vector.  
If \\c startv is a \\c gsl_vector\n                                  and is not a view of a matrix, use <tt>.starting_pt=startv->data</tt>.*/\n    char *method; /**< The method to be used for the optimization. All strings are case-insensitive.\n\n        <table>\n<tr>\n<td> String </td><td> Name </td><td> Notes\n</td> </tr>\n                                     \n<tr><td> \"NM simplex\" </td><td> Nelder-Mead simplex </td><td> Does not use gradients at all. Can sometimes get stuck.</td></tr>\n\n<tr><td> \"FR cg\"  </td><td> Conjugate gradient (Fletcher-Reeves) (default) </td><td> CG methods use derivatives. They converge to the optimum of a quadratic function in one step; performance degrades as the objective departs from quadratic.</td></tr>\n\n<tr><td> \"BFGS cg\" </td><td> Broyden-Fletcher-Goldfarb-Shanno conjugate gradient        </td><td>  </td></tr>\n\n<tr><td> \"PR cg\"  </td><td> Polak-Ribiere conjugate gradient  </td><td>  </td></tr>\n\n<tr><td> \"Annealing\"  </td><td> \\ref simanneal \"simulated annealing\"         </td><td> Slow but works for objectives of arbitrary complexity, including stochastic objectives.</td></tr>\n\n<tr><td> \"Newton\"</td><td> Newton's method  </td><td> Search by finding a root of the derivative. Expects that gradient is reasonably well-behaved. </td></tr>\n\n<tr><td> \"Newton hybrid\"</td><td> Newton's method/gradient descent hybrid        </td><td>  Find a root of the derivative via the Hybrid method. If Newton proposes stepping outside of a certain interval, use an alternate method. See <a href=\"https://www.gnu.org/software/gsl/manual/gsl-ref_35.html#SEC494\">the GSL manual</a> for discussion.</td></tr>\n\n<tr><td> \"Newton hybrid no scale\"</td><td>  Newton's method/gradient descent hybrid with spherical scale</td><td>  As above, but use a simplified trust region. </td></tr>\n</table> */\n    double      step_size, /**< The initial step size. 
*/\n                tolerance, /**< The precision the minimizer uses in its stopping rule. Only vaguely related to the precision of the actual MLE.*/\ndelta;\n    int         max_iterations; /**< Ignored by simulated annealing. Other methods halt (and set the \\c \"status\" element of the output estimate's info page) if\n                                 they do this many iterations without finding an optimum. */\n    int         verbose; /**<\tGive status updates as we go.  This is orthogonal to the \n                                <tt>apop_opts.verbose</tt> setting. */\n    double      dim_cycle_tolerance; /**< If zero (the default), the usual procedure.\n                             If \\f$>0\\f$, cycle across dimensions: fix all but the first dimension at the starting\n                             point, optimize only the first dim. Then fix all but the second dim, and optimize the\n                             second dim. Continue through all dims, until the log likelihood at the outset of one cycle\n                             through the dimensions is within this amount of the previous cycle's log likelihood. There\n                             will be at least two cycles.\n                             */\n//simulated annealing (also uses step_size);\n    int         n_tries, iters_fixed_T;\n    double      k, t_initial, mu_t, t_min ;\n    gsl_rng     *rng;\n    apop_data   **path;    /**< If not \\c NULL, record each vector tried by the optimizer as one row of this \\ref apop_data set.\n                              Each row of the \\c matrix element holds the vector tried; the corresponding element in the \\c vector is the evaluated value at that vector (after out-of-constraints penalties have been subtracted).\n                              A new \\ref apop_data set is allocated at the pointer you send in. This data set has no names; add them as desired. 
For a sample use, see \\ref maxipage.\n*/\n} apop_mle_settings;\n\n/** Settings for least-squares type models such as \\ref apop_ols or \\ref apop_iv */\ntypedef struct {\n    int destroy_data; /**< If \\c 'y', then the input data set may be normalized or otherwise mangled. */\n    apop_data *instruments; /**< Use for the \\ref apop_iv regression, qv. */\n    char want_cov; /**< Deprecated. Please use \\ref apop_parts_wanted_settings. */\n    char want_expected_value; /**< Deprecated. Please use \\ref apop_parts_wanted_settings. */\n    apop_model *input_distribution; /**< The distribution of \\f$P(Y|X)\\f$ is specified by the model holding this struct, but the distribution of \\f$X\\f$ needs to be specified as well for any calculation of \\f$P(Y)\\f$. See the notes in the RNG section of the \\ref apop_ols documentation. */\n} apop_lm_settings;\n\n/** The default is for the estimation routine to give some auxiliary information,\n  such as a covariance matrix, predicted values, and common hypothesis tests.\n  Some uses of a model depend on these items, but if they are a waste\n  of time for your purposes, this settings group gives a quick way to bypass them all.\n\n  Adding this settings group to your model without changing any default values---\n  \\code\n  Apop_model_add_group(your_model, apop_parts_wanted);\n  \\endcode\n  ---will turn off all of the auxiliary calculations covered, because the default value\n  for all the switches is <tt>'n'</tt>, indicating that all elements are not wanted.\n\n  From there, you can change some of the default <tt>'n'</tt>s to <tt>'y'</tt>s to retain some but not all auxiliary elements.  If you just want the parameters themselves and the covariance matrix:\n  \\code\n  Apop_model_add_group(your_model, apop_parts_wanted, .covariance='y');\n  \\endcode\n\n  \\li Not all models support this, although the models with especially compute-intensive\n  auxiliary info do (e.g., the maximum likelihood estimation system). 
Check the model's documentation. \n\n  \\li Tests may depend on covariance, so <tt>.covariance='n', .tests='y'</tt> may be \n  treated as <tt>.covariance='y', .tests='y'</tt>.\n*/\ntypedef struct {\n    //init/copy/free are in apop_mle.c\n    char covariance;    /**< If 'y', calculate the covariance matrix. Default 'n'. */\n    char predicted;/**< If 'y', calculate the predicted values. This is typically as many\n                     items as rows in your data set. Default 'n'. */\n    char tests;/**< If 'y', run any hypothesis tests offered by the model's estimation routine. Default 'n'. */\n    char info;/**< If 'y', add an info table with elements such as log likelihood or AIC. Default 'n'. */\n} apop_parts_wanted_settings;\n\n/** For use by \\ref apop_cdf when the CDF is generated via Monte Carlo methods. */\ntypedef struct {\n    int draws;  /**< For random draw methods, how many draws? Default: 10,000.*/\n    gsl_rng *rng; /**< For random draw methods. See \\ref apop_rng_get_thread on the default. */\n    gsl_matrix *draws_made; /**< A store of random draws used to calculate the CDF. Need only be generated once, and so stored here. */\n    int *draws_refcount; /**< For internal use.*/\n} apop_cdf_settings;\n\n\n/** Settings for getting parameter models (i.e. the distribution of parameter estimates) */\ntypedef struct {\n    apop_model *base;\n    int index;\n    gsl_rng *rng;\n    int draws;\n} apop_pm_settings;\n\n\n/** Settings to accompany the \\ref apop_pmf. */\ntypedef struct {\n    gsl_vector *cmf;  /**< A cumulative mass function, for the purposes of making random draws.*/\n    char draw_index;  /**< If \\c 'y', then draws from the PMF return the integer index of the row drawn. \n                           If \\c 'n' (the default), then return the data in the vector/matrix elements of the data set. */\n    long double total_weight; /**< Keep the total weight, in case the input weights aren't normalized to sum to one. 
*/\n    int *cmf_refct;    /**< For internal use, so I can garbage-collect the CMF when needed. */\n} apop_pmf_settings;\n\n\n/** Settings for the \\ref apop_kernel_density model. */\ntypedef struct{\n    apop_data *base_data; /**< The data that will be smoothed by the KDE. */\n    apop_model *base_pmf; /**< I actually need the data in a \\ref apop_pmf. You can give\n                            that to me explicitly, or I can wrap the <tt>.base_data</tt> in a PMF.  */\n    apop_model *kernel; /**< The distribution to be centered over each data point. Default, \n                                    \\ref apop_normal with std dev 1. */\n    void (*set_fn)(apop_data*, apop_model*); /**< The function I will use for each data\n                                                  point to center the kernel over each point.\n            Default: set the upper-left element of the parameter set to the upper-left scalar in the data:\n            <tt>apop_data_set(m->parameters, .val= apop_data_get(in));</tt>.\n                                                  */\n    int own_pmf, own_kernel; /**< For internal use only. */\n}apop_kernel_density_settings;\n\nstruct apop_mcmc_settings;\n\n/** A proposal distribution for \\ref apop_mcmc_settings and its accompanying functions and\ninformation.  By default, these will be \\ref apop_multivariate_normal models. The \\c\nstep_fn and \\c adapt_fn have to be written around the model and your preferences.\nFor the defaults, the step function recenters the mean of the distribution around the\nlast accepted proposal, and the adapt function widens \\f$\\Sigma\\f$ for the Normal if the\naccept rate is too low; narrows it if the accept rate is too large.\n\nYou may provide an array of proposals. The length of the list of proposals\nmust match the number of chunks, as per the \\c gibbs_chunks setting in the \\ref\napop_mcmc_settings group that the array of proposals is a part of. 
Each proposal must\nbe initialized to include all elements, and the step and adapt functions probably have\nto be written anew for each type of model.\n*/\ntypedef struct apop_mcmc_proposal_s {\n    apop_model *proposal; /**< The distribution from which test parameters will be\n        drawn. After getting the draw using the \\c draw method of the proposal, the base\n        model's \\c parameters element is filled using \\ref apop_data_fill.\n        If \\c NULL, \\ref apop_model_metropolis will use a Multivariate Normal with the\n        appropriate dimension, mean zero, and covariance matrix I. If not \\c NULL, be sure to\n        parameterize your model with an initial position. */\n\n    void (*step_fn)(double const *, struct apop_mcmc_proposal_s*, struct apop_mcmc_settings *); /**< Modifies the parameters of the\n        proposal distribution given a successful draw. Typically, this function writes the\n        drawn data point to the parameter set. If the draw is a scalar, the default\n        function sets the 0th element of the model's \\c parameter set with the draw\n        (works for the \\ref apop_normal and other models). If the draw has multiple\n        dimensions, they are all copied to the parameter set, which must have the same\n        size. */\n\n    int (*adapt_fn)(struct apop_mcmc_proposal_s *ps, struct apop_mcmc_settings *ms); /**< Called\n        every step, to adapt the proposal distribution using information to this point in\n        the chain. */\n\n    int accept_count, reject_count;  /**< If there are multiple \\ref apop_mcmc_proposal_s structs for \n                                       multiple chunks, these count accepts/rejects for\n                                       this chunk. The \\ref apop_mcmc_settings group has\n                                       a total for the aggregate across all chunks. */\n} apop_mcmc_proposal_s;\n\n/** Method settings for a model to be put through Bayesian updating. 
*/\ntypedef struct apop_mcmc_settings {\n    apop_data *data;\n    long int periods; /**< For how many steps should the MCMC chain run? */\n    double burnin; /**< What <em>percentage</em> of the periods should be ignored\n                         as initialization. That is, this is a number between zero and one. */\n    int histosegments; /**< If outputting a binned PMF, how many segments should it have? */\n    double last_ll; /**< If you have already run MCMC, the last log likelihood in the chain.*/\n    apop_model *pmf; /**< If you have already run MCMC, I keep a pointer to the model\n            so far here. Use \\ref apop_model_metropolis_draw to get one more draw.*/\n    apop_model *base_model; /**< The model you provided with a \\c log_likelihood or\n            \\c p element (which need not sum to one). You do not have to set this: if it is\n            \\c NULL on input to \\ref apop_model_metropolis, I will fill it in.*/\n    apop_mcmc_proposal_s *proposals; /**< The list of proposals. You can probably use\n            the default of adaptive multivariate normals. See the \\ref apop_mcmc_proposal_s\n            struct for details. */\n    int proposal_count; /**< The number of proposal sets; see \\c gibbs_chunks below. */\n    double target_accept_rate; /**< The desired acceptance rate, for use by adaptive proposals. 
Default: .35 */\n    int accept_count;   /**< After calling \\ref apop_model_metropolis, this will have the number of accepted proposals.*/\n    int reject_count;   /**< After calling \\ref apop_model_metropolis, this will have the number of rejected proposals.*/\n    char gibbs_chunks;  /**< See the \\ref apop_model_metropolis documentation for discussion.\n                          \n                          \\c 'a': One step draws and accepts/rejects all parameters as a unit<br>\n\n                             \\c 'b': draw in blocks: the vector is a block, the matrix\n                                is a separate block, the weights are a separate\n                                block, and so on through every page of the model\n                                parameters. Each block of parameters is drawn and\n                                accepted/rejected as a unit. <br>\n\n                             \\c '1': draw each parameter and accept/reject separately. One\n                                MCMC step consists of a set of draws for every\n                                parameter.<br> */\n    size_t *block_starts; /**< For internal use */\n    int block_count, proposal_is_cp; /**< For internal use. */\n\n    char start_at; /**< If \\c '1' (the default), start with a first proposal of all\n        1s. Even when this is a far-from-useful starting point, MCMC typically does a good\n        job of crawling to better spots early in the chain.<br>\n    The default when this is unset is to start at the \\c parameters of the \\ref apop_model sent in to \\ref\n    apop_model_metropolis.*/\n    void (*base_step_fn)(double const *, struct apop_mcmc_proposal_s*, struct apop_mcmc_settings *); /**< If an \\ref apop_mcmc_proposal_s struct has \\c NULL \\c step_fn, use this. If you don't want a step function, set this to a do-nothing function. 
*/\n    int (*base_adapt_fn)(struct apop_mcmc_proposal_s *ps, struct apop_mcmc_settings *ms); /**< If a \\ref apop_mcmc_proposal_s has \\c NULL \\c adapt_fn, use this.  If you don't want an adapt function, set this to a do-nothing function.*/\n\n} apop_mcmc_settings;\n\n/** \\cond doxy_ignore */\n//Loess, including the old FORTRAN-to-C.\nstruct loess_struct {\n\tstruct {\n\t\tlong    n, p;\n        double  *y, *x;\n\t\tdouble\t*weights;\n\t} in;\n\tstruct {\n\t        double  span;\n\t        long    degree;\n\t        long    normalize;\n\t        long    parametric[8];\n\t        long    drop_square[8];\n\t        char    *family;\n\t} model;\n\tstruct {\n\t        char    *surface;\n\t        char    *statistics;\n\t        double  cell;\n\t        char    *trace_hat;\n\t        long    iterations;\n\t} control;\n\tstruct {\n\t\tlong\t*parameter, *a;\n\t\tdouble\t*xi, *vert, *vval;\n\t} kd_tree;\n\tstruct {\n\t\tdouble\t*fitted_values;\n        double  *fitted_residuals;\n\t\tdouble  enp, s;\n\t\tdouble  one_delta, two_delta;\n\t\tdouble\t*pseudovalues;\n\t\tdouble\ttrace_hat;\n\t\tdouble\t*diagonal;\n\t\tdouble\t*robust;\n\t\tdouble  *divisor;\n\t} out;\n};\n/** \\endcond */ //End of Doxygen ignore.\n\n/** The code for the loess system is based on FORTRAN code from 1988,\noverhauled in 1992, linked in to Apophenia in 2009. The structure that\ndoes all the work, then, is a \\c loess_struct that you should\nbasically take as opaque. \n\nThe useful settings from that struct re-appear in the \\ref\napop_loess_settings struct so you can set them directly, and then the\nsettings init function will copy your preferences into the working struct.\n\nThe documentation for the elements is cut/pasted/modified from Cleveland,\nGrosse, and Shyu.\n*/\ntypedef struct {\n    apop_data *data;\n    struct  loess_struct lo_s; /**< \n\n<tt>.data</tt>: Mandatory. Your input data set.\n\n<tt>.lo_s.model.span</tt>:\tsmoothing parameter. 
Default is 0.75.\n\n<tt>.lo_s.model.degree</tt>: overall degree of locally-fitted polynomial. 1 is\n\t\tlocally-linear fitting and 2 is locally-quadratic fitting. Default is 2.\n\n<tt>.lo_s.model.normalize</tt>:\tShould numeric predictors\n\t\tbe normalized?\tIf \\c 'y' - the default - the standard normalization\n\t\tis used. If \\c 'n', no normalization is carried out.\n\n\\c .lo_s.model.parametric:\tfor two or more numeric predictors, this argument\n\t\tspecifies those variables that should be\n\t\tconditionally-parametric. The argument should be a logical\n\t\tvector of length \\c p, specified in the order of the predictor\n\t\tgroup ordered in \\c x.  Default is a vector of 0's of length \\c p.\n\n\\c .lo_s.model.drop_square:\tfor cases with degree = 2, and with two or more\n\t\tnumeric predictors, this argument specifies those numeric\n\t\tpredictors whose squares should be dropped from the set of\n\t\tfitting variables. The method of specification is the same as\n\t\tfor parametric.  Default is a vector of 0's of length p.\n\n\\c .lo_s.model.family: the assumed distribution of the errors. The values may be \n        <tt>\"gaussian\"</tt> or <tt>\"symmetric\"</tt>. The first value is the default.\n        If the second value is specified, a robust fitting procedure is used.\n\n\\c lo_s.control.surface:\tdetermines whether the fitted surface is computed\n        <tt>\"directly\"</tt> at all points  or whether an <tt>\"interpolation\"</tt>\n        method is used. The default, interpolation, is what most users should use\n\t\tunless special circumstances warrant.\n\n\\c lo_s.control.statistics:\tdetermines whether the statistical quantities are \n    computed <tt>\"exactly\"</tt> or approximately, where <tt>\"approximate\"</tt>\n    is the default. 
The former should only be used for testing the approximation in\n    statistical development and is not meant for routine usage because computation\n    time can be horrendous.\n\n    \\c lo_s.control.cell: if interpolation is used to compute the surface,\n    this argument specifies the maximum cell size of the k-d tree. Suppose k =\n    floor(n*cell*span) where n is the number of observations.  Then a cell is\n    further divided if the number of observations within it is greater than or\n    equal to k. Default is 0.2.\n\n\\c lo_s.control.trace_hat: Options are <tt>\"approximate\"</tt>, <tt>\"exact\"</tt>, and <tt>\"wait.to.decide\"</tt>.\t\n    When lo_s.control.surface is <tt>\"approximate\"</tt>, determines\n    the computational method used to compute the trace of the hat\n    matrix, which is used in the computation of the statistical\n    quantities.  If \"exact\", an exact computation is done; normally\n    this goes quite fast on the fastest machines until n, the number\n    of observations, is 1000 or more, but for very slow machines,\n    things can slow down at n = 300.  If \"wait.to.decide\" is selected,\n    then a default is chosen in loess();  the default is \"exact\" for\n    n < 500 and \"approximate\" otherwise.  If surface is \"exact\", an\n    exact computation is always done for the trace. Setting trace_hat to\n    \"approximate\" for large data sets will substantially reduce the\n    computation time.\n\n\\c lo_s.control.iterations:\tif family is <tt>\"symmetric\"</tt>, the number of iterations \n    of the robust fitting method.  Default is 0 for\n    lo_s.model.family = gaussian; 4 for family=symmetric.\n\n    That's all you can set. 
Here are some output parameters:\n\n\\c fitted_values:\tfitted values of the local regression model\n\n\\c fitted_residuals:\tresiduals of the local regression fit\n\n   \\c  enp:\t\tequivalent number of parameters.\n\n   \\c  s:\t\testimate of the scale of the residuals.\n\n   \\c  one_delta:\ta statistical parameter used in the computation of standard errors.\n\n   \\c  two_delta:\ta statistical parameter used in the computation of standard errors.\n\n   \\c  pseudovalues:\tadjusted values of the response when robust estimation is used.\n\n\\c trace_hat:\ttrace of the operator hat matrix.\n\n   \\c  diagonal:\tdiagonal of the operator hat matrix.\n\n   \\c  robust:\t\trobustness weights for robust fitting.\n\n   \\c  divisor:\tnormalization divisor for numeric predictors.\n*/\n\n    int     want_predict_ci; /**< If \\c 'y' (the default), calculate the\n                                confidence bands for predicted values */\n    double  ci_level; /**< If running a prediction, the level at which\n                        to calculate the confidence interval. default: 0.95 */\n} apop_loess_settings;\n\n\n    /** \\cond doxy_ignore */\ntypedef struct point {    /* a point in the x,y plane */\n  double x,y;             /* x and y coordinates */\n  double ey;              /* exp(y-ymax+YCEIL) */\n  double cum;             /* integral up to x of rejection envelope */\n  int f;                  /* is y an evaluated point of log-density */\n  struct point *pl,*pr;   /* envelope points to left and right of x */\n} POINT;\n\n/* This includes the envelope info and the metropolis steps. 
*/\ntypedef struct {  /* attributes of the entire rejection envelope */\n  int cpoint;              /* number of POINTs in current envelope */\n  int npoint;              /* max number of POINTs allowed in envelope */\n  double ymax;             /* the maximum y-value in the current envelope */\n  POINT *p;                /* start of storage of envelope POINTs */\n  double *convex;          /* adjustment for convexity */\n  double metro_xprev;      /* previous Markov chain iterate */\n  double metro_yprev;      /* current log density at xprev */\n} arms_state;\n    /** \\endcond */\n\n/** For use with \\ref apop_arms_draw, to perform derivative-free adaptive rejection sampling with metropolis step. \n\nThat function generates default values for this struct if you do not attach one to the\nmodel beforehand, via a form like <tt>apop_model_add_group(your_model, apop_arms,\n.model=your_model, .xl=8, .xr =14);</tt>. If you initialize it manually via \\ref\napop_settings_add_group, the \\c model element is mandatory; you'll get a run-time\ncomplaint if you forget it.\n*/\ntypedef struct {\n    double *xinit;  /**< A <tt>double*</tt> giving starting values for x in ascending\n                      order, e.g., <tt>(double []){1, 10, 100}</tt>. Default: -1,\n                      0, 1. If this isn't \\c NULL, I need at least three items, and\n                      the length in \\c ninit. */\n    double  xl;     /**< Left bound. If you don't give me one, I'll use min[min(xinit)/10, min(xinit)*10].*/\n    double  xr;     /**< Right bound. If you don't give me one, I'll use max[max(xinit)/10, max(xinit)*10]. */\n    double convex;  /**< Adjustment for convexity */\n    int ninit;      /**< The length of \\c xinit.*/\n    int npoint;     /**< Maximum number of envelope points. I \\c malloc space for this many <tt>POINT</tt>s at the outset. Default = 100. 
*/\n   char do_metro;   /**< Set to \\c 'y' if the metropolis step is required (i.e.,\n                           if you're not sure if the function is log-concave).*/\n   double xprev;    /**< For internal use; please ignore. Previous value from Markov chain. */\n   int neval;       /**< On exit, the number of function evaluations performed */\n   arms_state *state;\n   apop_model *model; /**< The model from which to draw. Mandatory. Must have either a \\c log_likelihood or \\c p method.*/\n} apop_arms_settings;\n\n\n/** The settings to accompany the \\ref apop_cross model, representing the cross product of two models (or, via recursion, a list of models of arbitrary length).*/\ntypedef struct {\n    char *splitpage;    /**< The name of the page at which to split the data. If \\c NULL, I send the entire data set to both models as needed. */\n    apop_model *model1; /**< The first model in the stack.*/\n    apop_model *model2; /**< The second model.*/\n} apop_cross_settings;\n\ntypedef struct {\n    apop_data *(*base_to_transformed)(apop_data*); /**< The function to transform the model from pre-transform space to post-transform space. */\n    apop_data *(*transformed_to_base)(apop_data*); /**< The function to transform from post-transform space back to pre-transform space. If this function does not exist, using a Jacobian-based transformation is probably not mathematically correct. */\n    double (*jacobian_to_base)(apop_data*); /**< The derivative of the \\c transformed_to_base function. */\n    apop_model *base_model;  /**< The pre-transformation model. */\n} apop_coordinate_transform_settings;/**< Settings for an \\ref apop_coordinate_transform model; see its documentation for notes and an example.\n*/\n\n/** For use with the \\ref apop_dconstrain model. See its documentation for an example. \n*/\ntypedef struct {\n    apop_model *base_model; /**< The model, before constraint. */\n    double (*constraint)(apop_data *, apop_model *); /**< The constraint. 
Return 1 if the data is in the constraint; zero if out. */\n    double (*scaling)(apop_model *); /**< Optional. Return the percent of the model density inside the constraint. */\n    gsl_rng *rng; /**< If you don't provide a \\c scaling function, I calculate the in-constraint model density via random draws.\n                       If no \\c rng is provided, I use a default RNG; see \\ref apop_rng_get_thread. */\n    double scale; /**< After the scaling has been calculated, store it here. If you change the parameters of your base model,\n                       set this to zero to have the scaling recalculated. */\n    gsl_vector *last_params; /**< The parameters used to calculate \\c scale. If these change, recalculate. */\n    int draw_ct; /**< How many draws to make for calculating the in-constraint model density via random draws. Current default: 1e4. */\n    int refct; /**< For internal use. */\n} apop_dconstrain_settings;\n\ntypedef struct {\n    apop_model *generator_m;\n    apop_model *ll_m;\n    int draw_ct;\n} apop_composition_settings;/**< All of the elements of this struct should be considered private.*/\n\n/** For mixture distributions, typically set up using \\ref apop_model_mixture. See\n\\ref apop_mixture for discussion. Please consider all elements but \\c model_list and \\c\nweights as private and subject to change. See the examples for use of these elements.  \n*/\ntypedef struct {\n    gsl_vector *weights;     /**< The likelihood of a draw from each component. Default is equal likelihood\n                              for each mixture element. Or set this to a weight vector of your choosing, or set\n                              <tt>find_weights='y'</tt> and have <tt>apop_estimate</tt> find optimal weights. */\n    apop_model **model_list; /**< A \\c NULL-terminated list of component models. */\n    int model_count;\n    int *param_sizes;  /**< The number of parameters for each model. Useful for unpacking the params. 
*/\n    apop_model *cmf;   /**< For internal use by the draw method. */\n    int *cmf_refct;    /**< For internal use, so I can garbage-collect the CMF when needed. */\n    char find_weights; /**< By default, weights are fixed. Set this to \\c 'y' to allow \\ref apop_estimate to\n                            use an EM algorithm to find the optimal weights.\n                            See the documentation for \\ref apop_mixture for details. */\n    gsl_vector *next_weights; /**< For internal use.*/\n} apop_mixture_settings;\n\n    //Models built via call to apop_model_copy_set.\n\n#define apop_model_dcompose(...) Apop_model_set_settings(apop_composition, __VA_ARGS__)\n#define apop_model_dconstrain(...) Apop_model_set_settings(apop_dconstrain, __VA_ARGS__)\n#define apop_model_coordinate_transform(...) Apop_model_set_settings(apop_coordinate_transform, __VA_ARGS__)\n\n//Doxygen drops whatever is after these declarations, so I put them last.\nApop_settings_declarations(apop_lm)\nApop_settings_declarations(apop_pm)\nApop_settings_declarations(apop_pmf)\nApop_settings_declarations(apop_mle)\nApop_settings_declarations(apop_cdf)\nApop_settings_declarations(apop_arms)\nApop_settings_declarations(apop_mcmc)\nApop_settings_declarations(apop_loess)\nApop_settings_declarations(apop_cross)\nApop_settings_declarations(apop_mixture)\nApop_settings_declarations(apop_dconstrain)\nApop_settings_declarations(apop_composition)\nApop_settings_declarations(apop_parts_wanted)\nApop_settings_declarations(apop_kernel_density)\nApop_settings_declarations(apop_coordinate_transform)\n\n#ifdef\t__cplusplus\n}\n#endif\n\n/** @} */ //End doxygen's all_public grouping\n\n//Part of the intent of a convenience header like this is that you\n//don't have to remember what else you're including. 
So here are \n//some other common GSL headers:\n#include <math.h>\n#include <gsl/gsl_sort.h>\n#include <gsl/gsl_eigen.h>\n#include <gsl/gsl_sort_vector.h>\n#include <gsl/gsl_permutation.h>\n#include <gsl/gsl_integration.h>\n"
  },
  {
    "path": "apop_arms.c",
    "content": "/** \\file \n  adaptive rejection metropolis sampling */\n\n/** (C) Wally Gilks; see documentation below for details.\n  Adaptations for Apophenia (c) 2009 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n#include \"apop_internal.h\"\n\n#define XEPS  0.00001            /* critical relative x-value difference */\n#define YEPS  0.1                /* critical y-value difference */\n#define EYEPS 0.001              /* critical relative exp(y) difference */\n#define YCEIL 50.                /* maximum y avoiding overflow in exp(y) */\n\n/* declarations for functions defined in this file  (minus those in arms.h). */\nvoid invert(double prob, arms_state *env, POINT *p);\nint test(arms_state *state, POINT *p,  apop_arms_settings *params, gsl_rng *r);\nint update(arms_state *state, POINT *p,  apop_arms_settings *params);\nstatic void cumulate(arms_state *env);\nint meet (POINT *q, arms_state *state, apop_arms_settings *params);\ndouble area(POINT *q);\ndouble expshift(double y, double y0);\ndouble logshift(double y, double y0);\ndouble perfunc(apop_arms_settings*, double x);\nvoid display(FILE *f, arms_state *env, apop_arms_settings *);\nint initial (apop_arms_settings* params, arms_state *state);\n\nApop_settings_copy(apop_arms,\n    out->state = malloc(sizeof(arms_state));\n    *out->state = *in->state;\n)\n\nApop_settings_free(apop_arms,\n    if (in->state){\n        free(in->state->p);\n        free(in->state);\n\t}\n)\n\nApop_settings_init(apop_arms,\n    if ((in.xl || in.xr) && !in.xinit)\n        out->xinit = (double []) {in.xl+GSL_DBL_EPSILON, (in.xl+in.xr)/2., in.xr-GSL_DBL_EPSILON};\n    else{\n        //Apop_varad_set(xinit, ((double []) {0, 0.5, 1}));\n        Apop_varad_set(xinit, ((double []) {-1, 0, 1}));\n    }\n    Apop_varad_set(ninit, 3);\n    Apop_varad_set(xl, GSL_MIN(out->xinit[0]/10., out->xinit[0]*10)-.1);\n    Apop_varad_set(xr, GSL_MAX(out->xinit[out->ninit-1]/10., out->xinit[out->ninit-1]*10)+.1);\n    
Apop_varad_set(convex, 0);\n    Apop_varad_set(npoint, 100);\n    Apop_varad_set(do_metro, 'y');\n    Apop_varad_set(xprev, (out->xinit[0]+out->xinit[out->ninit-1])/2.);\n    Apop_varad_set(neval, 1000);\n    Apop_assert(out->model, \"the model input (e.g.: .model = parent_model) is mandatory.\");\n\n  // allocate the state \n    out->state = malloc(sizeof(arms_state));\n    Apop_assert(out->state, \"Malloc failed. Out of memory?\");\n    *out->state = (arms_state) { };\n    int err = initial(out, out->state);\n    Apop_assert_c(!err, NULL, 0, \"init failed, error %i. Returning NULL\", err);\n\n  /* finish setting up metropolis struct (can only do this after setting up env) */\n    if(out->do_metro=='y'){\n        /* I don't understand why this is needed.\n          if((params->xprev < params->xl) || (params->xprev > params->xr))\n            apop_assert(0, 1007, 0, 's', \"previous Markov chain iterate out of range\")*/\n        out->state->metro_xprev = out->xprev;\n        out->state->metro_yprev = perfunc(out,out->xprev);\n        assert(isfinite(out->state->metro_xprev));\n        assert(isfinite(out->state->metro_yprev));\n    }\n)\n\nvoid distract_doxygen_arms(){/*Doxygen gets thrown by the settings macros. This decoy function is a workaround. */}\n\n/** Adaptive rejection Metropolis sampling, to make random draws from a univariate distribution.\n\nThe author, Wally Gilks, explains on \nhttp://www.amsta.leeds.ac.uk/~wally.gilks/adaptive.rejection/web_page/Welcome.html , that\n``ARS works by constructing an envelope function of the log of the target density, which is then used in rejection sampling (see, for example,  Ripley, 1987). Whenever a point is rejected by ARS, the envelope is updated to correspond more closely to the true log density, thereby reducing the chance of rejecting subsequent points. Fewer ARS rejection steps implies fewer point-evaluations of the log density.''\n\n\\li It accepts only functions with univariate inputs. 
I.e., it will put a single value into a 1x1 \\ref apop_data set, and then evaluate the log likelihood at that point. For multivariate situations, see \\ref apop_model_metropolis.\n\n\\li It is currently the default for the \\ref apop_draw function given a univariate model, so you can just call that if you prefer.\n\n\\li There are a great number of parameters, in the \\c apop_arms_settings structure.  The structure also holds a history of the points tested to date. That means that the system will be more accurate as more draws are made. It also means that if the parameters change, or you use \\ref apop_model_copy, you should call <tt>Apop_settings_rm_group(your_model, apop_arms)</tt> to clear the model of points that are not valid for a different situation.\n  */\nint apop_arms_draw (double *out, gsl_rng *r, apop_model *m){\n    apop_arms_settings *params = Apop_settings_get_group(m, apop_arms);\n    if (!params) params = Apop_model_add_group(m, apop_arms, .model=m);\n  POINT pwork;        /* a working point, not yet incorporated in envelope */\n  int msamp=0;        /* the number of x-values currently sampled */\n  arms_state *state = params->state; \n  /* now do adaptive rejection */\n  do {\n    // Sample a new point from piecewise exponential envelope \n    double prob = gsl_rng_uniform(r);\n    /* get x-value corresponding to a cumulative probability prob */\n    assert(isfinite(state->p->x));\n    assert(isfinite(state->p->y));\n    invert(prob,state,&pwork);\n\n    /* perform rejection (and perhaps metropolis) tests */\n    int i = test(state,&pwork, params, r);\n    if (i == 1){ // point accepted \n        Apop_notify(3, \" point accepted.\");\n        *out = pwork.x;\n        assert(isfinite(pwork.x));\n        return 0;\n    } else \n      Apop_stopif(i!=0, return 1,-5, \"envelope error - violation without metropolis\");\n    msamp ++;\n    Apop_notify(3, \" point rejected.\");\n  } while (msamp < 1e3);\n  Apop_notify(1, \"I just rejected 1,000 samples. 
Something is wrong.\");\n  return 0;\n}\n\nint initial (apop_arms_settings* params,  arms_state *env){\n// to set up initial envelope\n\n  POINT *q;\n  int mpoint = 2*params->ninit + 1;\n\n  Apop_assert_c(params->ninit>=3, 1001, 0, \"too few initial points\");\n  Apop_assert_c(params->npoint >= mpoint, 1002, 0, \"too many initial points\");\n  Apop_assert_c((params->xinit[0] >= params->xl) && (params->xinit[params->ninit-1] <= params->xr),\n                     1003, 0, \"initial points do not satisfy bounds\");\n  for(int i=1; i<params->ninit; i++)\n      Apop_assert_c(params->xinit[i] > params->xinit[i-1], 1004, 0, \"data not ordered\");\n  Apop_assert_c(params->convex >= 0.0, 1008, 0, \"negative convexity parameter\");\n\n  env->convex = &params->convex; // copy convexity address to env\n  params->neval = 0; // initialise current number of function evaluations\n\n  /* set up space for envelope POINTs */\n  env->npoint = params->npoint;\n  env->p = malloc(params->npoint*sizeof(POINT));\n  Apop_assert(env->p, \"malloc of env->p failed. 
Out of space?\");\n\n  /* set up envelope POINTs */\n  q = env->p;\n  q->x = params->xl; // left bound\n  q->f = 0;\n  q->pl = NULL;\n  q->pr = q+1;\n  for(int j=1, k=0; j<mpoint-1; j++){\n    q++;\n    if(j%2){\n        /* point on log density */\n        q->x = params->xinit[k++];\n        q->y = perfunc(params,q->x);\n        Apop_assert(isfinite(q->x), \"the initial param is %g\", q->x);\n        Apop_assert(isfinite(q->y), \"f(an initial parameter)= %g\", q->y);\n        q->f = 1;\n    } else // intersection point\n        q->f = 0;\n    q->pl = q-1;\n    q->pr = q+1;\n  }\n  /* right bound */\n  q++;\n  q->x = params->xr;\n  q->f = 0;\n  q->pl = q-1;\n  q->pr = NULL;\n\n  assert(isfinite(q->x));\n  /* calculate intersection points */\n  q = env->p;\n  for (int j=0; j<mpoint; j=j+2, q=q+2)\n    Apop_assert_c(!meet(q,env, params), 2000, 0, \"envelope violation without metropolis\");\n\n  cumulate(env); // exponentiate and integrate envelope\n  env->cpoint = mpoint; // note number of POINTs currently in envelope\n  return 0;\n}\n\nvoid invert(double prob, arms_state *env, POINT *p){\n/* to obtain a point corresponding to a given cumulative probability   \n   prob    : cumulative probability under envelope   \n   *env    : envelope attributes   \n   *p      : a working POINT to hold the sampled value */\n\n  double u,xl=0,xr=0,yl,yr,eyl,eyr,prop;\n\n  /* find rightmost point in envelope */\n  POINT *q = env->p;\n  while(q->pr != NULL)q = q->pr;\n\n  /* find exponential piece containing point implied by prob */\n  u = prob * q->cum;\n  while(q->pl->cum > u)q = q->pl;\n\n  /* piece found: set left and right POINTs of p, etc. 
*/\n  p->pl = q->pl;\n  p->pr = q;\n  p->f = 0;\n  p->cum = u;\n\n  /* calculate proportion of way through integral within this piece */\n  prop = (u - q->pl->cum) / (q->cum - q->pl->cum);\n\n  /* get the required x-value */\n  if (q->pl->x == q->x){\n    /* interval is of zero length */\n    p->x = q->x;\n    p->y = q->y;\n    p->ey = q->ey;\n  } else {\n    xl = q->pl->x;\n    xr = q->x;\n    yl = q->pl->y;\n    yr = q->y;\n    eyl = q->pl->ey;\n    eyr = q->ey;\n    if(fabs(yr - yl) < YEPS){\n      /* linear approximation was used in integration in function cumulate */\n      if(fabs(eyr - eyl) > EYEPS*fabs(eyr + eyl))\n        p->x = xl + ((xr - xl)/(eyr - eyl)) * (-eyl + sqrt((1. - prop)*eyl*eyl + prop*eyr*eyr));\n      else \n        p->x = xl + (xr - xl)*prop;\n      p->ey = ((p->x - xl)/(xr - xl)) * (eyr - eyl) + eyl;\n      p->y = logshift(p->ey, env->ymax);\n    } else {\n      /* piece was integrated exactly in function cumulate */\n      p->x = xl + ((xr - xl)/(yr - yl))\n\t      * (-yl + logshift(((1.-prop)*eyl + prop*eyr), env->ymax));\n      p->y = ((p->x - xl)/(xr - xl)) * (yr - yl) + yl;\n      p->ey = expshift(p->y, env->ymax);\n    }\n  }\n  assert(isfinite(p->x));\n  assert(isfinite(p->y));\n  assert(isfinite(q->x));\n  assert(isfinite(q->y));\n\n  /* guard against imprecision yielding point outside interval */\n  Apop_stopif( ((p->x < xl) || (p->x > xr)), return,-5, \"imprecision yields point outside interval\");\n}\n\nint test(arms_state *env, POINT *p, apop_arms_settings *params, gsl_rng *r){\n/* to perform rejection, squeezing, and metropolis tests   \n   *env        : state data\n   *p            : point to be tested   */\nassert(p->pl && p->pr);\n\n  double u,y,ysqueez,ynew,yold,znew,zold,w;\n  POINT *ql,*qr;\n  \n  /* for rejection test */\n  u = gsl_rng_uniform(r) * p->ey;\n  y = logshift(u,env->ymax);\n\n  if(params->do_metro !='y' && (p->pl->pl != NULL) && (p->pr->pr != NULL)){\n    /* perform squeezing test */\n    ql = p->pl->f ? 
p->pl : p->pl->pl;\n    qr = p->pr->f ? p->pr : p->pr->pr;\n    ysqueez = (qr->y * (p->x - ql->x) + ql->y * (qr->x - p->x))\n               /(qr->x - ql->x);\n    if(y <= ysqueez) // accept point at squeezing step\n        return 1;\n  }\n\n  /* evaluate log density at point to be tested */\n  ynew = perfunc(params,p->x);\nassert(isfinite(p->x));\nassert(p->pl && p->pr);\nApop_notify(3, \"tested (%g, %g); \", p->x, ynew);\n  \n  /* perform rejection test */\n  if(params->do_metro != 'y' || (params->do_metro == 'y' && (y >= ynew))){\n    /* update envelope */\n    p->y = ynew;\n    p->ey = expshift(p->y,env->ymax);\n    p->f = 1;\n    if(update(env,p, params)) \n        Apop_assert_c(0, -1, 0, \"envelope violation without metropolis\");\n    /* perform rejection test: accept iff y < ynew */\n    return (y < ynew);\n  }\n\n  /* continue with metropolis step */\n  yold = env->metro_yprev;\n  /* find envelope piece containing metrop->xprev */\n  ql = env->p;\n  while(ql->pl != NULL) ql = ql->pl;\n  while(ql->pr->x < env->metro_xprev) ql = ql->pr;\n  qr = ql->pr;\n  /* calculate height of envelope at metrop->xprev */\n  w = (env->metro_xprev - ql->x)/(qr->x - ql->x);\n  zold = ql->y + w*(qr->y - ql->y);\n  znew = p->y;\n  if(yold < zold)zold = yold;\n  if(ynew < znew)znew = ynew;\n  w = ynew-znew-yold+zold;\n  w = GSL_MIN(w, 0.0);\n  w = (w > -YCEIL) ?  
exp(w) : 0.0;\n  u = gsl_rng_uniform(r);\n  if(u > w){\n      /* metropolis says don't move, so replace current point with previous */\n      /* markov chain iterate */\n      p->x = env->metro_xprev;\n      p->y = env->metro_yprev;\n      Apop_notify(3, \"metro step (%g) rejected with w=%g, \"\n                \"ynew=%g, yold=%g, znew = %g, zold=%g; \", p->x, w, ynew, yold, znew, zold);\n      p->ey = expshift(p->y,env->ymax);\nassert(isfinite(p->x));\nassert(isfinite(p->y));\nassert(isfinite(p->ey));\n      p->f = 1;\n      p->pl = ql;\n      p->pr = qr;\n  } else {\n      /* trial point accepted by metropolis, so update previous markov */\n      /* chain iterate */\n      env->metro_xprev = p->x;\n      env->metro_yprev = ynew;\n  }\n  return 1;\n}\n\nint update(arms_state *env, POINT *p, apop_arms_settings *params){\n/* to update envelope to incorporate new point on log density\n   *env          : state information\n   *p            : point to be incorporated \n*/\n\n  POINT *m,*ql,*qr,*q;\n\n  if(!(p->f) || (env->cpoint > env->npoint - 2))\n    /* y-value has not been evaluated or no room for further points */\n    return 0; // ignore this point\n\n  /* copy working POINT p to a new POINT q */\n  q = env->p + env->cpoint++;\n  q->x = p->x;\n  q->y = p->y;\n  q->f = 1;\n\n  /* allocate an unused POINT for a new intersection */\n  m = env->p + env->cpoint++;\n  m->f = 0;\n  if((p->pl->f) && !(p->pr->f)){\n    /* left end of piece is on log density; right end is not */\n    /* set up new intersection in interval between p->pl and p */\n    m->pl = p->pl;\n    m->pr = q;\n    q->pl = m;\n    q->pr = p->pr;\n    m->pl->pr = m;\n    q->pr->pl = q;\n  } else if (!(p->pl->f) && (p->pr->f)){\n    /* left end of interval is not on log density; right end is */\n    /* set up new intersection in interval between p and p->pr */\n    m->pr = p->pr;\n    m->pl = q;\n    q->pr = m;\n    q->pl = p->pl;\n    m->pr->pl = m;\n    q->pl->pr = q;\n  } else\n    Apop_stopif(1, 
return 1,-5, \"unexpected event\"); // this should be impossible\n\n  /* now adjust position of q within interval if too close to an endpoint */\n  ql = q->pl->pl ? q->pl->pl : q->pl;\n  qr = q->pr->pr ? q->pr->pr : q->pr;\n  if (q->x < (1. - XEPS) * ql->x + XEPS * qr->x){\n    /* q too close to left end of interval */\n    q->x = (1. - XEPS) * ql->x + XEPS * qr->x;\n    q->y = perfunc(params,q->x);\n  } else if (q->x > XEPS * ql->x + (1. - XEPS) * qr->x){\n    /* q too close to right end of interval */\n    q->x = XEPS * ql->x + (1. - XEPS) * qr->x;\n    q->y = perfunc(params,q->x);\n  }\n\n  /* revise intersection points */\n  if(meet(q->pl,env, params) /* envelope violations without metropolis */\n        || meet(q->pr,env, params) \n        || (q->pl->pl != NULL && meet(q->pl->pl->pl,env, params))\n        || (q->pr->pr != NULL && meet(q->pr->pr->pr,env, params)))\n     return 1;\n\n  /* exponentiate and integrate new envelope */\n  cumulate(env);\n  return 0;\n}\n\nstatic void cumulate(arms_state *env){\n/* to exponentiate and integrate envelope */\n/* *env     : envelope attributes */\n\n  POINT *q,*qlmost;\n\n  qlmost = env->p;\n  /* find left end of envelope */\n  while(qlmost->pl) qlmost = qlmost->pl;\n\n  /* find maximum y-value: search envelope */\n  env->ymax = qlmost->y;\n  for(q = qlmost->pr; q != NULL; q = q->pr)\n    if(q->y > env->ymax)\n        env->ymax = q->y;\n\n  /* exponentiate envelope */\n  for(q = qlmost; q != NULL; q = q->pr)\n      q->ey = expshift(q->y,env->ymax);\n\n  /* integrate exponentiated envelope */\n  qlmost->cum = 0.;\n  for(q = qlmost->pr; q != NULL; q = q->pr)\n      q->cum = q->pl->cum + area(q);\n}\n\nint meet (POINT *q, arms_state *env, apop_arms_settings *params){ \n/* To find where two chords intersect \n   q         : to store point of intersection \n   *env      : state attributes \n*/\n  double gl=0,gr=0,grl=0,dl=0,dr=0;\n  int il=0,ir=0,irl=0;\n\n  Apop_assert(!(q->f), \"error 30: this is not an intersection 
point.\");\n\n  /* calculate coordinates of point of intersection */\n  if ((q->pl != NULL) && (q->pl->pl->pl != NULL)){\n      /* chord gradient can be calculated at left end of interval */\n      gl = (q->pl->y - q->pl->pl->pl->y)/(q->pl->x - q->pl->pl->pl->x);\n      il = 1;\n  } else // no chord gradient on left \n      il = 0;\n\n  if ((q->pr != NULL) && (q->pr->pr->pr != NULL)){\n      /* chord gradient can be calculated at right end of interval */\n      gr = (q->pr->y - q->pr->pr->pr->y)/(q->pr->x - q->pr->pr->pr->x);\n      ir = 1;\n  } else // no chord gradient on right\n      ir = 0;\n\n  if ((q->pl != NULL) && (q->pr != NULL)){\n      /* chord gradient can be calculated across interval */\n      grl = (q->pr->y - q->pl->y)/(q->pr->x - q->pl->x);\n      irl = 1;\n  } else \n      irl = 0;\n\n  if(irl && il && (gl<grl)){\n    /* convexity on left exceeds current threshold */\n    if(params->do_metro !='y') // envelope violation without metropolis\n        return 1;\n    gl = gl + (1.0 + *(env->convex)) * (grl - gl); // adjust left gradient \n  }\n\n  if(irl && ir && (gr>grl)){\n    /* convexity on right exceeds current threshold */\n    if(params->do_metro !='y') // envelope violation without metropolis \n        return 1;\n    gr = gr + (1.0 + *(env->convex)) * (grl - gr); // adjust right gradient \n  }\n\n  if(il && irl){\n    dr = (gl - grl) * (q->pr->x - q->pl->x);\n    if(dr < YEPS) // adjust dr to avoid numerical problems\n      dr = YEPS;\n  }\n\n  if(ir && irl){\n    dl = (grl - gr) * (q->pr->x - q->pl->x);\n    if(dl < YEPS) // adjust dl to avoid numerical problems \n      dl = YEPS;\n  }\n\n  if(il && ir && irl){\n    /* gradients on both sides */\n    q->x = (dl * q->pr->x + dr * q->pl->x)/(dl + dr);\n    q->y = (dl * q->pr->y + dr * q->pl->y + dl * dr)/(dl + dr);\n  } else if (il && irl){\n    /* gradient only on left side, but not right hand bound */\n    q->x = q->pr->x;\n    q->y = q->pr->y + dr;\n  } else if (ir && irl){\n    /* gradient 
only on right side, but not left hand bound */\n    q->x = q->pl->x;\n    q->y = q->pl->y + dl;\n  } else if (il)\n    q->y = q->pl->y + gl * (q->x - q->pl->x); // right hand bound \n  else if (ir)\n    q->y = q->pr->y - gr * (q->pr->x - q->x); // left hand bound \n  else \n     Apop_assert(0, \"error 31: gradient on neither side - should be impossible.\");\n  if(((q->pl != NULL) && (q->x < q->pl->x)) ||\n     ((q->pr != NULL) && (q->x > q->pr->x))){\n     Apop_assert(0, \"error 32: intersection point outside interval (through imprecision)\");\n  }\n  return 0; // successful exit : intersection has been calculated\n}\n\ndouble area(POINT *q){\n/* To integrate piece of exponentiated envelope to left of POINT q */ \n\n  if(q->pl == NULL) // this is leftmost point in envelope\n      Apop_stopif(1, return GSL_NAN,-5, \"leftmost point in envelope\");\n  if(q->pl->x == q->x) // interval is zero length\n      return 0.;\n  if (fabs(q->y - q->pl->y) < YEPS) // integrate straight line piece\n      return 0.5*(q->ey + q->pl->ey)*(q->x - q->pl->x);\n  // integrate exponential piece \n  return ((q->ey - q->pl->ey)/(q->y - q->pl->y))*(q->x - q->pl->x);\n}\n\ndouble expshift(double y, double y0) {\n/* to exponentiate shifted y without underflow */\n  if (y - y0 > -2.0 * YCEIL)\n      return exp(y - y0 + YCEIL);\n  else\n      return 0.0;\n}\n\ndouble logshift(double y, double y0){\n/* inverse of function expshift */\n  return (log(y) + y0 - YCEIL);\n}\n\ndouble perfunc(apop_arms_settings *params, double x){\n// to evaluate log density and increment count of evaluations \n    Staticdef( apop_data *, d , apop_data_alloc(1,1));\n    d->matrix->data[0] = x;\n  double y = apop_log_likelihood(d, params->model);\n  Apop_assert(isfinite(y), \"Evaluating the log likelihood at %g returned %g.\", x, y);\n  (params->neval)++; // increment count of function evaluations\n  return y;\n}\n"
  },
  {
    "path": "apop_asst.m4.c",
    "content": "/** \\file apop_asst.c  The odds and ends bin. \nCopyright (c) 2005--2007, 2010 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n#include \"apop_internal.h\"\n#include <gsl/gsl_math.h>\n#include <gsl/gsl_randist.h>\n#include <regex.h>\n#ifdef _OPENMP\n    #include <omp.h>\n    #define omp_threadnum omp_get_thread_num()\n#else\n    #define omp_threadnum 0\n\n#endif\n\nextern char *apop_nul_string;\n\n//more efficient than xprintf, but a little less versatile.\nstatic void apop_tack_on(char **in, char *addme){\n    if (!addme) return;\n    size_t inlen = *in? strlen(*in): 0;\n    size_t total_len= inlen + strlen(addme);\n    *in = realloc(*in, total_len+1);\n    strcpy(*in+inlen, addme);\n}\n\ntypedef int (*apop_fn_riip)(apop_data*, int, int, void*);\n\n/** Join together the \\c text grid of an \\ref apop_data set into a single string.\n\nFor example, say that we have a data set with some text: row 0 has\n\\c \"a0\", \\c \"b0\", \\c \"c0\"; row 1 has \n\\c \"a1\", \\c \"b1\", \\c \"c1\"; and so on. 
We would like to produce\n\n\\code\ninsert into tab values ('a0', 'b0', 'c0');\ninsert into tab values ('a1', 'b1', 'c1');\n...\n\\endcode\n\nThis could be sent to an SQL engine to copy the data to a database (but this is just an example\nfor demonstration---use \\ref apop_data_print to write to a database table).\n\nTo construct this single string from the text grid, we would need to add:\n\\li before the text, <tt>Insert into tab values ('</tt>.\n\\li between each element on a row: <tt>', '</tt>\n\\li between rows: <tt>'); \\\\ninsert into tab values('</tt>\n\\li at the tail end: <tt>');</tt>\n\nThus, do the conversion via:\n\\code\nchar *insert_string = apop_text_paste(indata,\n    .before=\"Insert into tab values ('\",\n    .between=\"'); \\\\ninsert into tab values('\",\n    .between_cols=\"', '\",\n    .after=\"');\"\n);\n\\endcode\n\n\n\\param strings  An \\ref apop_data set with a grid of text to be combined into a single string\n\\param between  The text to put in between the rows of the table, such as \", \". (Default is a single space: \" \")\n\\param before   The text to put at the head of the string. For the query example, this would be <tt>.before=\"select \"</tt>. (Default: NULL)\n\\param after   The text to put at the tail of the string. For the query example, <tt>.after=\" from data_table\"</tt>. (Default: NULL)\n\\param between_cols The text to insert between columns of text. See below for an example (Default is set to equal <tt>.between</tt>)\n\\param prune If you don't want to use the entire text set, you can provide a function to indicate which elements should be pruned out. 
Some examples:\n\\code\n//Just use column 3\nint is_not_col_3(apop_data *indata, int row, int col, void *ignore){\n    return col!=3;\n}\n\n//Jump over blanks as if they don't exist.\nint is_blank(apop_data *indata, int row, int col, void *ignore){\n    return strlen(indata->text[row][col])==0;\n}\n\\endcode\n\\param prune_parameter A void pointer to pass to your \\c prune function.\n\n\\return A single string with the elements of the \\c strings table joined as per your\nspecification. Allocated by the function, to be freed by you if desired.\n\n  \\li If the table of strings is \\c NULL or has no text, the output string will have\nonly the <tt>.before</tt> and <tt>.after</tt> parts with nothing in between.\n  \\li if <tt> apop_opts.verbose >=3</tt>, then print the pasted text to stderr.\n  \\li It is sometimes useful to use \\c Apop_r and \\c Apop_rs to get a view of only\none or a few rows in conjunction with this function.\n\n  \\li This function uses the \\ref designated syntax for inputs.\n\nThis sample snippet generates the SQL for a query using a list of column names (where\nthe query begins with <tt>select </tt>, ends with <tt>from datatab</tt>, and has commas\nin between each element), re-processes the same list to produce the head of an HTML\ntable, then produces the body of the table with the query result.\n\n\\include sql_to_html.c\n*/\nAPOP_VAR_HEAD char *apop_text_paste(apop_data const *strings, char *between, char *before, char *after, char *between_cols, apop_fn_riip prune, void *prune_parameter){\n    apop_data const *apop_varad_var(strings, NULL);\n    char *apop_varad_var(between, \" \");\n    char *apop_varad_var(before, NULL);\n    char *apop_varad_var(after, NULL);\n    char *apop_varad_var(between_cols, between);\n    apop_fn_riip apop_varad_var(prune, NULL);\n    void *apop_varad_var(prune_parameter, NULL);\nAPOP_VAR_ENDHEAD\n    char *prior_line=NULL, *oneline=NULL, *out = before ? 
strdup(before) : NULL;\n    for (int i=0; i< ((!strings || !*strings->textsize)? 0 : *strings->textsize); i++){\n        free(oneline); oneline = NULL;\n        for (int j=0; j< strings->textsize[1]; j++){\n            if (prune && !prune((apop_data*)strings, i, j, prune_parameter)) continue;\n            apop_tack_on(&oneline, strings->text[i][j]);\n            if (j <strings->textsize[1]-1)  apop_tack_on(&oneline, between_cols);\n        }\n        apop_tack_on(&out, prior_line);\n        if (prior_line && oneline) apop_tack_on(&out, between);\n        free(prior_line);\n        prior_line=oneline ? strdup(oneline): NULL;\n        //if (i <strings->textsize[0]-1)  apop_tack_on(&out, between);\n        //if (oneline)  apop_tack_on(&out, oneline);\n    }\n    apop_tack_on(&out, oneline); //the final one never got a chance to be prior_line\n    apop_tack_on(&out, after);\n    Apop_notify(3, \"%s\", out);\n    return out;\n}\n\n/** Calculate \\f$\\sum_{n=1}^N {1\\over n^s}\\f$\n\n\\li There are no doubt efficient shortcuts for doing this, but I use brute force. [Though Knuth's Art of Computer Programming v1 doesn't offer anything, which is a strong indication of nonexistence.] To speed things along, I save the results so that they can just be looked up should you request the same calculation. \n\n\\li If \\c N is zero or negative, return NaN. 
Notify the user if <tt>apop_opts.verbosity >=0</tt>\n\nFor example: \n\n\\include test_harmonic.c\n*/\nlong double apop_generalized_harmonic(int N, double s){\n/* \nEach row in the saved-results structure is an \\f$s\\f$, and each column is \\f$1\\dots n\\f$, up to the largest \\f$n\\f$ calculated to date.\n\nWhen reading the code, remember that the zeroth element holds the value for N=1, and so on.\n*/\n    Apop_stopif(N<=0, return GSL_NAN, 0, \"N is %i, but must be greater than 0.\", N);\n    static double *  eses\t= NULL;\n    static int * \t lengths= NULL;\n    static int\t\t count\t= 0;\n    static long double ** precalced=NULL;\n    int\told_len, i;\n    OMP_critical(generalized_harmonic)\n    { //Due to memoization, this can't parallelize.\n\tfor (i=0; i< count; i++)\n\t\tif (eses == NULL || eses[i] == s) \t\n            break;\n\tif (i == count){\t//you need to build the vector from scratch.\n\t\tcount\t\t\t++;\n        i               = count - 1;\n\t\tprecalced \t\t= realloc(precalced, sizeof (long double*) * count);\n\t\tlengths \t\t= realloc(lengths, sizeof (int) * count);\n\t\teses \t\t\t= realloc(eses, sizeof (double) * count);\n\t\tprecalced[i]\t= malloc(sizeof(long double) * N);\n\t\tlengths[i]\t    = N;\n\t\teses[i]\t\t    = s;\n\t\tprecalced[i][0]\t= 1;\n\t\told_len\t\t\t= 1;\n\t}\n\telse {\t//then you found it.\n\t\told_len = lengths[i];\n\t}\n\tif (N-1 >= old_len){\t//It's there, but you need to extend what you have.\n\t\tprecalced[i] = realloc(precalced[i], sizeof(long double) * N);\n\t\tfor (int j = old_len; j<N; j++)\n\t\t\tprecalced[i][j] = precalced[i][j-1] + 1/pow((j+1),s);\n\t\tlengths[i] = N;\t//record the new length so later calls can reuse these values.\n\t}\n    }\n\treturn \tprecalced[i][N-1];\n}\n\n/** Call \\c system(), but with <tt>printf</tt>-style arguments. 
E.g.,\n  \n\\code\nchar filenames[] = \"apop_asst.c apop_asst.o\";\napop_system(\"ls -l %s\", filenames);\n\\endcode\n\n\\return The return value of the \\c system() call.\n */\nint apop_system(const char *fmt, ...){\n    char *q;\n    va_list argp;\n\tva_start(argp, fmt);\n\tApop_stopif(vasprintf(&q, fmt, argp)==-1,  return -1, 0, \"Trouble writing to a string.\");\n\tva_end(argp);\n    int out = system(q);\n    free(q);\n    return out;\n}\n\nstatic int count_parens(const char *string){\n    int out = 0;\n    int last_was_backslash = 0;\n    for(const char *step =string; *step !='\\0'; step++){\n        if (*step == '\\\\' && !last_was_backslash){\n            last_was_backslash = 1;\n            continue;\n        }\n        if (*step == ')' && !last_was_backslash)\n            out++;\n        last_was_backslash = 0;\n    }\n    return out;\n}\n\n/** Extract subsets from a string via regular expressions.\n\nThis function takes a regular expression and repeatedly applies it to an input string. It returns the count of matches, and optionally returns the matches themselves organized into the \\c text grid of an \\ref apop_data set.\n\n\\li There are three common flavors of regular expression: Basic, Extended,\nand Perl-compatible (BRE, ERE, PCRE). I use EREs, as per the specs of\nyour C library, which should match POSIX's ERE specification. \n\nFor example, \"p.val\" will match \"P value\", \"p.value\", \"p values\" (and even \"tempeval\", so be\ncareful).\n\nIf you give a non-\\c NULL address in which to place a table of paren-delimited substrings, I'll return them as a row in the text element of the returned \\ref apop_data set. I'll return <em>all</em> the matches, filling the first row with substrings from the first application of your regex, then filling the next row with another set of matches (if any), and so on to the end of the string. 
Useful when parsing a list of items, for example.\n\n\n\\param string        The string to search (no default)\n\\param regex       The regular expression (no default)\n\\param substrings   Parens in the regex indicate that I should return matching substrings. Give me the _address_ of an \\ref apop_data* set, and I will allocate and fill the text portion with matches. Default= \\c NULL, meaning do not return substrings (even if parens exist in the regex). If no match, return an empty \\ref apop_data set, so <tt>output->textsize[0]==0</tt>.\n\\param use_case         Should I be case sensitive, \\c 'y' or \\c 'n'? (default = \\c 'n', which is not the POSIX default.)\n\n\\return         Count of matches found. 0 == no match. \\c substrings may be allocated and filled if needed.\n\n\\li If <tt>apop_opts.stop_on_warning='n'</tt>, I return -1 on error (e.g., regex \\c NULL or didn't compile).\n\\li If <tt>string==NULL</tt>, I return 0---no match---and if \\c substrings is provided, set it to \\c NULL.\n\n\\li Here is the test function. Notice that the substring-pulling\nfunction call passes \\c &subs, not plain \\c subs. \n\n\n\\include test_regex.c\n\n\\li Each set of matches will be one row of the output data. 
E.g., given the regex <tt>([A-Za-z])([0-9])</tt>, the column zero of <tt>outdata</tt> will hold letters, and column one will hold numbers.\nUse \\ref apop_data_transpose to reverse this so that the letters are in <tt>outdata->text[0]</tt> and numbers in <tt>outdata->text[1]</tt>.\n*/\nAPOP_VAR_HEAD int  apop_regex(const char *string, const char* regex, apop_data **substrings, const char use_case){\n    const char * apop_varad_var(string, NULL);\n    apop_data **apop_varad_var(substrings, NULL);\n    if (!string) {\n        if (substrings) *substrings=NULL;\n        return 0;\n    }\n    const char * apop_varad_var(regex, NULL);\n    Apop_stopif(!regex, return -1, 0, \"You gave me a NULL regex.\");\n    const char apop_varad_var(use_case, 'n');\nAPOP_VAR_ENDHEAD\n    regex_t re;\n    int matchcount=count_parens(regex);\n    int found, found_ct=0;\n    regmatch_t result[matchcount+1];\n    int compiled_ok = !regcomp(&re, regex, REG_EXTENDED \n                                            + (use_case=='y' ? 0 : REG_ICASE)\n                                            + (substrings ? 0 : REG_NOSUB) );\n    Apop_stopif(!compiled_ok, return -1, 0, \"This regular expression didn't compile: \\\"%s\\\"\", regex);\n\n    int matchrow = 0;\n    if (substrings) *substrings = apop_data_alloc();\n    do {\n        found_ct+=\n        found    = !regexec(&re, string, matchcount+1, result, matchrow ? 
REG_NOTBOL : 0);\n        if (substrings && found){\n            *substrings = apop_text_alloc(*substrings, matchrow+1, matchcount);\n            //match zero is the whole string; ignore.\n            for (int i=0; i< matchcount; i++){\n                if (result[i+1].rm_eo > 0){//GNU peculiarity: match-to-empty marked with -1.\n                    int length_of_match = result[i+1].rm_eo - result[i+1].rm_so;\n                    if ((*substrings)->text[matchrow][i] != apop_nul_string) free((*substrings)->text[matchrow][i]);\n                    (*substrings)->text[matchrow][i] = malloc(strlen(string)+1);\n                    memcpy((*substrings)->text[matchrow][i], string + result[i+1].rm_so, length_of_match);\n                    (*substrings)->text[matchrow][i][length_of_match] = '\\0';\n                } //else matches nothing; apop_text_alloc already made this cell this NULL.\n            }\n            string += result[0].rm_eo; //end of whole match;\n            matchrow++;\n        }\n    } while (substrings && found && string[0]!='\\0');\n    regfree(&re);\n    return found_ct;\n}\n\n/** RNG from a Generalized Hypergeometric type B3.\n\nDevroye uses this as the base for many of his distribution-generators, including the Waring.\n\n\\li If one of the inputs is <=0, error; return NaN and print a warning.\n*/  //Header in stats.h\ndouble apop_rng_GHgB3(gsl_rng * r, double* a){\n    Apop_stopif(!((a[0]>0) && (a[1] > 0) && (a[2] > 0)), return NAN, 0, \"all inputs must be positive.\");\n    double aa = gsl_ran_gamma(r, a[0], 1),\n\t\t   b  = gsl_ran_gamma(r, a[1], 1),\n\t\t   c  = gsl_ran_gamma(r, a[2], 1);\n    int\tp = gsl_ran_poisson(r, aa*b/c);\n\treturn p;\n}\n\n/** The Beta distribution is useful for modeling because it is bounded between zero and one, and can be either unimodal (if the variance is low) or bimodal (if the variance is high), and can have either a slant toward the bottom or top of the range (depending on the mean).\n\nThe distribution has two 
parameters, typically named \\f$\\alpha\\f$ and \\f$\\beta\\f$, which can be difficult to interpret. However, there is a one-to-one mapping between (alpha, beta) pairs and (mean, variance) pairs. Since we have good intuition about the meaning of means and variances, this function takes in a mean and variance, calculates alpha and beta behind the scenes, and returns the appropriate Beta distribution.\n\n\\param m\nThe mean the Beta distribution should have. Notice that m\nis in [0,1].\n\n\\param v\nThe variance which the Beta distribution should have. It is in (0, 1/12), where (1/12) is the variance of a Uniform(0,1) distribution. Funny things happen with variance near 1/12 and mean far from 1/2.\n\n\\return Returns an \\ref apop_model produced by copying the \\c apop_beta model and\nsetting its parameters appropriately.\n\n\\exception out->error=='r' Range error: mean is not within [0, 1].\n*/\napop_model *apop_beta_from_mean_var(double m, double v){\n    Apop_stopif(m>=1|| m<=0, apop_model *out = apop_model_copy(apop_beta);\n                        out->error='r'; return out,\n                       0, \"You asked for a beta distribution \"\n                        \"with mean %g, but the mean of the beta will always \"\n                        \"be strictly between zero and one.\", m);\n    double k     = (m * (1- m)/ v) -1;\n    double alpha = m*k;\n    double beta  = k * (1-m);\n    return apop_model_set_parameters(apop_beta, alpha, beta);\n}\n\n/** \\def apop_rng_get_thread\nThe \\c gsl_rng is not itself thread-safe, in the sense that it can not be used\nsimultaneously by multiple threads. However, if each thread has its own \\c gsl_rng,\nthen each will safely operate independently.\n\nThus, Apophenia keeps an internal store of RNGs for use by threaded functions. 
If the\ninput to this function, \\c thread, is greater than any previous input, then the array\nof <tt>gsl_rng</tt>s is extended to length <tt>thread+1</tt>, and each new element is initialized using\n<tt>++apop_opts.rng_seed</tt> (i.e., the seed is incremented before use).\n\nThis function can be used anywhere a \\c gsl_rng would be used.\n\n\\param thread The number of the RNG to retrieve, starting at zero (which is\nhow OpenMP numbers its threads). If -1, I'll look up the current thread (via \\c\nomp_get_thread_num) for you.\n\nSee \\ref threading for additional notes. In most cases, you want to use <tt>apop_rng_get_thread(-1)</tt>.\n\n\\return The appropriate RNG, initialized if necessary.\n\\hideinitializer\n*/\ngsl_rng *apop_rng_get_thread_base(int thread){\n    static gsl_rng **rngs;\n    static int rng_ct = -1;\n\n    if (thread==-1){\n        #ifdef _OPENMP\n            thread = omp_get_thread_num();\n        #else\n            thread = 0;\n        #endif\n    }\n\n    OMP_critical(rng_get_thread)\n    if (thread > rng_ct)\n        {\n            rngs = realloc(rngs, sizeof(gsl_rng*)*(thread+1));\n            for (int i=rng_ct+1; i<= thread; i++)\n                rngs[i] = apop_rng_alloc(++apop_opts.rng_seed);\n            rng_ct = thread;\n        }\n    return rngs[thread];\n}\n\n/** Make a set of random draws from a model and write them to an \\ref apop_data set.\n\n\\param model The model from which draws will be made. Must already be prepared and/or estimated.\n\n\\param count The number of draws to make. If \\c draws is not \\c NULL, then this is ignored and <tt>count=draws->matrix->size1</tt>. default=1000.\n\n\\param draws If not \\c NULL, a pre-allocated data set whose \\c matrix element will be filled with draws. \n\n\\return An \\ref apop_data set with the matrix filled with \\c count draws. 
If <tt>draws!=NULL</tt>, then return a pointer to it.\n\n\\exception out->error=='m' Input model isn't good for making draws: it is \\c NULL, or <tt>m->dsize=0</tt>.\n\n\\exception out->error=='s' You gave me a \\c draws matrix, but its size is less than the size of a single draw from the data, <tt>model->dsize</tt>.\n\n\\exception out->error=='d' Trouble drawing from the distribution for at least one row. That row is set to all \\c NAN.\n\n\\li Prints a warning if you send in a non-<tt>NULL apop_data</tt> set, but its \\c matrix element is \\c NULL, when <tt>apop_opts.verbose>=1</tt>.\n\\li See also \\ref apop_draw, which makes a single draw.\n\\li Random numbers are generated using RNGs from \\ref apop_rng_get_thread, qv.\n\nHere is a two-line program to draw a different set of ten Standard Normals on every run (provided runs are more than a second apart):\n\n\\include draw_some_normals.c\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD apop_data *apop_model_draws(apop_model *model, int count, apop_data *draws){\n    apop_model * apop_varad_var(model, NULL);\n    Apop_stopif(!model, apop_return_data_error(n), 0, \"Input model is NULL.\");\n    Apop_stopif(!model->dsize, apop_return_data_error(n), 0, \"Input model has dsize==0.\");\n    apop_data * apop_varad_var(draws, NULL);\n    int apop_varad_var(count, 1000);\n    if (draws) {\n        Apop_stopif(!draws->matrix, draws->error='m'; return draws, 1, \"Input data set's matrix is NULL.\");\n        Apop_stopif((int)draws->matrix->size2 < model->dsize, draws->error='s'; return draws,\n                1, \"Input data set's matrix column count is less than model->dsize.\");\n        count = draws->matrix->size1;\n    } else\n        Apop_stopif(model->dsize<=0, apop_return_data_error(n), 0, \"model->dsize<=0, so I don't know the size of matrix to allocate.\");\nAPOP_VAR_ENDHEAD\n    apop_data *out = draws ? 
draws : apop_data_alloc(count, model->dsize);\n\n    OMP_for (int i=0; i< count; i++){\n        apop_data *onerow = Apop_r(out, i);\n        Apop_stopif(apop_draw(onerow->matrix->data, apop_rng_get_thread(omp_threadnum), model),\n                gsl_matrix_set_all(onerow->matrix, GSL_NAN); out->error='d',\n                0, \"Trouble drawing for row %i. \"\n                \"I set it to all NANs and set out->error='d'.\", i);\n    }\n    return out;\n}\n"
  },
  {
    "path": "apop_bootstrap.m4.c",
    "content": "/** \\file apop_bootstrap.c\n\nCopyright (c) 2006--2007 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n#include \"apop_internal.h\"\n\n/** Initialize a \\c gsl_rng.\n \n  Uses the Tausworth routine.\n\n\\param  seed    The seed. No need to get funny with it: 0, 1, and 2 will produce wholly different streams.\n\\return The RNG ready for your use.\n\n\\li If you are confident that your code is debugged and would like a new stream of values every time your program runs (provided your runs are more than a second apart), seed with the time:\n\n\\include draw_some_normals.c\n*/\ngsl_rng *apop_rng_alloc(int seed){\n    static int first_use = 1;\n    if (first_use){\n       first_use = 0;\n       OMP_critical(rng_env_setup) //GSL makes vague promises about thread-safety\n       gsl_rng_env_setup();\n    }\n    gsl_rng *setme = gsl_rng_alloc(gsl_rng_taus2);\n    gsl_rng_set(setme, seed);\n    return setme;\n}\n\n/** Give me a data set and a model, and I'll give you the jackknifed covariance matrix of the model parameters.\n\nThe basic algorithm for the jackknife (glossing over the details): create a sequence of data\nsets, each with exactly one observation removed, and then produce a new set of parameter estimates \nusing that slightly shortened data set. Then, find the covariance matrix of the derived parameters.\n\n\\li Jackknife or bootstrap? As a broad rule of thumb, the jackknife works best on models\n    that are closer to linear. The worse a linear approximation does (at the given data),\n    the worse the jackknife approximates the variance.\n\n\\param in\t    The data set. 
An \\ref apop_data set where each row is a single data point.\n\\param model    An \\ref apop_model, that will be used internally by \\ref apop_estimate.\n            \n\\exception out->error=='n'   \\c NULL input data.\n\\return         An \\c apop_data set whose matrix element is the estimated covariance matrix of the parameters.\n\\see apop_bootstrap_cov\n\nFor example:\n\\include jack.c\n*/\napop_data * apop_jackknife_cov(apop_data *in, apop_model *model){\n    Apop_stopif(!in, apop_return_data_error(n), 0, \"The data input can't be NULL.\");\n    Get_vmsizes(in); //msize1, msize2, vsize\n    apop_model *e = apop_model_copy(model);\n    int i, n = GSL_MAX(msize1, GSL_MAX(vsize, in->textsize[0]));\n    apop_model *overall_est = e->parameters ? e : apop_estimate(in, e);//if not estimated, do so\n    gsl_vector *overall_params = apop_data_pack(overall_est->parameters);\n    gsl_vector_scale(overall_params, n); //do it just once.\n    gsl_vector *pseudoval = gsl_vector_alloc(overall_params->size);\n\n    //Copy the original, minus the first row.\n    apop_data *subset = apop_data_copy(Apop_rs(in, 1, n-1));\n    apop_name *tmpnames = in->names; \n    in->names = NULL;  //save on some copying below.\n\n    apop_data *array_of_boots = apop_data_alloc(n, overall_params->size);\n\n    for(i = -1; i< n-1; i++){\n        //Get a view of row i, and copy it to position i-1 in the short matrix.\n        if (i >= 0) apop_data_memcpy(Apop_r(subset, i), Apop_r(in, i));\n        apop_model *est = apop_estimate(subset, e);\n        gsl_vector *estp = apop_data_pack(est->parameters);\n        gsl_vector_memcpy(pseudoval, overall_params);// *n above.\n        gsl_vector_scale(estp, n-1);\n        gsl_vector_sub(pseudoval, estp);\n        gsl_matrix_set_row(array_of_boots->matrix, i+1, pseudoval);\n        apop_model_free(est);\n        gsl_vector_free(estp);\n    }\n    in->names = tmpnames;\n    apop_data *out = apop_data_covariance(array_of_boots);\n    
gsl_matrix_scale(out->matrix, 1./(n-1.));\n    apop_data_free(subset);\n    gsl_vector_free(pseudoval);\n    apop_data_free(array_of_boots);\n    if (e!=overall_est)\n        apop_model_free(overall_est);\n    apop_model_free(e);\n    gsl_vector_free(overall_params);\n    return out;\n}\n\n/** Give me a data set and a model, and I'll give you the bootstrapped covariance matrix of the parameter estimates.\n\n\\param data\t    The data set. An \\c apop_data set where each row is a single data point. (No default)\n\\param model    An \\ref apop_model, whose \\c estimate method will be used here. (No default)\n\\param iterations How many bootstrap draws should I make? (default: 1,000) \n\\param rng        An RNG that you have initialized, probably with \\c apop_rng_alloc. (Default: an RNG from \\ref apop_rng_get_thread)\n\\param boot_store  If not \\c NULL, put the list of drawn parameter values here, with one parameter set per row. Sample use: \n\\code\napop_data *boots;\napop_bootstrap_cov(data, model, .boot_store=&boots);\napop_data_print(boots);\n\\endcode\nThe rows are packed via \\ref apop_data_pack, so use \\ref apop_data_unpack if needed. (Default: \\c NULL)\n\\param ignore_nans If \\c 'y' and any of the elements in the estimation return \\c NaN, then I will throw out that draw and try again. If \\c 'n', then I will write that set of statistics to the list, \\c NaN and all. I keep count of throw-aways; if there are more than \\c iterations elements thrown out, then I throw an error and return with estimates using data I have so far. That is, I assume that \\c NaNs are rare edge cases; if they are as common as good data, you might want to rethink how you are using the bootstrap mechanism. 
(Default: 'n')\n\\return         An \\c apop_data set whose matrix element is the estimated covariance matrix of the parameters.\n\\exception out->error=='n'   \\c NULL input data.\n\\exception out->error=='N'   \\c too many NaNs.\n\n\\li This function uses the \\ref designated syntax for inputs.\n\nThis example is a sort of demonstration of the Central Limit Theorem. The model is\na simulation, where each call to the estimation routine produces the mean/std dev of\na set of draws from a Uniform Distribution. Because the simulation takes no inputs,\n\\ref apop_bootstrap_cov simply re-runs the simulation and calculates a sequence of\nmean/std dev pairs, and reports the covariance of that generated data set.\n\n\\include boot_clt.c\n\n\\see apop_jackknife_cov\n */\nAPOP_VAR_HEAD apop_data * apop_bootstrap_cov(apop_data * data, apop_model *model, gsl_rng *rng, int iterations, char keep_boots, char ignore_nans, apop_data **boot_store) {\n    apop_data * apop_varad_var(data, NULL);\n    apop_model *model = varad_in.model;\n    int apop_varad_var(iterations, 1000);\n    gsl_rng * apop_varad_var(rng, apop_rng_get_thread());\n    char apop_varad_var(keep_boots, 'n');\n    apop_data** apop_varad_var(boot_store, NULL);\n    char apop_varad_var(ignore_nans, 'n');\nAPOP_VAR_ENDHEAD\n    Get_vmsizes(data); //vsize, msize1, msize2\n    apop_model *e = apop_model_copy(model);\n    apop_data *subset = apop_data_copy(data);\n    apop_data *array_of_boots = NULL,\n              *summary;\n    //prevent and infinite regression of covariance calculation.\n    Apop_model_add_group(e, apop_parts_wanted); //default wants for nothing.\n    size_t i, nan_draws=0;\n    apop_name *tmpnames = (data && data->names) ? 
data->names : NULL; //save on some copying below.\n    if (data && data->names) data->names = NULL;\n\n    int height = GSL_MAX(msize1, GSL_MAX(vsize, (data?(*data->textsize):0)));\n\tfor (i=0; i<iterations && nan_draws < iterations; i++){\n\t\tfor (size_t j=0; j< height; j++){       //create the data set\n\t\t\tsize_t randrow\t= gsl_rng_uniform_int(rng, height);\n            apop_data_memcpy(Apop_r(subset, j), Apop_r(data, randrow));\n\t\t}\n\t\t//get the parameter estimates.\n\t\tapop_model *est = apop_estimate(subset, e);\n        gsl_vector *estp = apop_data_pack(est->parameters);\n        if (!gsl_isnan(apop_sum(estp))){\n            if (i==0){\n                array_of_boots\t      = apop_data_alloc(iterations, estp->size);\n                apop_name_stack(array_of_boots->names, est->parameters->names, 'c', 'v');\n                apop_name_stack(array_of_boots->names, est->parameters->names, 'c', 'c');\n                apop_name_stack(array_of_boots->names, est->parameters->names, 'c', 'r');\n            }\n            gsl_matrix_set_row(array_of_boots->matrix, i, estp);\n        } else if (ignore_nans=='y'){\n            i--; \n            nan_draws++;\n        }\n        apop_model_free(est);\n        gsl_vector_free(estp);\n\t}\n    if(data) data->names = tmpnames;\n    apop_data_free(subset);\n    apop_model_free(e);\n    int set_error=0;\n    Apop_stopif(i == 0 && nan_draws == iterations, apop_return_data_error(N),\n                1, \"I ran into %i NaNs and no not-NaN estimations, and so stopped. \"\n                       , iterations);\n    Apop_stopif(nan_draws == iterations,  set_error++;\n            apop_matrix_realloc(array_of_boots->matrix, i, array_of_boots->matrix->size2),\n                1, \"I ran into %i NaNs, and so stopped. 
Returning results based \"\n                       \"on %zu bootstrap iterations.\", iterations, i);\n\tsummary\t= apop_data_covariance(array_of_boots);\n    if (boot_store) *boot_store = array_of_boots;\n    else            apop_data_free(array_of_boots);\n    if (set_error) summary->error = 'N';\n\treturn summary;\n}\n"
  },
  {
    "path": "apop_conversions.m4.c",
    "content": "/** \\file apop_conversions.c\tThe various functions to convert from one format to another. */\n/* Copyright (c) 2006--2010, 2012 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n#include \"apop_internal.h\"\n#include <gsl/gsl_math.h> //GSL_NAN\n#include <assert.h>\n#include <stdbool.h>\n#include <libgen.h>\n\n/*extend a string. this prevents a minor leak you'd get if you did\n asprintf(&q, \"%s is a teapot.\", q);\n q may be NULL, which prints the string \"null\", so use the little XN macro below when using this function.\n\n This is internal to apop. right now. \n*/\nvoid xprintf(char **q, char *format, ...){ \n    va_list ap; \n    char *r = *q; \n    va_start(ap, format); \n    Apop_stopif(vasprintf(q, format, ap)==-1, , 0, \"Trouble writing to a string.\");\n    va_end(ap);\n    free(r);\n}\n\n/** Copies a one-dimensional array to a <tt>gsl_vector</tt>. The input array is undisturbed.\n\n\\param in     An array of <tt>double</tt>s. (No default. Must not be \\c NULL);\n\\param size \tHow long \\c line is. If this is zero or omitted, I'll\nguess using the <tt>sizeof(line)/sizeof(line[0])</tt> trick, which will\nwork for most arrays allocated using <tt>double []</tt> and won't work\nfor those allocated using <tt>double *</tt>. (default = auto-guess)\n\\return   A <tt>gsl_vector</tt>, allocated and filled with a copy of (not a pointer to) the input data.\n\n\\li If you send in a \\c NULL vector, you get a \\c NULL pointer in return. 
I warn you of this if <tt>apop_opts.verbosity >=1 </tt>.\n\n\\li This function uses the \\ref designated syntax for inputs.\n\\see \\ref apop_data_falloc\n*/ \nAPOP_VAR_HEAD gsl_vector * apop_array_to_vector(double *in, int size){\n    double * apop_varad_var(in, NULL);\n    Apop_assert_c(in, NULL, 1, \"You sent me NULL data; returning NULL.\");\n    int apop_varad_var(size, sizeof(in)/sizeof(in[0]));\nAPOP_VAR_ENDHEAD\n    gsl_vector *out = gsl_vector_alloc(size);\n    gsl_vector_view\tv = gsl_vector_view_array((double*)in, size);\n\tgsl_vector_memcpy(out,&(v.vector));\n    return out;\n}\n\n/** This function copies the data in a vector to a new one-column (or one-row) matrix\nand returns the newly-allocated and filled matrix.\n\n  For the reverse, try \\ref apop_data_pack.\n\n\\param in a \\c gsl_vector (No default. If \\c NULL, I return \\c NULL, with a warning if <tt>apop_opts.verbose >=1 </tt>)\n\\param row_col If \\c 'r', then this will be a row (1 x N) instead of the default, a column (N x 1). (default: \\c 'c')\n\\return a newly-allocated <tt>gsl_matrix</tt> with one column (or row).\n\n\\li If you send in a \\c NULL vector, you get a \\c NULL pointer in return. I warn you of this if <tt>apop_opts.verbosity >=2 </tt>.\n\\li If \\c gsl_matrix_alloc fails you get a \\c NULL pointer in return.\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD gsl_matrix * apop_vector_to_matrix(const gsl_vector *in, char row_col){\n    const gsl_vector * apop_varad_var(in, NULL);\n    Apop_assert_c(in, NULL, 2, \"Converting NULL vector to NULL matrix.\");\n    char apop_varad_var(row_col, 'c');\nAPOP_VAR_ENDHEAD\n    bool isrow = (row_col == 'r' || row_col == 'R');\n    gsl_matrix *out = isrow ? gsl_matrix_alloc(1, in->size)\n                            : gsl_matrix_alloc(in->size, 1);\n    Apop_assert(out, \"gsl_matrix_alloc failed; probably out of memory.\");\n    (isrow ? 
gsl_matrix_set_row\n           : gsl_matrix_set_col)(out, 0, in);\n    return out;\n}\n\nstatic int find_cat_index(char **d, char * r, int start_from, int size){\n//used for apop_db_to_crosstab.\n    int i = start_from % size;\t//i is probably the same or i+1.\n\tdo {\n\t\tif(!strcmp(d[i], r)) return i;\n\t\ti++;\n\t\ti %= size;\t//loop around as necessary.\n\t} while (i!=start_from); \n    Apop_assert_c(0, -2, 0, \"Something went wrong in the crosstabbing; couldn't find %s.\", r);\n}\n\n/**Give the name of a table in the database, and optional names of three of its columns:\nthe x-dimension, the y-dimension, and the data. The output is a 2D matrix with rows\nindexed by 'row' and cols by 'col' and the cells filled with the entry in the 'data' column.\n\n\\param tabname The database table I'm querying. Anything that will work inside a \\c from clause is OK, such as a subquery in parens. (no default; must not be \\c NULL)\n\\param row The column of the data set that will indicate the rows of the output crosstab (no default; must not be \\c NULL)\n\\param col The column of the data set that will indicate the columns of the output crosstab (no default; must not be \\c NULL)\n\\param data The column of the data set holding the data for the cells of the crosstab (default: <tt>count(*)</tt>)\n\\param is_aggregate Set to \\c 'y' if the \\c data is a function like <tt>count(*)</tt>\n    or <tt>sum(col)</tt>. That is, set to \\c 'y' if querying this would require a <tt>group\n    by</tt> clause. 
(default: if I find an end-paren in \\c data, \\c 'y'; else \\c 'n'.)\n\n\\li  If the query to get data to fill the table (select row, col, data from\n    tabname) returns an empty data set, then I will return a \\c NULL data set and, if\n    <tt>apop_opts.verbosity >= 2</tt>, print a warning.\n\n\\exception out->error='n' Name not found error.\n\\exception out->error='q' Query returned an empty table (which might mean that it just failed).\n\n\\li The simplest use is to get a tally of how often (r1, r2) appears in the data via <tt>apop_db_to_crosstab(\"datatab\", \"r1\", \"r2\")</tt>.\n\\li If you want a 1-D crosstab, omit the other dimension. Or omit both to get a grand tally of your statistic for the entire table.\n\\li There is a command-line tool, <tt>apop_db_to_crosstab</tt>, that calls this function.\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD apop_data *apop_db_to_crosstab(char const*tabname, char const*row, char const* col, char const*data, char is_aggregate){\n    char const* apop_varad_var(tabname, NULL);\n    Apop_stopif(!tabname, return NULL, 1, \"Missing tabname. 
Returning NULL.\");\n    char const* apop_varad_var(row, \"1\");\n    char const* apop_varad_var(col, \"1\");\n    char const* apop_varad_var(data, \"count(*)\");\n    //This '(' balances the end-paren below, keeping m4 from losing the thread.\n    //Note the transitional check for \"group by\", which we should one day remove.\n    char apop_varad_var(is_aggregate, (strchr(data, ')') && !strstr(data, \"group by\"))?'y':'n');\nAPOP_VAR_ENDHEAD\n    gsl_matrix *out=NULL;\n    int\ti, j=0;\n    apop_data *pre_d1=NULL, *pre_d2=NULL, *datachars=NULL;\n    apop_data *outdata = apop_data_alloc();\n\n    char* p = apop_opts.db_name_column;\n    apop_opts.db_name_column = NULL;//we put this back at the end.\n    char *Q;\n\n    Asprintf(&Q, \"select %s, %s, %s from %s %s %s %s %s\", row, col, data, tabname,\n                                    is_aggregate!='n' ? \"group by\" : \"\",\n                                    is_aggregate!='n' ? row : \"\",\n                                    is_aggregate!='n' ? \",\" : \"\",\n                                    is_aggregate!='n' ? 
col : \"\");\n    datachars = apop_query_to_text(\"%s\", Q);\n    Apop_stopif(!datachars, free(Q); return NULL, 2, \"[%s] returned an empty table.\", Q);\n    Apop_stopif(datachars->error, free(Q); goto bailout, 0, \"error from [%s].\", Q);\n    free(Q);\n\n    //A bit inefficient, but well-encapsulated.\n    //Pull the distinct (sorted) list of headers, copy into outdata->names.\n    pre_d1 = apop_query_to_text(\"select distinct %s, 1 from %s order by %s\", row, tabname, row);\n    Apop_stopif(!pre_d1||pre_d1->error, outdata->error='q'; goto bailout, 0, \"Error querying %s from %s.\", row, tabname);\n    for (i=0; i < pre_d1->textsize[0]; i++)\n        apop_name_add(outdata->names, pre_d1->text[i][0], 'r');\n\n\tpre_d2 = apop_query_to_text(\"select distinct %s from %s order by %s\", col, tabname, col);\n    Apop_stopif(!pre_d2||pre_d2->error, outdata->error='q'; goto bailout, 0, \"Error querying %s from %s.\", col, tabname);\n    for (i=0; i < pre_d2->textsize[0]; i++)\n        apop_name_add(outdata->names, pre_d2->text[i][0], 'c');\n\n\tout\t= gsl_matrix_calloc(pre_d1->textsize[0], pre_d2->textsize[0]);\n\tfor (size_t k =0; k< datachars->textsize[0]; k++){\n\t\ti = find_cat_index(outdata->names->row, datachars->text[k][0], i, pre_d1->textsize[0]);\n\t\tj = find_cat_index(outdata->names->col, datachars->text[k][1], j, pre_d2->textsize[0]);\n        Apop_stopif(i==-2 || j == -2, outdata->error='n'; goto bailout, 0, \"Something went wrong in the crosstabbing; \"\n                                                 \"couldn't find %s or %s.\", datachars->text[k][0], datachars->text[k][1]);\n\t\tgsl_matrix_set(out, i, j, atof(datachars->text[k][2]));\n\t}\n    bailout:\n    apop_data_free(pre_d1);\n    apop_data_free(pre_d2);\n    apop_data_free(datachars);\n    outdata->matrix = out;\n    apop_opts.db_name_column = p;\n\treturn outdata;\n}\n\n/** See \\ref apop_db_to_crosstab for the storyline; this is the complement, which takes a\n  crosstab and writes its values to the 
database.\n\nFor example, I would take\n<table frame=box>\n<tr><td> </td><td> c0</td><td>c1</td></tr>\n<tr><td>r0</td><td>2</td><td>3</td></tr> \n<tr><td>r1</td><td>0</td><td>4</td></tr> \n</table> \n\nand do the following writes to the database:\n\n\\code\ninsert into your_table values ('r0', 'c0', 2);\ninsert into your_table values ('r0', 'c1', 3);\ninsert into your_table values ('r1', 'c0', 0);\ninsert into your_table values ('r1', 'c1', 4);\n\\endcode\n\n\n\\li If your data set does not have names (or not enough names), I will use the scheme above, filling in names of the form <tt>r0</tt>, <tt>r1</tt>, ... <tt>c0</tt>, <tt>c1</tt>, .... Text columns get their own names, <tt>t0</tt>, <tt>t1</tt>.\n\n\\li This function handles only the matrix and text. \n */\nvoid apop_crosstab_to_db(apop_data *in,  char *tabname, char *row_col_name, \n\t\t\t\t\t\tchar *col_col_name, char *data_col_name){\n    apop_name *n = in->names;\n    char *colname, *rowname;\n    Get_vmsizes(in); //msize1, msize2\n    int maxrow= GSL_MAX(msize1, in->textsize[0]);\n    int maxcol= GSL_MAX(msize2, in->textsize[1]);\n    //Room for the digits of the largest index, plus the r/c/t prefix and the closing NUL.\n    char sparerow[(maxrow > 0 ? (int)log10(maxrow)+1 : 1) + 2];\n    char sparecol[(maxcol > 0 ? (int)log10(maxcol)+1 : 1) + 2];\n#define DbType apop_opts.db_engine=='m' ? \"text\" : \"character\"\n#define DbType2 apop_opts.db_engine=='m' ? \"double\" : \"numeric\"\n\tapop_query(\"CREATE TABLE %s (%s %s, %s %s, %s %s)\", tabname, \n                        row_col_name, DbType, col_col_name, DbType, data_col_name, DbType2);\n\tapop_query(\"begin\");\n    for (int i=0; i< msize1; i++){\n        rowname = (n->rowct > i) ?  n->row[i] : (sprintf(sparerow, \"r%i\", i), sparerow);\n        for (int j=0; j< msize2; j++){\n            colname = (n->colct > j) ? 
n->col[j] : (sprintf(sparecol, \"c%i\", j), sparecol);\n            double x = gsl_matrix_get(in->matrix, i, j); \n            if (!isnan(x)) apop_query(\"INSERT INTO %s VALUES ('%s', '%s', %g)\", \n                                                tabname, rowname, colname, x);\n            else apop_query(\"INSERT INTO %s VALUES ('%s', '%s', 0/0)\", \n                                        tabname, rowname, colname);\n        }\n    }\n    for (int i=0; i< in->textsize[0]; i++){\n        rowname = (n->rowct > i) ? n->row[i] : (sprintf(sparerow, \"r%i\", i), sparerow);\n        for (int j=0; j< in->textsize[1]; j++){\n            colname = (n->textct > j) ? n->text[j] : (sprintf(sparecol, \"t%i\", j), sparecol);\n            apop_query(\"INSERT INTO %s VALUES ('%s', '%s', '%s')\", tabname, \n                rowname, colname, in->text[i][j]);\n        }\n    }\n\tapop_query(\"commit\");\n}\n\n\n/** One often finds data where the column indicates the value of the data point. There may\nbe two columns, and a mark in the first indicates a miss while a mark in the second is a\nhit. Or say that we have the following list of observations:\n\n\\code\n2 3 3 2 1 1 2 1 1 2 1 1\n\\endcode\nThen we could write this as:\n\\code\n0  1  2  3\n----------\n0  6  4  2\n\\endcode\nbecause there are six 1s observed, four 2s observed, and two 3s observed. We call this\nrank format, because 1 (or zero) is typically the most common, 2 is second most common, et cetera.\n\nThis function takes in a list of observations, and aggregates them into a single row in rank format.\n\n\\li For the complement, see \\ref apop_data_rank_expand.\n\n\\li See also \\ref apop_data_to_factors to convert real numbers or text into a\nmatrix of categories.\n\n\\param in The input \\ref apop_data set. If \\c NULL, return \\c NULL.\n\\param min_bins If this is omitted, the number of bins is simply the largest number\nfound. 
So if there are bins {0, 1, 2} and your data set happens to consist of <tt>0 0\n1 1 0</tt>, then I won't know to generate results with three bins where the last bin\nhas a count of zero. Set <tt>.min_bins=2</tt> to ensure that bin is included.\n\n\\include test_ranks.c\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD apop_data * apop_data_rank_compress (apop_data *in, int min_bins){\n    apop_data * apop_varad_var(in, NULL);\n    if (!in) return NULL;\n    int apop_varad_var(min_bins, 0);\nAPOP_VAR_ENDHEAD\n    Get_vmsizes(in);\n    int upper_bound = GSL_MAX(in->matrix ? gsl_matrix_max(in->matrix) : min_bins,\n                              in->vector ? gsl_vector_max(in->vector) : min_bins);\n    apop_data *out = apop_data_calloc(1, upper_bound+1);\n    for (int i=0; i< msize1; i++)\n        for (int j=0; j< msize2; j++) \n            (*gsl_matrix_ptr(out->matrix, 0, apop_data_get(in, i, j)))++;\n    for (int i=0; i< vsize; i++) \n        (*gsl_matrix_ptr(out->matrix, 0, apop_data_get(in, i, -1)))++;\n    return out;\n}\n\n/** The complement to this is \\ref apop_data_rank_compress; see that function's\n  documentation for the story and an example.\n\n  This function takes in a data set where the zeroth column includes the count(s)\n  of times that zero was observed, the first gives the count(s) of times that one was\n  observed, et cetera. It outputs a data set whose vector element includes a list that\n  has exactly the given frequency of zeros, ones, et cetera.\n*/\napop_data *apop_data_rank_expand (apop_data *in){\n    int total_ct = (in->matrix ? apop_matrix_sum(in->matrix) : 0)\n                 + (in->vector ? 
apop_vector_sum(in->vector) : 0);\n    if (total_ct == 0)\n        return NULL;\n    apop_data *out = apop_data_alloc(total_ct);\n    int posn = 0;\n    for (int i=0; i< in->matrix->size1; i++)\n        for (int k=0; k< in->matrix->size2; k++)\n            for (int j=0; j< gsl_matrix_get(in->matrix, i, k); j++)\n                gsl_vector_set(out->vector, posn++, k); \n    return out;\n}\n\n/** Copy one <tt>gsl_vector</tt> to another. That is, all data are duplicated.\nUnlike <tt>gsl_vector_memcpy</tt>, this function allocates and returns the destination,\nso you can use it like this:\n\\code\ngsl_vector *a_copy = apop_vector_copy(original);\n\\endcode\n\n\\param in   The input vector\n\\return     A structure that this function will allocate and fill. If \\c gsl_vector_alloc fails, returns \\c NULL and prints a warning.\n*/\ngsl_vector *apop_vector_copy(const gsl_vector *in){\n    if (!in) return NULL;\n    gsl_vector *out = gsl_vector_alloc(in->size);\n    Apop_stopif(!out, return NULL, 0, \"failed to allocate a gsl_vector of size %zu. Out of memory?\", in->size);\n    gsl_vector_memcpy(out, in);\n    return out;\n}\n\n/** Copy one <tt>gsl_matrix</tt> to another. That is, all data are duplicated.\nUnlike <tt>gsl_matrix_memcpy</tt>, this function allocates and returns the destination,\nso you can use it like this:\n\\code\ngsl_matrix *a_copy = apop_matrix_copy(original);\n\\endcode\n\n\\param in  the input data\n\\return  A structure that this function will allocate and fill. If \\c gsl_matrix_alloc fails, returns \\c NULL and prints a warning.\n*/\ngsl_matrix *apop_matrix_copy(const gsl_matrix *in){\n    if (!in) return NULL;\n    gsl_matrix *out = gsl_matrix_alloc(in->size1, in->size2);\n    Apop_stopif(!out, return NULL, 0, \"failed to allocate a gsl_matrix of size %zu x %zu. 
Out of memory?\", in->size1, in->size2);\n    gsl_matrix_memcpy(out, in);\n    return out;\n}\n\n\n///////////////The text processing section\n\n/** \\page text_format Input text file formatting\n\nThis reference section describes the assumptions made by \\ref apop_text_to_db and \\ref apop_text_to_data.\n\nEach row of the file will be converted to one record in the database or one row in the\nmatrix. Values on one row are separated by delimiters. Fixed-width input is also OK;\nsee below.\n\nBy default, the delimiters are set to \"|,\\t\", meaning that a pipe, comma, or tab\nwill delimit separate entries.  To change the default, use an argument to\n\\ref apop_text_to_db or \\ref apop_text_to_data like <tt>.delimiters=\" \\t\"</tt> or\n<tt>.delimiters=\"|\"</tt>.\n\nThe input text file must be UTF-8 or traditional ASCII encoding. Delimiters must be ASCII characters. \nIf your data is in another encoding, try the POSIX-standard \\c iconv program to filter the data to UTF-8.\n\n  \\li The character after a backslash is read as a normal character, even if it is a delimiter, \\c #, or \\c \".\n\\li If a field contains several such special characters, surround it by \\c \"s. The\nsurrounding marks are stripped and the text read verbatim.\n  \\li Text does not need to be delimited by quotes (unless there are special characters). If a text field is quote-delimited, I'll strip them.\nE.g., \"Males, 30-40\", is an OK column name, as is \"Males named \\\"Joe\\\\\"\".\n  \\li Everything after an unprotected \\c # is taken to be comments and ignored. \n  \\li Blank lines (empty or consisting only of white space) are also ignored.\n  \\li If you are reading into the <tt>gsl_matrix</tt> element of an \\ref apop_data set,\nall text fields are read as NaNs. You will be warned of such substitutions unless\nyou set <tt>apop_opts.verbosity=0</tt> beforehand. 
For mixed text/numeric data,\ntry using \\ref apop_text_to_db and then \\ref apop_query_to_mixed_data.\n  \\li There are often two delimiters in a row, e.g., \"23, 32,, 12\". When it's two commas\nlike this, the user typically means that there is a missing value and the system should\ninsert a NAN; when it is two tabs in a row, this is typically just a formatting\nglitch. Thus, if there are multiple delimiters in a row, I check whether the second\n(and subsequent) is a space or a tab; if it is, then it is ignored, and if it is any\nother delimiter (including the end of the line) then a NaN is inserted.\n\nIf this rule doesn't work for your situation, you can explicitly insert a note that there is a missing data\npoint. E.g., try: \\code\n\t\tperl -pi.bak -e 's/,,/,NaN,/g' data_file\n\\endcode\n\nIf you have missing data delimiters, you will need to set \\ref apop_opts_type\n\"apop_opts.nan_string\" to text that matches the given format. E.g.,\n\n\\code\n//Apophenia's default NaN string, matching NaN, nan, or NAN, but not Nancy:\napop_opts.nan_string = \"NaN\";\n//Popular alternatives:\napop_opts.nan_string = \"Missing\";\napop_opts.nan_string = \".\";\n\n//Or, turn off nan-string checking entirely with:\napop_opts.nan_string = NULL;\n\\endcode\n\nSQLite stores these NaN-type values internally as \\c NULL; that means that functions like\n\\ref apop_query_to_data will convert both your \\c nan_string string and \\c NULL to \\c NaN.\n\n  \\li The system uses the standards for C's \\c atof() function for\nfloating-point numbers: INFINITY, -INFINITY, and NaN work as expected.\n  \\li If there are row names and column names, then the input will not be perfectly square:\nthere should be no first entry in the sequence of column names like <tt>row names</tt>. 
That is,\nfor a 100x100 data set with row and column names, there are 100 names in the top row,\nand 101 entries in each subsequent row (name plus 100 data points).\n  \\li White space before or after a field is ignored. So <tt>1, 2,3, 4 , 5, \" six \",7 </tt>\nis equivalent to <tt>1,2,3,4,5,\" six \",7</tt>.\n  \\li NUL characters (<tt>'\\0'</tt>) are treated as white space, so if your fields have NULs as padding, you should have no problem. A NUL inside a string terminates the string, as it always does in C.\n  \\li Fixed-width formats are supported (for plain ASCII encoding only), but you have to provide a list of field ending positions. For example, given\n\\code\nNUMLEOL\n123AABB\n456CCDD\n\\endcode\nand <tt>.field_ends=(int[]){3, 5, 7}</tt>, we have three columns, named NUM, LE,\nand OL. The names can be read from the first row by setting <tt>.has_col_names='y'</tt>.\n*/\n\nstatic int prep_text_reading(char const *text_file, FILE **infile){\n    *infile = !strcmp(text_file, \"-\")\n                    ? stdin\n\t                : fopen(text_file, \"r\");\n    Apop_assert_c(*infile, 1,  0, \"Trouble opening %s. Returning NULL.\", text_file);\n    return 0;\n}\n\n/////New text file reading\n/** \\cond doxy_ignore */\nextern char *apop_nul_string;\n\n#define Textrealloc(str, len) (str) =         \\\n            (str) != apop_nul_string          \\\n                ? realloc((str), (len))       \\\n                : (((len) > 0) ? 
malloc(len) : apop_nul_string);\n\ntypedef struct {int ct; int eof;} line_parse_t;\n/** \\endcond */\n\nstatic line_parse_t parse_a_fixed_line(FILE *infile, apop_data *fn, int const *field_ends){\n    int c = fgetc(infile);\n    int ct = 0, posn=0, thisflen=0, needfield=1;\n    while(c!='\\n' && c !=EOF){\n        posn++;\n        if (needfield){//start a new field\n            if (++ct > fn->textsize[0])\n                apop_text_alloc(fn, ct, 1);//realloc text portion.\n            thisflen = \n            needfield = 0;\n        }\n\n        //extend field:\n        thisflen++;\n        Textrealloc(*fn->text[ct-1], thisflen);\n        fn->text[ct-1][0][thisflen-1] = c;\n\n        if (posn==*field_ends){ //close off this field.\n            Textrealloc(*fn->text[ct-1], thisflen+1);\n            fn->text[ct-1][0][thisflen] = '\\0';\n            thisflen = 0;\n            field_ends++;\n            needfield=1;\n        } \n        c = fgetc(infile);\n    }\n    if (needfield==0){//user didn't give last field end.\n        Textrealloc(*fn->text[ct-1], thisflen+1);\n        fn->text[ct-1][0][thisflen] = '\\0';\n    }\n    return (line_parse_t) {.ct=ct, .eof= (c == EOF)};\n}\n\n/** \\cond doxy_ignore */\ntypedef struct{\n    char c, type;\n} apop_char_info;\n/** \\endcond */\n\nstatic const size_t bs=1e5;\nstatic int get_next(char *buffer, size_t *ptr, FILE *infile){\n    int r;\n    if (*ptr>=bs){\n        size_t len=fread(buffer, 1, bs, infile);\n        if (len < bs) buffer[len]=(char)-1;\n        *ptr=0;\n    }\n    r = buffer[(*ptr)++];\n    return r == (char)-1 ? EOF : r;\n}\n\nstatic apop_char_info parse_next_char(char *buffer, size_t *ptr, FILE *f, char const *delimiters){\n    int c = get_next(buffer, ptr, f);\n    int is_delimiter = !!strchr(delimiters, c);\n    return (apop_char_info){.c=c, \n            .type = (c==' '||c=='\\r' ||c=='\\t' || c==0)? (is_delimiter ? 'W'  : 'w')\n                    :is_delimiter    ? 
'd'\n                    :(c == '\\n')     ? 'n'\n                    :(c == '\"')      ? '\"'\n                    :(c == '\\\\')     ? '\\\\'\n                    :(c == EOF)      ? 'E'\n                    :(c == '#')      ? '#'\n                                     : 'r'\n            };\n}\n\n//fills fn with a list of strings.\n//returns the count of elements. Negate the count if we're at EOF.\n//fn must already be allocated via apop_data_alloc() [no args].\nstatic line_parse_t parse_a_line(FILE *infile, char *buffer, size_t *ptr, apop_data *fn, int const *field_ends, char const *delimiters){\n    int ct=0, thisflen=0, inqq=0, infield=0, mlen=5,\n            lastwhite=0, lastnonwhite=0; \n    if (field_ends) return parse_a_fixed_line(infile, fn, field_ends);\n    apop_char_info ci;\n    do {\n        ci = parse_next_char(buffer, ptr, infile, delimiters);\n        //comments are to end of line, so they're basically a newline.\n        if (ci.type=='#' && !inqq){\n            for(int c='x'; (c!='\\n' && c!=EOF); )\n                c = get_next(buffer, ptr, infile);\n            ci.type='n';\n        }\n\n        //The escape-type cases: \\\\ and \"\".\n        //If one applies, set the type to regular\n        if (ci.type=='\\\\'){\n            ci=parse_next_char(buffer, ptr, infile, delimiters);\n            if (ci.type!='E')\n                ci.type='r';\n        }\n        if ((inqq && ci.type !='\"') && ci.type !='E')\n            ci.type='r';\n        else if (ci.type=='\"') inqq = !inqq;\n\n        if (ci.type=='W' && lastwhite==1) \n            continue; //compress these.\n        lastwhite=(ci.type=='W');\n\n        if (!infield){\n            if (ci.type=='w') continue; //eat leading spaces.\n            if (ci.type=='r' || ci.type=='d'             //new field; if 'dnE', blank field. 
\n                   || (strchr(\"nE\", ci.type) && ct>0)){  //Blank fields only at end of lines that already have data; else all-blank line to ignore.\n                if (++ct > fn->textsize[0]) apop_text_alloc(fn, ct, 1);//realloc text portion.\n                Textrealloc(*fn->text[ct-1], 5);\n                thisflen = 0;\n                mlen=5;\n                infield=1;\n            } \n        } \n        if (infield){\n            if (ci.type=='d'||ci.type=='n' || ci.type=='E' || ci.type=='W'){\n                //delimiter; close off this field.\n                fn->text[ct-1][0][lastnonwhite] = '\\0';\n                infield =\n                thisflen =\n                lastnonwhite = 0;\n            } else if (ci.type=='w' || ci.type=='r'){ //extend field\n                thisflen++; //length of string\n                if (thisflen+2 > mlen){\n                    mlen *=2; //length of allocated memory\n                    Textrealloc(*fn->text[ct-1], mlen);\n                }\n                fn->text[ct-1][0][thisflen-1] = ci.c;\n                if (ci.type!='w')\n                    lastnonwhite = thisflen;\n            }\n        }\n    } while (ci.type != 'n' && ci.type != 'E');\n    return (line_parse_t) {.ct=ct, .eof= (ci.type == 'E')};\n}\n\n//On return, fn has copies of the field names, and add_this_line has the first data line.\nstatic void get_field_names(int has_col_names, char **field_names, FILE *infile, char *buffer, size_t *ptr,\n                                apop_data *add_this_line, apop_data *fn, int const *field_ends, char const *delimiters){\n    if (has_col_names && field_names == NULL){\n        while (fn->textsize[0] ==0) parse_a_line(infile, buffer, ptr, fn, field_ends, delimiters);\n        while (add_this_line->textsize[0] ==0) parse_a_line(infile, buffer, ptr, add_this_line, field_ends, delimiters);\n    } else{\n        while (add_this_line->textsize[0] ==0) \n            parse_a_line(infile, buffer, ptr, add_this_line, 
field_ends, delimiters);\n        fn\t= apop_text_alloc(fn, add_this_line->textsize[0], 1);\n        for (int i=0; i< fn->textsize[0]; i++)\n            if (field_names) apop_text_set(fn, i, 0, field_names[i]);\n            else             apop_text_set(fn, i, 0, \"col_%i\", i);\n    }\n}\n\n/** Read a delimited or fixed-width text file into the matrix element of an \\ref apop_data set.\n\nSee \\ref text_format.\n\nSee also \\ref apop_text_to_db, which handles text data, and may otherwise be a preferable approach to data management.\n\n\\param text_file The name of the text file to be read in. If \"-\" (the default), use stdin.\n\\param has_row_names Do the lines of data have row names? \\c 'y' =yes; \\c 'n' =no (default: 'n')\n\\param has_col_names  Is the top line a list of column names? See \\ref text_format for notes on dimension (default: 'y')\n\\param field_ends If fields have a fixed size, give the end of each field, e.g. <tt>.field_ends=(int[]){3, 8, 11}</tt>. (default: \\c NULL, indicating not fixed width)\n\\param delimiters A string listing the characters that delimit fields. 
(default: <tt>\"|,\\t\"</tt>)\n\\return \tReturns an apop_data set.\n\\exception out->error=='a' allocation error\n\\exception out->error=='t' text-reading error\n\n<b>example:</b> See \\ref apop_ols.\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD apop_data * apop_text_to_data(char const*text_file, int has_row_names, int has_col_names, int const *field_ends, char const *delimiters){\n    char const *apop_varad_var(text_file, \"-\")\n    int apop_varad_var(has_row_names, 'n')\n    int apop_varad_var(has_col_names, 'y')\n    if (has_row_names==1||has_row_names=='Y') has_row_names ='y';\n    if (has_col_names==1||has_col_names=='Y') has_col_names ='y';\n    int const * apop_varad_var(field_ends, NULL);\n    const char * apop_varad_var(delimiters, apop_opts.input_delimiters);\nAPOP_VAR_ENDHEAD\n    apop_data *set = NULL;\n    FILE *infile = NULL;\n    char *str;\n    char buffer[bs];\n    size_t ptr=bs;\n    apop_data *add_this_line= apop_data_alloc();\n    int row = 0,\n        hasrows = (has_row_names == 'y');\n    Apop_stopif(prep_text_reading(text_file, &infile), apop_return_data_error(t),\n            0, \"trouble opening %s\", text_file);\n\n    line_parse_t L={ };\n    //First, handle the top line, if we're told that it has column names.\n    if (has_col_names=='y'){\n        apop_data *field_names = apop_data_alloc();\n        get_field_names(1, NULL, infile, buffer, &ptr, add_this_line, field_names, field_ends, delimiters);\n        L.ct = *add_this_line->textsize;\n        set = apop_data_alloc(0,1, L.ct - hasrows);\n\t    set->names->colct = 0;\n\t    set->names->col = malloc(sizeof(char*));\n        for (int j=0; j< L.ct - hasrows; j++)\n            apop_name_add(set->names, *field_names->text[j], 'c');\n        apop_data_free(field_names);\n    } \n\n    //Now do the body.\n\twhile(!set || !L.eof || L.ct){\n        if (!L.ct) { //skip blank lines\n            L=parse_a_line(infile,buffer, &ptr,  add_this_line, 
field_ends, delimiters);\n            continue;\n        }\n        if (!set) set = apop_data_alloc(0, 1, L.ct-hasrows); //for .has_col_names=='n'.\n        row++;\n        int cols = set->matrix  ? set->matrix->size2 : L.ct - hasrows;\n        set->matrix = apop_matrix_realloc(set->matrix, row, cols);\n        Apop_stopif(!set->matrix, set->error='a'; return set, 0, \"allocation error.\");\n        if (hasrows) {\n            apop_name_add(set->names, *add_this_line->text[0], 'r');\n            Apop_stopif(L.ct-1 > set->matrix->size2, set->error='t'; return set, 1,\n                 \"row %i (not counting rownames) has %i elements (not counting the rowname), \"\n                 \"but I thought this was a data set with %zu elements per row. \"\n                 \"Stopping the file read; returning what I have so far.\", row, L.ct-1, set->matrix->size2);\n        } else Apop_stopif(L.ct > set->matrix->size2, set->error='t'; return set, 1,\n                 \"row %i has %i elements, \"\n                 \"but I thought this was a data set with %zu elements per row. \"\n                 \"Stopping the file read; returning what I have so far. 
Set has_row_names?\", row, L.ct, set->matrix->size2);\n        for (int col=hasrows; col < L.ct; col++){\n            char *thisstr = *add_this_line->text[col];\n            if (strlen(thisstr)){\n                double val = strtod(thisstr, &str);\n                if (thisstr != str)\n                    gsl_matrix_set(set->matrix, row-1, col-hasrows, val);\n                else {\n                    gsl_matrix_set(set->matrix, row-1, col-hasrows, GSL_NAN);\n                    Apop_notify(1, \"trouble converting data item %i on data line %i [%s]; writing NaN.\", col, row, thisstr);\n                }\n            } else gsl_matrix_set(set->matrix, row-1, col-hasrows, GSL_NAN);\n        }\n        if (L.eof) break;//hit when the last line has elements and is terminated by EOF.\n        L=parse_a_line(infile, buffer, &ptr, add_this_line, field_ends, delimiters);\n\t}\n    apop_data_free(add_this_line);\n    if (strcmp(text_file,\"-\")) fclose(infile);\n\treturn set;\n}\n\n/** This is the complement to \\ref apop_data_pack, q.v. It writes the \\c gsl_vector\n    produced by that function back to the \\ref apop_data set you provide. It overwrites\n    the data in the vector and matrix elements and, if present, the \\c weights (and\n    that's it, so names or text are as before).\n\n\\param in A \\c gsl_vector of the form produced by \\ref apop_data_pack. No default; must not be \\c NULL.\n\\param d  The data set to be filled. Must be allocated to the correct size. No default; must not be \\c NULL.\n\\param use_info_pages Pages in XML-style brackets, such as <tt>\\<Covariance\\></tt> will\nbe ignored unless you set <tt>.use_info_pages='y'</tt>. Be sure that this is set to the\nsame thing when you both pack and unpack. 
(Default: \\c 'n').\n\n\\li If I get to the end of the first page of the \\c apop_data set and have more\n    entries in the vector to unpack, and the data to fill has a \\c more element,\n    then I will continue into subsequent pages.\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD void apop_data_unpack(const gsl_vector *in, apop_data *d, char use_info_pages){\n    const gsl_vector * apop_varad_var(in, NULL);\n    apop_data* apop_varad_var(d, NULL);\n    Apop_stopif(!d, return, 0, \"the data set to be filled, d, must not be NULL\");\n    char apop_varad_var(use_info_pages, 'n');\nAPOP_VAR_ENDHEAD\n    int offset = 0;\n    gsl_vector vin, vout;\n    if(d->vector){\n        vin = gsl_vector_subvector((gsl_vector *)in, 0, d->vector->size).vector;\n        gsl_vector_memcpy(d->vector, &vin);\n        offset += d->vector->size;\n    }\n    if(d->matrix)\n        for (size_t i=0; i< d->matrix->size1; i++){\n            vin = gsl_vector_subvector((gsl_vector *)in, offset, d->matrix->size2).vector;\n            vout = gsl_matrix_row(d->matrix, i).vector;\n            gsl_vector_memcpy(&vout, &vin);\n            offset += d->matrix->size2;\n        }\n    if(d->weights){\n        vin = gsl_vector_subvector((gsl_vector *)in, offset, d->weights->size).vector;\n        gsl_vector_memcpy(d->weights, &vin);\n        offset += d->weights->size;\n    }\n    if (offset != in->size && d->more){\n        vin = gsl_vector_subvector((gsl_vector *)in, offset, in->size - offset).vector;\n        d = d->more;\n        if (use_info_pages=='n')\n            while (d && apop_regex(d->names->title, \"^<.*>$\"))\n                d = d->more;\n        Apop_stopif(!d, return, 0, \"The data set (without info pages, because you didn't ask\"\n                \" me to use them) is too short for the input vector.\");\n        apop_data_unpack(&vin, d, use_info_pages); //pass the setting on, so later pages are treated the same way.\n    }\n}\n\nstatic size_t sizecount(const apop_data *in, bool all_pp, bool use_info_pp){ \n    if (!in) return 
0;\n    if (!use_info_pp && in->names && apop_regex(in->names->title, \"^<.*>$\"))\n        return (all_pp ? sizecount(in->more, all_pp, use_info_pp) : 0);\n    return (in->vector ? in->vector->size : 0)\n             + (in->matrix ? in->matrix->size1 * in->matrix->size2 : 0)\n             + (in->weights ? in->weights->size : 0)\n             + (all_pp ? sizecount(in->more, all_pp, use_info_pp) : 0);\n}\n\n/** This function takes in an \\ref apop_data set and writes it as a single column of\nnumbers, outputting a \\c gsl_vector.\n It is valid to use the \\c out_vector->data element as an array of \\c doubles of size\n \\c out_vector->size (i.e. its <tt>stride==1</tt>).\n\n The complement is \\ref apop_data_unpack. I.e., \n\\code\napop_data_unpack(apop_data_pack(in_data), data_copy) \n\\endcode\nwill fill \\c data_copy with the original data (stripped of text and names).\n\n \\param in an \\c apop_data set. No default; if \\c NULL, return \\c NULL.\n \\param out If this is not \\c NULL, then put the output here. The dimensions must match exactly. If \\c NULL, then allocate a new vector. Default = \\c NULL. \n  \\param more_pages If \\c 'y', then follow the <tt> ->more</tt> pointer to fill subsequent\npages; else fill only the first page. Informational pages will still be ignored, unless you set <tt>.use_info_pages='y'</tt> as well.  Default = \\c 'y'. \n\\param use_info_pages Pages in XML-style brackets, such as <tt>\\<Covariance\\></tt> will\nbe ignored unless you set <tt>.use_info_pages='y'</tt>. Be sure that this is set to the\nsame thing when you both pack and unpack. Default: <tt>'n'</tt>.\n\n \\return A \\c gsl_vector with the vector data (if any), then each row of data (if any), then the weights (if any), then the same for subsequent pages (if any <tt>&& .more_pages=='y'</tt>). 
If \\c out is not \\c NULL, then this is \\c out.\n\\exception NULL If you give me a vector as input, and its size is not correct, returns \\c NULL.\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD gsl_vector * apop_data_pack(const apop_data *in, gsl_vector *out, char more_pages, char use_info_pages){\n    const apop_data * apop_varad_var(in, NULL);\n    if (!in) return NULL;\n    gsl_vector * apop_varad_var(out, NULL);\n    char apop_varad_var(more_pages, 'y');\n    char apop_varad_var(use_info_pages, 'n');\n    if (out) {\n        size_t total_size = sizecount(in, (more_pages == 'y' || more_pages == 'Y'), (use_info_pages =='y' || use_info_pages =='Y'));\n        Apop_stopif(out->size != total_size, return NULL, 0, \"The input data set has %zu elements, \"\n               \"but the output vector you want to fill has size %zu. Please make \"\n               \"these sizes equal.\", total_size, out->size);\n    }\nAPOP_VAR_ENDHEAD\n    size_t total_size = sizecount(in, (more_pages == 'y' || more_pages == 'Y'), (use_info_pages =='y' || use_info_pages =='Y'));\n    if (!total_size) return NULL;\n    int offset = 0;\n    if (!out) out = gsl_vector_alloc(total_size);\n    gsl_vector vout, vin;\n    if (in->vector){\n        vout     = gsl_vector_subvector((gsl_vector *)out, 0, in->vector->size).vector;\n        gsl_vector_memcpy(&vout, in->vector);\n        offset  += in->vector->size;\n    }\n    if (in->matrix)\n        for (size_t i=0; i< in->matrix->size1; i++){\n            vin = gsl_matrix_row(in->matrix, i).vector;\n            vout= gsl_vector_subvector((gsl_vector *)out, offset, in->matrix->size2).vector;\n            gsl_vector_memcpy(&vout, &vin);\n            offset  += in->matrix->size2;\n        }\n    if (in->weights){\n        vout     = gsl_vector_subvector((gsl_vector *)out, offset, in->weights->size).vector;\n        gsl_vector_memcpy(&vout, in->weights);\n        offset  += in->weights->size;\n    }\n    if 
((more_pages == 'y' ||more_pages =='Y') && in->more){\n        while (use_info_pages=='n' && in->more && apop_regex(in->more->names->title, \"^<.*>$\"))\n            in = in->more;\n        if (in->more){\n            vout = gsl_vector_subvector((gsl_vector *)out, offset, out->size - offset).vector;\n            apop_data_pack(in->more, &vout, more_pages, use_info_pages);\n        }\n    }\n    return out;\n}\n\n/** \\def apop_data_falloc\nAllocate a data set and fill it with values.  Put the data set dimensions (one, two,\nor three dimensions as per \\ref apop_data_alloc) in parens, then the data (as per \\ref\napop_data_fill). E.g.:\n\\code\napop_data *identity2 = apop_data_falloc((2,2),\n                         1, 0,\n                         0, 1);\n\napop_data *count_vector = apop_data_falloc((5), 0, 1, 2, 3, 4);\n\\endcode\n\nIf you forget the parens, you will get an obscure error during compilation.\n\n\\li This is a simple macro wrapping \\ref apop_data_fill and \\ref apop_data_alloc,\nbecause they appear together so often.  The second example expands to:\n\\code\napop_data *count_vector = apop_data_fill(apop_data_alloc(5), 0, 1, 2, 3, 4);\n\\endcode\n*/\n\n/** \\def apop_data_fill\nFill a pre-allocated data set with values.\n\n\\param adfin  An \\c apop_data set (that you have already allocated).\n\\param ...  A series of at least as many floating-point values as there are blanks in the data set.\n\\return     A pointer to the same data set that was input.\n\n\\li I need as many arguments as the size of the data set, and can't count them for\nyou. Too many will be ignored; too few will produce unpredictable results, which may\ninclude padding your matrix with garbage or a simple segfault.\n\n\\li Underlying this function is a base function that takes a single list, as opposed\nto the set of unassociated numbers sent to \\ref apop_data_fill. 
See the example below for a comparison.\n\n\\li This function assumes that if the \\ref apop_data set has both \\c vector and \\c\nmatrix, then <tt>vector->size==matrix->size1</tt>.\n\n\\li See also \\ref apop_data_falloc to allocate and fill on one line. E.g., to\ngenerate a unit vector for three dimensions:\n\\code\napop_data *unit_vector = apop_data_falloc((3), 1, 1, 1);\n\\endcode\n\nAn example, using both a loose list of numbers and an array.\n\n\\include data_fill.c\n\n\\see apop_text_fill, apop_data_falloc, apop_data_unpack\n*/\n\napop_data *apop_data_fill_base(apop_data *in, double ap[]){\n/* In conversions.h, you'll find this header, which turns all but the first input into an array of doubles of indeterminate length:\n#define apop_data_fill(in, ...) apop_data_fill_base((in), (double []) {__VA_ARGS__})\n*/\n    if (!in) return NULL;\n    int k=0, start=0, fin=0, height=0;\n    if (in->vector){\n        start   = -1;\n        height  = in->vector->size;\n    }\n    if (in->matrix){\n        fin   = in->matrix->size2;\n        height  = in->matrix->size1;\n    }\n    for (int i=0; i< height; i++)\n        for (int j=start; j< fin; j++)\n            apop_data_set(in, i, j, ap[k++]);\n    return in;\n}\n\n/** \\def apop_vector_fill\n Fill a pre-allocated \\c gsl_vector with values.\n\n  See \\ref apop_data_alloc for a relevant example. See also \\ref apop_matrix_alloc.\n\nWarning: I need as many arguments as the size of the vector, and can't count them for you. Too many will be ignored; too few will produce unpredictable results, which may include padding your vector with garbage or a simple segfault.\n\n\n\\param avfin   A \\c gsl_vector (that you have already allocated).\n\\param ...     
A series of exactly as many values as there are spaces in the vector.\n\\return        A pointer to the same vector that was input.\n*/\ngsl_vector *apop_vector_fill_base(gsl_vector *in, double ap[]){\n    if (!in) return NULL;\n    for (int i=0; i< in->size; i++)\n        gsl_vector_set(in, i, ap[i]);\n    return in;\n}\n\n/** \\def apop_text_fill(in, ap)\nFill the text part of an already-allocated \\ref apop_data set with a list of strings. \n\n\\param dataset A data set that you already prepared with \\ref apop_text_alloc.\n\\param ... A list of strings. The first row is filled first, then the second, and so on to the end of the text grid.\n\n\\li If an element is \\c NULL, write <tt>apop_opts.nan_string</tt> at that point. You may prefer to use <tt>\"\"</tt> to express a blank.\n\\li If you provide more or fewer strings than are needed to fill the text grid and\n     <tt>apop_opts.verbose >=1</tt>, I print a warning and continue to \n     the end of the text grid or data set, whichever is shorter.\n\\li If the data set is \\c NULL, I return \\c NULL. If you provide a \\c NULL data set\n    but a non-NULL list of text elements, and <tt>apop_opts.verbose >=1</tt>, I print\n    a warning and return \\c NULL.\n\\li Remember that the C preprocessor concatenates two adjacent strings into one. Here\n    is an attempt to fill a \\f$ 2\\times 3\\f$ grid:\n\\code\n  apop_data *one23 = apop_text_fill(apop_text_alloc(NULL, 2, 3),\n                                     \"one\", \"two\", \"three\"   //missing comma!\n                                     \"two\", \"four\", \"six\");\n\\endcode\nThe preprocessor will join <tt>\"three\" \"two\"</tt> to form <tt>\"threetwo\"</tt>, leaving you with only five strings.\n\n\\li If you have a \\c NULL-delimited array of strings (not just a loose list as above),\nthen use \\c apop_text_fill_base. 
\n*/\napop_data *apop_text_fill_base(apop_data *data, char* text[]){\n    int textct = 0;\n    for (char **textptr = text; *textptr; textptr++) textct++;\n    Apop_stopif(!data && textct, return NULL, 1, \"NULL data set input; returning NULL.\");\n    if (!data) return NULL;\n    int gridsize = data ? data->textsize[0]*data->textsize[1] : 0;\n    Apop_stopif(textct != gridsize, /*continue*/, 1, \"Data set has a text grid \"\n            \"of size %i but you gave me %i strings.\", gridsize, textct);\n\n    int ctr=0;\n    for (int i=0; i< data->textsize[0]; i++)\n        for (int j=0; j< data->textsize[1]; j++)\n            apop_text_set(data, i, j, text[ctr++]);\n    return data;\n}\n\n\n///////The rest of this file is for apop_text_to_db\nextern sqlite3 *db;\n\nstatic char *get_field_conditions(char *var, apop_data *field_params){\n    if (field_params)\n        for (int i=0; i<field_params->textsize[0]; i++)\n            if (apop_regex(var, field_params->text[i][0]))\n                return field_params->text[i][1];\n    return (apop_opts.db_engine == 'm') ? \"varchar(100)\" : \"numeric\";\n}\n\nstatic int tab_create_mysql(char *tabname, int has_row_names, apop_data *field_params, char *table_params, apop_data const *fn){\n    char *q = NULL;\n    Asprintf(&q, \"create table %s\", tabname);\n    for (int i=0; i < *fn->textsize; i++){\n        if (i==0)\n             xprintf(&q, has_row_names ? \"%s (row_names varchar(100), \" : \"%s (\", q);\n        else xprintf(&q, \"%s %s, \", q, get_field_conditions(*fn->text[i-1], field_params));\n        xprintf(&q, \"%s %s\", q, *fn->text[i]);\n    }\n    xprintf(&q, \"%s %s%s%s)\", q, get_field_conditions(*fn->text[fn->textsize[0]-1], field_params)\n                                , table_params? 
\", \": \"\", XN(table_params));\n    apop_query(\"%s\", q);\n    Apop_stopif(!apop_table_exists(tabname), return -1, 0, \"query \\\"%s\\\" failed.\", q);\n    free(q);\n    return 0;\n}\n\nstatic int tab_create_sqlite(char *tabname, int has_row_names, apop_data *field_params, char *table_params, apop_data const *fn){\n    char  *q = NULL;\n    Asprintf(&q, \"create table %s\", tabname);\n    for (int i=0; i<fn->textsize[0]; i++){\n        if (i==0){\n            if (has_row_names) xprintf(&q, \"%s ('row_names', \", q);\n            else               xprintf(&q, \"%s (\", q);\n        } else xprintf(&q, \"%s' %s, \", q, get_field_conditions(*fn->text[i-1], field_params));\n        xprintf(&q, \"%s '%s\", q, *fn->text[i]);\n    }\n    xprintf(&q, \"%s' %s%s%s);\", q, get_field_conditions(*fn->text[fn->textsize[0]-1], field_params)\n                                , table_params? \", \": \"\", XN(table_params));\n    apop_query(\"%s\", q);\n    Apop_stopif(!apop_table_exists(tabname), return -1, 0, \"query \\\"%s\\\" failed.\", q);\n    free(q);\n    return 0;\n}\n\n/**\n--If the string has zero length, then it's probably a missing value.\n --If the string isn't a number, it needs quotes\n */\nchar *prep_string_for_sqlite(int prepped_statements, char const *astring){\n    if (!astring || astring[0]=='\\0' || \n            (apop_opts.nan_string && !strcasecmp(apop_opts.nan_string, astring)))\n        return NULL;\n\n    char *out  = NULL,\n\t\t *tail = NULL;\n\tif(strtod(astring, &tail)) \n        /*do nothing.*/;\n    if (*tail!='\\0'){\t//then it's not a number.\n        if (!prepped_statements){\n            if (strchr(astring, '\\''))\n                Asprintf(&out,\"\\\"%s\\\"\", astring);\n            else\n                Asprintf(&out,\"'%s'\", astring);\n        } else  out = strdup(astring);\n\t} else {\t    //number, maybe INF or NAN. 
Also, sqlite wants 0.1, not .1\n\t\tassert(*astring!='\\0');\n        if (isinf(atof(astring))==1)\n\t\t\tout = strdup(\"9e9999999\");\n        else if (isinf(atof(astring))==-1)\n\t\t\tout = strdup(\"-9e9999999\");\n        else if (gsl_isnan(atof(astring)))\n\t\t\tout = strdup(\"0.0/0.0\");\n        else if (astring[0]=='.')\n\t\t\tAsprintf(&out, \"0%s\",astring);\n\t\telse out = strdup(astring);\n\t}\n    return out;\n}\n\nstatic void line_to_insert(line_parse_t L, apop_data const*addme, char const *tabname, \n                             sqlite3_stmt *p_stmt, int row){\n    if (!L.ct) return;\n    int field = 1;\n    char comma = ' ';\n    char *q = NULL;\n    if (!p_stmt) Asprintf(&q, \"INSERT INTO %s VALUES (\", tabname);\n    for (int col=0; col < L.ct; col++){\n        char *prepped = prep_string_for_sqlite(!!p_stmt, *addme->text[col]);\n        if (p_stmt){\n            if (!prepped || !strlen(prepped))\n                field++; //leave NULL and cleared\n            else \n               Apop_stopif(sqlite3_bind_text(p_stmt, field++, prepped, -1, SQLITE_TRANSIENT)!=SQLITE_OK,\n                /*keep going */, 0, \"Something wrong on line %i, field %i [%s].\\n\"\n                                            , row, field-1, *addme->text[col]);\n        } else {\n            xprintf(&q, \"%s%c %s\", q, comma,  (prepped && strlen(prepped) ? 
prepped : \" NULL\"));\n            comma = ',';\n        }\n        free(prepped);\n    }\n    if (!p_stmt){\n        apop_query(\"%s)\",q); \n        free (q);\n    }\n}\n\nint apop_use_sqlite_prepared_statements(size_t col_ct){\n    #if SQLITE_VERSION_NUMBER < 3003009\n        return 0;\n    #else\n        return (sqlite3_libversion_number() >=3003009\n                    && !(apop_opts.db_engine == 'm')\n                    &&  col_ct <= 999); //Arbitrary SQLite limit on blanks in prepared statements.\n    #endif\n}\n\nint apop_prepare_prepared_statements(char const *tabname, size_t col_ct, sqlite3_stmt **statement){\n    #if SQLITE_VERSION_NUMBER < 3003009\n        Apop_stopif(1, return -1, 0, \"Attempting to prepare prepared statements, but using a version of SQLite that doesn't support them.\");\n    #else\n        char *q=NULL;\n        Asprintf(&q, \"INSERT INTO %s VALUES (\", tabname);\n        for (size_t i = 0; i < col_ct; i++)\n            xprintf(&q, \"%s?%c\", q, i==col_ct-1 ? ')' : ',');\n        Apop_stopif(!db, return -1, 0, \"The database should be open by now but isn't.\");\n        Apop_stopif(sqlite3_prepare_v2(db, q, -1, statement, NULL) != SQLITE_OK, \n                    return -1, apop_errorlevel, \"Failure preparing prepared statement: %s\", sqlite3_errmsg(db));\n        free(q);\n        return 0;\n    #endif\n}\n\nstatic char *cut_at_dot(char const *infile){\n    char *incopy = strdup(infile); //basename reserves the right to modify its input.\n    char *out = strdup(basename(incopy));\n    free(incopy);\n    char *dot = strchr(out, '.');\n    if (dot) *dot='\\0';\n    return out;\n}\n\n/** Read a delimited or fixed-width text file into a database table.\n  See \\ref text_format. 
\n\nFor purely numeric data, you may be able to bypass the database by using \\ref apop_text_to_data.\n\nSee the \\ref apop_ols page for an example that uses this function to read in sample data (also listed on that page).\n\nApophenia ships with an \\c apop_text_to_db command-line utility, which is a wrapper for this function.\n\nEspecially if you are using a pre-2007 version of SQLite, there may be a speedup to putting this function in a begin/commit wrapper:\n\\code\napop_query(\"begin;\");\napop_data_print(dataset, .output_name=\"dbtab\", .output_type='d');\napop_query(\"commit;\");\n\\endcode\n\n\\param text_file    The name of the text file to be read in. If \\c \"-\", then read from \\c STDIN. (default: \"-\")\n\\param tabname      The name to give the table in the database\n    (default: \\c text_file after the last slash and up to the next dot. E.g.,\n    <tt>text_file==\"../data/pant_lengths.csv\"</tt> gives <tt>tabname==\"pant_lengths\"</tt>)\n\\param has_row_names Do the lines of data have row names? (default: 0)\n\\param has_col_names Is the top line a list of column names? (default: 1)\n\\param field_names The list of field names, which will be the columns for the table. If <tt>has_col_names==1</tt>, read the names from the file (and just set this to <tt>NULL</tt>). If <tt>has_col_names == 1 && field_names != NULL</tt>, I'll use the field names given here.  (default: NULL)\n\\param field_ends If fields have a fixed size, give the end of each field, e.g.  <tt>.field_ends=(int[]){3, 8, 11}</tt>. (default: \\c NULL, indicating not fixed width)\n\\param field_params There is an implicit <tt>create table</tt> in setting up the database. If you want to add a type, constraint, or key, put that here. The relevant part of the input \\ref apop_data set is the \\c text grid, which should be \\f$N \\times 2\\f$. 
The first item in each row (<tt>your_params->text[n][0]</tt>, for each \\f$n\\f$) is a regular expression to match against the variable names; the second item (<tt>your_params->text[n][1]</tt>) is the type, constraint, and/or key (i.e., what comes after the name in the \\c create query). Not all variables need be mentioned; the default type if nothing matches is <tt>numeric</tt>. I go in order until I find a regex that matches the given field, so if you don't like the default, then set the last row to have name <tt>.*</tt>, which is a regex guaranteed to match anything that wasn't matched by an earlier row, and then set the associated type to your preferred default. See \\ref apop_regex on details of matching. (default: NULL)\n\\param table_params There is an implicit <tt>create table</tt> in setting up the database. If you want to add a table constraint or key, such as <tt>not null primary key (age, sex)</tt>, put that here.\n\\param delimiters A string listing the characters that delimit fields. default = <tt>\"|,\\t\"</tt>\n\\param if_table_exists What should I do if the table exists?<br>\n\\c 'n' Do nothing; exit this function. 
(default)<br>\n\\c 'd' Retain the table but delete all data; refill with the new data (i.e., call <tt>\"delete from your_table\"</tt>).<br>\n\\c 'o' Overwrite the table from scratch, deleting the previous table entirely.<br>\n\\c 'a' Append new data to the existing table.\n\n\\return Returns the number of rows on success, -1 on error.\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD int apop_text_to_db(char const *text_file, char *tabname, int has_row_names, int has_col_names, char **field_names, int const *field_ends, apop_data *field_params, char *table_params, char const *delimiters, char if_table_exists){\n    char const *apop_varad_var(text_file, \"-\")\n    char *apop_varad_var(tabname, cut_at_dot(text_file))\n    int apop_varad_var(has_row_names, 'n')\n    int apop_varad_var(has_col_names, 'y')\n    if (has_row_names==1||has_row_names=='Y') has_row_names ='y';\n    if (has_col_names==1||has_col_names=='Y') has_col_names ='y';\n    int const *apop_varad_var(field_ends, NULL)\n    char ** apop_varad_var(field_names, NULL)\n    apop_data * apop_varad_var(field_params, NULL)\n    char * apop_varad_var(table_params, NULL)\n    const char * apop_varad_var(delimiters, apop_opts.input_delimiters);\n    char apop_varad_var(if_table_exists, 'n')\nAPOP_VAR_ENDHEAD\n    int  batch_size  = 10000,\n      \t col_ct, ct = 0, rows = 1;\n    FILE *infile;\n    char buffer[bs];\n    size_t ptr = bs;\n    apop_data *add_this_line = apop_data_alloc();\n    sqlite3_stmt *statement = NULL;\n    line_parse_t L = {1,0};\n        \n    bool tab_exists = apop_table_exists(tabname);\n    if (tab_exists){\n        Apop_stopif(if_table_exists=='n', return -1, 0, \"table %s exists; not recreating it.\", tabname);\n        if (if_table_exists=='d')      \n            apop_query(\"delete from %s\", tabname);\n        else if (if_table_exists=='o') {\n            apop_query(\"drop table %s\", tabname); \n            tab_exists=false;\n        }\n    
}\n\n    //get names and the first row.\n    if (prep_text_reading(text_file, &infile)) return -1;\n    apop_data *fn = apop_data_alloc();\n    get_field_names(has_col_names=='y', field_names, infile, buffer, &ptr,\n                                    add_this_line, fn, field_ends, delimiters);\n    col_ct = L.ct = *add_this_line->textsize;\n    Apop_stopif(!col_ct, return -1, 0, \"counted zero columns in the input file (%s).\", tabname);\n    if (!tab_exists)\n        Apop_stopif( ((apop_opts.db_engine=='m') ? tab_create_mysql : tab_create_sqlite)(tabname, has_row_names=='y', field_params, table_params, fn),\n            return -1, 0, \"Creating the table in the database failed.\");\n#if SQLITE_VERSION_NUMBER < 3003009\n    Apop_notify(1, \"Apophenia was compiled using a version of SQLite from mid-2007 or earlier. \"\n                    \"The code for reading in text files using such an old version is no longer supported, \"\n                    \"so if errors crop up please see about installing a more recent version of SQLite's library.\");\n#endif\n    int use_sqlite_prepared_statements = apop_use_sqlite_prepared_statements(col_ct);\n    if (use_sqlite_prepared_statements)\n        Apop_stopif(apop_prepare_prepared_statements(tabname, col_ct, &statement), \n                return -1, 0, \"Trouble preparing the prepared statement for SQLite.\");\n    //done with table & query setup.\n    //convert a data line into SQL: insert into TAB values (0.3, 7, \"et cetera\");\n\twhile(L.ct && !L.eof){\n        line_to_insert(L, add_this_line, tabname, statement, rows);\n        if (apop_opts.verbose > 1 && !(ct++ % batch_size)) \n            {fprintf(stderr, \".\"); fflush(NULL);}\n        if (use_sqlite_prepared_statements){\n            int err = sqlite3_step(statement);\n            if (err!=0 && err != 101) //0=ok, 101=done\n                Apop_notify(0, \"sqlite insert query gave error code %i.\\n\", err);\n            Apop_assert_c(!sqlite3_reset(statement), -1, 
apop_errorlevel, \"SQLite error.\");\n#if SQLITE_VERSION_NUMBER >= 3003009\n            Apop_assert_c(!sqlite3_clear_bindings(statement), -1, apop_errorlevel, \"SQLite error.\"); //needed for NULLs\n#endif\n        }\n        do {\n            L = parse_a_line(infile, buffer, &ptr, add_this_line, field_ends, delimiters);\n            rows ++;\n        } while (!L.ct && !L.eof); //skip blank lines\n\t}\n    apop_data_free(add_this_line);\n#if SQLITE_VERSION_NUMBER >= 3003009\n\tif (use_sqlite_prepared_statements){\n        Apop_assert_c(sqlite3_finalize(statement) ==SQLITE_OK, -1, apop_errorlevel, \"SQLite error.\");\n    }\n#endif\n    if (strcmp(text_file,\"-\")) fclose(infile);\n\treturn rows;\n}\n"
  },
  {
    "path": "apop_data.m4.c",
    "content": "/** \\file \nThe apop_data structure joins together a gsl_matrix, apop_name, and a table of strings. */\n/* Copyright (c) 2006--2009 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n#include \"apop_internal.h\"\n//apop_gsl_error is in apop_linear_algebra.c\n#define Set_gsl_handler gsl_error_handler_t *prior_handler = gsl_set_error_handler(apop_gsl_error);\n#define Unset_gsl_handler gsl_set_error_handler(prior_handler);\n\n/** Allocate an \\ref apop_data structure.\n \n\\li The typical case is  three arguments, like <tt>apop_data_alloc(2,3,4)</tt>: vector size, matrix rows, matrix cols. If the first argument is zero, you get a \\c NULL vector.\n\\li Two arguments, <tt>apop_data_alloc(2,3)</tt>,  would allocate just a matrix, leaving the vector \\c NULL.\n\\li One argument, <tt>apop_data_alloc(2)</tt>,  would allocate just a vector, leaving the matrix \\c NULL.\n\\li Zero arguments, <tt>apop_data_alloc()</tt>,  will produce a basically blank set, with \\c out->matrix and \\c out->vector set to \\c NULL. \n\nFor allocating the text part, see \\ref apop_text_alloc.\n\nThe \\c weights vector is set to \\c NULL. If you need it, allocate it via\n\\code d->weights = gsl_vector_alloc(row_ct); \\endcode\n\n\\return The \\ref apop_data structure, allocated and ready to be populated with data.\n\\exception out->error=='a'  Allocation error. The matrix, vector, or names couldn't be <tt>malloc</tt>ed, which probably means that you requested a very large data set.\n\n\\li An \\ref apop_data struct, by itself, is about 72 bytes. If I can't allocate that much memory, I return \\c NULL.\n                But if even this much fails, your computer may be on fire and you should go put it out. 
\n\n\\li This function uses the \\ref designated syntax for inputs.\n\n\\see apop_data_calloc\n*/\nAPOP_VAR_HEAD apop_data * apop_data_alloc(const size_t size1, const size_t size2, const int size3){\n    const size_t apop_varad_var(size1, 0);\n    const size_t apop_varad_var(size2, 0);\n    const int apop_varad_var(size3, 0);\nAPOP_VAR_ENDHEAD\n    size_t vsize=0, msize1=0; \n    int msize2=0;\n    if (size3){\n        vsize = size1;\n        msize1 = size2;\n        msize2 = size3;\n    }\n    else if (size2) {\n        msize1 = size1;\n        msize2 = size2;\n    }\n    else vsize = size1;\n    apop_data *setme = malloc(sizeof(apop_data));\n    Apop_stopif(!setme, return NULL, -5, \"malloc failed. Probably out of memory.\");\n    *setme = (apop_data) { }; //init to zero/NULL.\n    Set_gsl_handler\n    if (msize2 > 0  && msize1 > 0){\n        setme->matrix = gsl_matrix_alloc(msize1,msize2);\n        Apop_stopif(!setme->matrix, setme->error='a'; return setme,\n                0, \"malloc failed on a %zu x %i matrix. Probably out of memory.\", msize1, msize2);\n    }\n    if (vsize){\n        setme->vector = gsl_vector_alloc(vsize);\n        Apop_stopif(!setme->vector, setme->error='a'; return setme,\n                0, \"malloc failed on a vector of size %zu. Probably out of memory.\", vsize);\n    }\n    Unset_gsl_handler\n    setme->names = apop_name_alloc();\n    Apop_stopif(!setme->names, setme->error='a'; return setme,\n                0, \"couldn't allocate names. Probably out of memory.\");\n    return setme;\n}\n\n/** Allocate a \\ref apop_data structure, to be filled with data; set everything in the allocated portion to zero. 
See \\ref apop_data_alloc for details.\n\n\\return    The \\ref apop_data structure, allocated and zeroed out.\n\\exception out->error=='a' allocation error; probably out of memory.\n\\li This function uses the \\ref designated syntax for inputs.\n\\see apop_data_alloc \n*/\nAPOP_VAR_HEAD apop_data * apop_data_calloc(const size_t size1, const size_t size2, const int size3){\n    const size_t apop_varad_var(size1, 0);\n    const size_t apop_varad_var(size2, 0);\n    const int apop_varad_var(size3, 0);\nAPOP_VAR_ENDHEAD\n    size_t vsize=0, msize1=0; \n    int msize2=0;\n    if (size3){\n        vsize = size1;\n        msize1 = size2;\n        msize2 = size3;\n    }\n    else if (size2) {\n        msize1 = size1;\n        msize2 = size2;\n    }\n    else vsize = size1;\n    apop_data *setme = malloc(sizeof(apop_data));\n    Apop_stopif(!setme, apop_return_data_error('a'), 0, \"malloc failed. Probably out of memory.\");\n    *setme = (apop_data) { }; //init to zero/NULL.\n    if (msize2 >0 && msize1 > 0){\n        setme->matrix = gsl_matrix_calloc(msize1,msize2);\n        Apop_stopif(!setme->matrix, apop_return_data_error('a'), 0, \"malloc failed on a %zu x %i matrix. Probably out of memory.\", msize1, msize2);\n    }\n    if (vsize){\n        setme->vector = gsl_vector_calloc(vsize);\n        Apop_stopif(!setme->vector, apop_return_data_error('a'), 0, \"malloc failed on a vector of size %zu. Probably out of memory.\", vsize);\n    }\n    setme->names = apop_name_alloc();\n    return setme;\n}\n\n/*For a touch of space saving, blank strings in a text grid \nall point to the same nul string. 
*/\nchar *apop_nul_string = \"\";\n\nstatic void apop_text_blank(apop_data *in, const size_t row, const size_t col){\n    if (in->text[row][col] != apop_nul_string) free(in->text[row][col]);\n    in->text[row][col] = apop_nul_string;\n}\n\n/** Free a matrix of chars* (i.e., a char***).\nThis is what \\c apop_data_free uses internally to deallocate the \\c text element of\nan \\ref apop_data set. You may never need to use it directly.\n\nSample usage:\n\\code\napop_text_free(yourdata->text, yourdata->textsize[0], yourdata->textsize[1]);\n\\endcode\n*/\nvoid apop_text_free(char ***freeme, int rows, int cols){\n    if (rows && cols)\n        for (int i=0; i < rows; i++){\n            for (int j=0; j < cols; j++)\n                if(freeme[i][j]!=apop_nul_string) \n                    free(freeme[i][j]);\n            free(freeme[i]);\n        }\n    free(freeme);\n}\n\n/** Free the elements of the given \\ref apop_data set and then the \\ref apop_data set\n  itself. Intended to be used by \\ref apop_data_free, a macro that calls this to free\n  elements, then sets the value to \\c NULL.\n\n\\li \\ref apop_data_free is a macro that calls this function and, on success, sets the input pointer to \\c NULL. \nFor typical cases, that's slightly more useful than this function.\n\n\\exception freeme.error='c' Circular linking is against the rules. If <tt>freeme->more == freeme</tt>, then \nI set <tt>freeme.error='c'</tt> and return. If you send in a structure like A -> B ->\nB, then both data sets A and B will be marked.\n\n\\return \\c 0 on OK, \\c 'c' on error.\n*/\nchar apop_data_free_base(apop_data *freeme){\n    if (!freeme) return 0;\n    if (freeme->more){\n        Apop_stopif(freeme == freeme->more, freeme->error='c'; return 'c',\n                            1, \"the ->more element of this data set equals the data set itself. \"\n                               \"This is not healthy. 
Not freeing; marking your data set with error='c'.\");\n        if (apop_data_free_base(freeme->more)) \n            Apop_stopif(freeme->more->error == 'c', freeme->error='c'; return 'c', \n                                1, \"Propagating error code to parent data set\");\n    } \n    if (freeme->vector)  \n        gsl_vector_free(freeme->vector);\n    if (freeme->matrix)  \n        gsl_matrix_free(freeme->matrix); \n    if (freeme->weights)\n        gsl_vector_free(freeme->weights);\n    apop_name_free(freeme->names);\n    apop_text_free(freeme->text, freeme->textsize[0] , freeme->textsize[1]);\n    free(freeme);\n    return 0;\n}\n\n/** Copy one \\ref apop_data structure to another.\n\nThis function does not allocate the output structure or the vector, matrix, text,\nor weights elements---I assume you have already done this and got the dimensions\nright. I will assert that there is at least enough room in the destination for your\ndata, and fail if the copy would write more elements than there are bins.\n\n  \\li If you want space allocated or are unsure about dimensions, use \\ref apop_data_copy.\n  \\li If both \\c in and \\c out have a \\c more pointer, also copy subsequent page(s).\n  \\li Copying a \\c NULL to a \\c NULL is valid but does nothing. Other attempts to write\n      to a \\c NULL fail with an error printed to stdout if <tt>apop_opts.verbose >= 1</tt>.\n  \\li You can use the subsetting macros, \\ref Apop_r, \\ref Apop_rs, \\ref Apop_c,\n      and so on, to copy within a data set:\n\n\\code\n//Copy the contents of row i of mydata to row j.\napop_data *fromrow = Apop_r(mydata, i);\napop_data *torow = Apop_r(mydata, j);\napop_data_memcpy(torow, fromrow);\n\n// or just\napop_data_memcpy(Apop_r(mydata, j), Apop_r(mydata, i));\n\\endcode\n \n  \\param out   A structure that this function will fill. 
Must be preallocated with the appropriate sizes.\n  \\param in    The input data.\n\n\\exception out.error='d'  Dimension error.\n\\exception out.error='p'  Part missing; e.g., in->matrix exists but out->matrix doesn't.\n*/\nvoid apop_data_memcpy(apop_data *out, const apop_data *in){\n    if (!out && !in) return;\n    Apop_stopif(!out, return, 0, \"you are copying to a NULL matrix. Do you mean to use apop_data_copy instead?\");\n    Apop_stopif(out==in, return, 1, \"out==in. Doing nothing.\");\n    if (in->matrix){\n        Apop_stopif(!out->matrix, out->error='p'; return, 1, \"in->matrix exists but out->matrix does not.\");\n        Apop_stopif(in->matrix->size1 != out->matrix->size1 || in->matrix->size2 != out->matrix->size2, \n                out->error='d'; return,\n                1, \"you're trying to copy a (%zu X %zu) into a (%zu X %zu) matrix.\", \n                        in->matrix->size1, in->matrix->size2, out->matrix->size1, out->matrix->size2);\n        gsl_matrix_memcpy(out->matrix, in->matrix);\n    }\n    if (in->vector){\n        Apop_stopif(!out->vector, out->error='p'; return, 1, \"in->vector exists but out->vector does not.\");\n        Apop_stopif(in->vector->size != out->vector->size,\n                out->error='d'; return,\n                1, \"You're trying to copy a %zu-elmt \"\n                        \"vector into a %zu-elmt vector.\", in->vector->size, out->vector->size);\n        gsl_vector_memcpy(out->vector, in->vector);\n    }\n    if (in->weights){\n        Apop_stopif(!out->weights, out->error='p'; return, 1, \"in->weights exists but out->weights does not.\");\n        Apop_stopif(in->weights->size != out->weights->size,\n                    out->error='d'; return,\n                    1, \"Weight vector sizes don't match: \"\n                    \"you're trying to copy a %zu-elmt vector into a %zu-elmt vector.\", \n                                 in->weights->size, out->weights->size);\n        gsl_vector_memcpy(out->weights, 
in->weights);\n    }\n    if (in->names){\n        if (!out->names) out->names = apop_name_alloc();\n        Asprintf(&out->names->title, \"%s\", in->names->title);\n        if (out->names->vector && in->names->vector) {Asprintf(&out->names->vector, \"%s\", in->names->vector);}\n        for (int i=0; i< in->names->rowct; i++)\n            if (i< out->names->rowct) {Asprintf(out->names->row+i, \"%s\", in->names->row[i]);}\n            else  apop_name_add(out->names, in->names->row[i], 'r');\n        for (int i=0; i< in->names->colct; i++)\n            if (i< out->names->colct) {Asprintf(out->names->col+i, \"%s\", in->names->col[i]);}\n            else  apop_name_add(out->names, in->names->col[i], 'c');\n        for (int i=0; i< in->names->textct; i++)\n            if (i< out->names->textct) {Asprintf(out->names->text+i, \"%s\", in->names->text[i]);}\n            else  apop_name_add(out->names, in->names->text[i], 't');\n    }\n    out->textsize[0] = in->textsize[0]; \n    out->textsize[1] = in->textsize[1]; \n    if (in->textsize[0] && in->textsize[1]){\n        Apop_stopif(out->textsize[0] < in->textsize[0] || out->textsize[1] < in->textsize[1],\n                    out->error='d'; return,\n                    1, \"I am trying to copy a grid of (%zu, %zu) text elements into a grid of (%zu, %zu), \"\n                    \"and that won't work. 
Please use apop_text_alloc to reallocate the right amount of data, \"\n                    \"or use apop_data_copy for automatic allocation.\",\n                    in->textsize[0] , in->textsize[1] , out->textsize[0] , out->textsize[1]);\n        for (size_t i=0; i< in->textsize[0]; i++)\n            for(size_t j=0; j < in->textsize[1]; j ++)\n                if (in->text[i][j] == apop_nul_string)\n                     apop_text_blank(out, i, j);\n                else apop_text_set(out, i, j, \"%s\", in->text[i][j]);\n    }\n    if (in->more && out->more) apop_data_memcpy(out->more, in->more);\n}\n\n/** Copy one \\ref apop_data structure to another. That is, all data is duplicated.\n\nBasically a front-end for \\ref apop_data_memcpy for those who prefer this sort of syntax. \n\nIf the data set has a \\c more pointer, that will be followed and subsequent pages copied as well.\n \n  \\param in    the input data\n  \\return       a structure that this function will allocate and fill. If input is NULL, then this will be NULL.\n\n\\exception out.error='a'  Allocation error.\n\\exception out.error='c'  Cyclic link: <tt>D->more == D</tt> (may be later in the chain, e.g., <tt>D->more->more = D->more</tt>) You'll have only a partial copy.\n\\exception out.error='d'  Dimension error; should never happen.\n\\exception out.error='p'  Missing part error; should never happen.\n\n\\li If the input data set has an error, then I will copy it anyway, including the\nerror flag (which might be overwritten). I print a warning if the verbosity level\nis <tt>>=1</tt>.\n\n  */\napop_data *apop_data_copy(const apop_data *in){\n    if (!in) return NULL;\n    apop_data *out = apop_data_alloc();\n    Apop_stopif(out->error, return out, 0, \"Allocation error.\");\n    if (in->error){\n        Apop_notify(1, \"the data set to be copied has an error flag of %c. 
Copying it.\", in->error);\n        out->error = in->error;\n    }\n    if (in->more){\n        Apop_stopif(in == in->more, out->error='c'; return out,\n                0, \"the ->more element of this data set equals the \"\n                                        \"data set itself. This is not healthy. Made a partial copy and set out.error='c'.\");\n        out->more = apop_data_copy(in->more);\n        Apop_stopif(out->more->error, out->error=out->more->error; return out,\n                0, \"propagating an error in the ->more element to the parent apop_data set. Only a partial copy made.\");\n    }\n    if (in->vector){\n        out->vector = gsl_vector_alloc(in->vector->size);\n        Apop_stopif(!out->vector, out->error='a'; return out, 0, \"Allocation error on vector of size %zu.\", in->vector->size);\n    }\n    if (in->matrix){  \n        out->matrix = gsl_matrix_alloc(in->matrix->size1, in->matrix->size2);\n        Apop_stopif(!out->matrix, out->error='a'; return out, 0, \"Allocation error on matrix \"\n                    \"of size %zu X %zu.\", in->matrix->size1, in->matrix->size2);\n    }\n    if (in->weights){\n        out->weights = gsl_vector_alloc(in->weights->size);\n        Apop_stopif(!out->weights, out->error='a'; return out, 0, \"Allocation error on weights vector of size %zu.\", in->weights->size);\n    }\n    if (in->textsize[0] && in->textsize[1]){\n        apop_text_alloc(out, in->textsize[0], in->textsize[1]);\n        Apop_stopif(out->error, return out, 0, \"Allocation error on text grid of size %zu X %zu.\", in->textsize[0], in->textsize[1]);\n    }\n    apop_data_memcpy(out, in);\n    return out;\n}\n\n/** Put the first data set either on top of or to the left of the second data set.\n\nFor the opposite operation, see \\ref apop_data_split.\n\n\\param  m1      the upper/leftmost data set (default = \\c NULL)\n\\param  m2      the second data set (default = \\c NULL)\n\\param  posn    If 'r', stack rows of m1 above rows of m2<br>\n    
if 'c', stack columns of m1 to left of m2's<br>\n    (default = 'r')\n\\param  inplace If \\c 'y', use \\ref apop_matrix_realloc and \\ref apop_vector_realloc to modify \\c m1 in place. Otherwise, allocate a new \\ref apop_data set, leaving \\c m1 undisturbed. (default='n')\n\\return         The stacked data, either in a new \\ref apop_data set or \\c m1\n\\exception out->error=='a' Allocation error.\n\\exception out->error=='d'  Dimension error; couldn't make a complete copy.\n\n\\li The function returns a new data set, meaning that until you apop_data_free()\n    the original data sets, you will be taking up twice as much memory.\n\\li If m1 or m2 are \\c NULL, returns a copy of the other element, and if\n    both are \\c NULL, returns \\c NULL. If \\c m2 is \\c NULL and \\c inplace is \\c\n    'y', returns the original \\c m1 pointer unmodified.\n\\li Text is handled as you'd expect: If 'r', one set of text is stacked on top of the\n    other [number of columns must match]; if 'c', one set of text is set next to the other\n    [number of rows must match].\n\\li \\c more is ignored.\n\\li If stacking rows on rows, the output vector is the input\n    vectors stacked accordingly. If stacking columns by columns, the output\n    vector is just a copy of the vector of \\c m1 and <tt>m2->vector</tt> doesn't appear in the\n    output at all.  
\n\\li The same rules for dealing with the vector(s) hold for the vector(s) of weights.\n\\li Names are a copy of the names for \\c m1, with the names for \\c m2 appended to the\n    row or column list, as appropriate.\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD apop_data *apop_data_stack(apop_data *m1, apop_data * m2, char posn, char inplace){\n    apop_data * apop_varad_var(m1, NULL)\n    apop_data * apop_varad_var(m2, NULL)\n    char apop_varad_var(posn, 'r')\n    Apop_stopif(!(posn == 'r' || posn == 'c'), return NULL, 0, \"Valid positions are 'r' or 'c'\"\n                                                         \" you gave me '%c'. Returning NULL.\", posn);\n    char apop_varad_var(inplace, 'n')\n    inplace = (inplace == 'y' || inplace == 1 || inplace == 'Y') ? 1 : 0;\nAPOP_VAR_ENDHEAD\n    if (!m1) return apop_data_copy(m2);\n    if (!m2) return inplace ? m1 : apop_data_copy(m1);\n    apop_data *out = NULL;\n    if (inplace)\n        out = m1;\n    else {\n        apop_data *m = m1->more; //not following the more pointer.\n        m1->more =NULL;\n        out = apop_data_copy(m1);\n        Apop_stopif(out->error, return out, 0, \"initial copy failed; leaving.\");\n        m1->more = m;\n    }\n    Get_vmsizes(m1); //original sizes of vsize, msize1, msize2.\n    if (m2->names && !out->names) out->names = apop_name_alloc();\n    \n    if (posn == 'c'){\n        if (m2->vector && out->vector){\n            gsl_matrix_view mview = gsl_matrix_view_vector(m2->vector, m2->vector->size, 1);\n            out->matrix = apop_matrix_stack(out->matrix, &mview.matrix, posn, .inplace='y');\n            apop_name_stack(out->names, m2->names, 'c', 'v');\n            if (m2->names && !m2->names->vector && m2->names->colct) apop_name_add(out->names, \"v\", 'c');\n        }\n        if (m2->vector && !out->vector) {\n            out->vector= apop_vector_copy(m2->vector);\n            if (m2->names->vector) apop_name_add(out->names, 
m2->names->vector, 'v');\n        }\n    }\n\n    out->matrix = apop_matrix_stack(out->matrix, m2->matrix, posn, .inplace='y');\n\n\n    if (posn == 'r'){\n        out->vector  = apop_vector_stack(out->vector, m2->vector, .inplace='y');\n        out->weights = apop_vector_stack(out->weights, m2->weights, .inplace='y');\n    } \n\n    if (m2->text){ //we've already copied m1->text, if any, so if m2->text is NULL, we're done.\n        if (posn=='r'){\n            Apop_stopif(out->text && m2->textsize[1]!=out->textsize[1], \n                    out->error='d'; return out, 0,\n                            \"The first data set has %zu columns of text and the second has %zu columns. \"\n                            \"I can't stack that.\", out->textsize[1], m2->textsize[1]);\n            int basetextsize = out->textsize[0];\n            apop_text_alloc(out, basetextsize+m2->textsize[0], m2->textsize[1]);\n            Apop_stopif(out->error, return out, 0, \"Allocation error.\");\n            for(int i=0; i< m2->textsize[0]; i++)\n                for(int j=0; j< m2->textsize[1]; j++)\n                    if (m2->text[i][j] == apop_nul_string)\n                         apop_text_blank(out, i+basetextsize, j);\n                    else apop_text_set(out, i+basetextsize, j, \"%s\", m2->text[i][j]);\n        } else {\n            Apop_stopif(out->text && m2->textsize[0]!=out->textsize[0], \n                    out->error='d'; return out, 0,\n                            \"The first data set has %zu rows of text and the second has %zu rows. 
\"\n                            \"I can't stack that.\", out->textsize[0], m2->textsize[0]);\n            int basetextsize = out->textsize[1];\n            apop_text_alloc(out, m2->textsize[0], basetextsize+m2->textsize[1]);\n            Apop_stopif(out->error, out->error='a'; return out, 0, \"Allocation error.\");\n            for(int i=0; i< m2->textsize[0]; i++)\n                for(int j=0; j< m2->textsize[1]; j++)\n                    if (m2->text[i][j] == apop_nul_string)\n                         apop_text_blank(out, i, j+basetextsize);\n                    else apop_text_set(out, i, j+basetextsize, \"%s\", m2->text[i][j]);\n            apop_name_stack(out->names, m2->names, 't');\n        }\n    }\n    if ((posn=='r' && m2->names && m2->names->rowct) || (posn=='c' && m2->names && m2->names->colct)){\n        int min = posn =='r' ? m1->names->rowct : m1->names->colct;\n        int max = posn =='r' ? GSL_MAX(vsize, msize1) : msize2;\n        for (int k = min; k < max; k++)          //pad so the name stacking is aligned (if needed)\n            apop_name_add(out->names, \"\", posn); \n        apop_name_stack(out->names, m2->names, posn);\n    }\n    return out;\n}\n\n/** Split one input \\ref apop_data structure into two.\n\n For the opposite operation, see \\ref apop_data_stack.\n \n\\param in  The \\ref apop_data structure to split \n\\param splitpoint The index of what will be the first row/column of the second data set.\nE.g., if this is -1 and \\c r_or_c=='c', then the whole data set will be in the second\ndata set; if this is the length of the matrix then the whole data set will be in the\nfirst data set. 
Another way to put it is that for values between zero and the matrix's\nsize, \\c splitpoint will equal the number of rows/columns in the first matrix.\n\n\\param r_or_c If this is 'r' or 'R', then put some rows in the first data set and some in the second; if 'c' or 'C', split columns into first and second data sets.\n\n \\return An array of two \\ref apop_data sets. If one is empty then a\n \\c NULL pointer will be returned in that position. For example, for a data set of 50 rows, <tt>apop_data **out = apop_data_split(data, 100, 'r')</tt> sets <tt>out[0] = apop_data_copy(data)</tt> and <tt>out[1] = NULL</tt>.\n\n \\li When splitting at a row, the text is also split.\n \\li The \\c more pointer is ignored.\n \\li The <tt>apop_data->vector</tt> is taken to be the -1st element of the matrix.  \n \\li Weights will be preserved. If splitting by rows, then the top and bottom parts of the weights vector will be assigned to the top and bottom parts of the main data set. If splitting by columns, identical copies of the weights vector will be assigned to both parts.\n \\li Data is copied, so you may want to call <tt>apop_data_free(in)</tt> after this.\n */\napop_data ** apop_data_split(apop_data *in, int splitpoint, char r_or_c){\n    //A long, dull series of contingencies. 
Bonus: a reasonable use of goto.\n    apop_data   **out   = malloc(2*sizeof(apop_data *));\n    out[0] = out[1] = NULL;\n    Apop_stopif(!in, return out, 1, \"input was NULL; output will be an array of two NULLs.\");\n    gsl_vector v1, v2, w1, w2;\n    gsl_matrix m1, m2;\n    int set_v1 = 1, set_v2 = 1,\n        set_m1 = 1, set_m2 = 1,\n        set_w1 = 1, set_w2 = 1,\n        namev0 = 0, namev1 = 0,\n        namer0 = 0, namer1 = 0,\n        namec0 = 0, namec1 = 0,\n        namersplit = -1, namecsplit = -1;\n     if (r_or_c == 'r' || r_or_c == 'R') {\n        if (splitpoint <=0)\n            out[1]  = apop_data_copy(in);\n        else if (in->matrix && splitpoint >= in->matrix->size1)\n            out[0]  = apop_data_copy(in);\n        else {\n            namev0  =\n            namev1  = \n            namec0  =\n            namec1  = 1;\n            if (in->vector){\n                v1 = gsl_vector_subvector(in->vector, 0, splitpoint).vector;\n                v2 = gsl_vector_subvector(in->vector, splitpoint, in->vector->size - splitpoint).vector;\n            } else\n                set_v1  = \n                set_v2  = 0;\n            if (in->weights){\n                w1 = gsl_vector_subvector(in->weights, 0, splitpoint).vector;\n                w2 = gsl_vector_subvector(in->weights, splitpoint,\n                        in->weights->size - splitpoint).vector;\n            } else\n                set_w1  = \n                set_w2  = 0;\n            if (in->matrix){\n                m1      = gsl_matrix_submatrix (in->matrix, 0, 0, splitpoint, in->matrix->size2).matrix;\n                m2      = gsl_matrix_submatrix (in->matrix, splitpoint, 0,\n                                    in->matrix->size1 - splitpoint,  in->matrix->size2).matrix;\n            } else\n                set_m1  = \n                set_m2  = 0;\n            namersplit=splitpoint;\n            goto allocation;\n        }\n    } else if (r_or_c == 'c' || r_or_c == 'C') {\n        if 
(in->weights){\n            w1      = gsl_vector_subvector(in->weights, 0, in->weights->size).vector;\n            w2      = gsl_vector_subvector(in->weights, 0, in->weights->size).vector;\n        } else \n            set_w1 = \n            set_w2 = 0;\n        namer0 = 1;\n        namer1 = 1;\n\n        if (splitpoint <= -1)\n            out[1]  = apop_data_copy(in);\n        else if (in->matrix && splitpoint >= in->matrix->size2)\n            out[0]  = apop_data_copy(in);\n        else if (splitpoint == 0){\n            if (in->vector){\n                v1      = gsl_vector_subvector(in->vector, 0, in->vector->size).vector;\n                namev0  = 1;\n            } else \n                set_v1 = 0;\n            set_v2  = 0;\n            set_m1  = 0;\n            if (in->matrix){\n                m2      = gsl_matrix_submatrix (in->matrix, 0, 0, \n                                    in->matrix->size1,  in->matrix->size2).matrix;\n                namec1  = 1;\n            } else \n                set_m2 = 0;\n            goto allocation;\n        } else if (splitpoint > 0 && in->matrix && splitpoint < in->matrix->size2){\n            if (in->vector){\n                v1      = gsl_vector_subvector(in->vector, 0, in->vector->size).vector;\n                namev0  = 1;\n            } else \n                set_v1 = 0;\n            set_v2  = 0;\n            if (in->matrix){\n                m1      = gsl_matrix_submatrix (in->matrix, 0, 0, in->matrix->size1, splitpoint).matrix;\n                m2      = gsl_matrix_submatrix (in->matrix, 0, splitpoint, \n                                    in->matrix->size1,  in->matrix->size2-splitpoint).matrix;\n                namecsplit = splitpoint;\n            } else\n                set_m1  = \n                set_m2  = 0;\n            goto allocation;\n        } else { //splitpoint >= in->matrix->size2\n            if (in->vector){\n                v1      = gsl_vector_subvector(in->vector, 0, in->vector->size).vector;\n 
               namev0  = 1;\n            } else \n                set_v1 = 0;\n            set_v2  = 0;\n            if (in->matrix){\n                m1      = gsl_matrix_submatrix (in->matrix, 0, 0, \n                            in->matrix->size1, in->matrix->size2).matrix;\n                namec0 = 1;\n            }\n            else set_m1 = 0;\n            set_m2  = 0;\n            goto allocation;\n        }\n    } else Apop_notify(0, \"Please set r_or_c == 'r' or == 'c'. Returning two NULLs.\");\n    return out;\n\nallocation:\n    out[0]  = apop_data_alloc();\n    out[1]  = apop_data_alloc();\n    if (set_v1) out[0]->vector  = apop_vector_copy(&v1);\n    if (set_v2) out[1]->vector  = apop_vector_copy(&v2);\n    if (set_m1) out[0]->matrix  = apop_matrix_copy(&m1);\n    if (set_m2) out[1]->matrix  = apop_matrix_copy(&m2);\n    if (set_w1) out[0]->weights  = apop_vector_copy(&w1);\n    if (set_w2) out[1]->weights  = apop_vector_copy(&w2);\n    if (namev0 && out[0]) apop_name_stack(out[0]->names, in->names, 'v');\n    if (namev1 && out[1]) apop_name_stack(out[1]->names, in->names, 'v');\n    if (namersplit >=0)\n        for (int k=0; k< in->names->rowct; k++){\n            int which = (k >= namersplit);\n            assert(out[which]);\n            apop_name_add(out[which]->names, in->names->row[k], 'r');\n        }\n    else {\n        if (namer0 && out[0]) apop_name_stack(out[0]->names, in->names, 'r');\n        if (namer1 && out[1]) apop_name_stack(out[1]->names, in->names, 'r');\n    }\n    if (namecsplit >=0)\n        for (int k=0; k< in->names->colct; k++){\n            int which = (k >= namecsplit);\n            assert(out[which]);\n            apop_name_add(out[which]->names, in->names->col[k], 'c');\n        }\n    else {\n        if (namec0 && out[0]) apop_name_stack(out[0]->names, in->names, 'c');\n        if (namec1 && out[1]) apop_name_stack(out[1]->names, in->names, 'c');\n    }\n    //finally, the text [split by rows only]\n    if (r_or_c=='r' && 
in->textsize[0] && in->textsize[1]){\n        apop_name_stack(out[1]->names, in->names, 't');\n        apop_text_alloc(out[0], splitpoint, in->textsize[1]);\n        Apop_stopif(out[0]->error, return out, 0, \"Allocation error.\");\n        if (in->textsize[0] > splitpoint){\n            apop_name_stack(out[0]->names, in->names, 't');\n            apop_text_alloc(out[1], in->textsize[0]-splitpoint, in->textsize[1]);\n            Apop_stopif(out[1]->error, return out, 0, \"Allocation error.\");\n        }\n        for (int i=0; i< in->textsize[0]; i++)\n            for (int j=0; j< in->textsize[1]; j++){\n                int whichtext = (i >= splitpoint);\n                int row = whichtext ? i - splitpoint : i;\n                Asprintf(&(out[whichtext]->text[row][j]), \"%s\", in->text[i][j]);\n            }\n    }\n    return out;\n}\n\n\n/** Remove the columns set to one in the \\c drop vector.\n\\param n the \\ref apop_name structure to be pared down\n\\param drop  a vector with n->colct elements, mostly zero, with a one marking those columns to be removed.\n\\see \\ref apop_data_prune_columns\n*/\nstatic void apop_name_rm_columns(apop_name *n, int *drop){\n    apop_name *newname = apop_name_alloc();\n    size_t initial_colct = n->colct;\n    for (size_t i=0; i< initial_colct; i++){\n        if (drop[i]==0) apop_name_add(newname, n->col[i],'c');\n        else            n->colct--;\n        free(n->col[i]);\n    }\n    free(n->col);\n    n->col = newname->col;\n\n    //we need to free the newname struct, but leave the column intact.\n    newname->col = NULL;\n    newname->colct  = 0;\n    apop_name_free(newname);\n}\n\n\nstatic gsl_matrix *apop_matrix_rm_columns(gsl_matrix *in, int *drop){\n    int ct  = 0,  //how many columns will not be dropped?\n        j   = 0;\n    for (size_t i=0; i < in->size2; i++)\n        if (drop[i]==0)\n            ct++;\n    if (ct == in->size2) return apop_matrix_copy(in);\n    if (ct == 0)         return NULL;\n    gsl_matrix 
*out = gsl_matrix_alloc(in->size1, ct);\n    for (size_t i=0; i < in->size2; i++){\n        if (drop[i]==0){\n            gsl_vector *v = Apop_cv(&(apop_data){.matrix=in}, i);\n            gsl_matrix_set_col(out, j, v);\n            j   ++;\n        }\n    }\n    return out;\n}\n\n/** Remove the columns of the \\ref apop_data set corresponding to a nonzero value in the \\c drop vector.\n\n\\li The data set looks like it was modified in place, but the data\nmatrix and the names are duplicated before being pared down, so if your data is taking\nup more than half of your memory, this may not work.\n\n\\param d  The \\ref apop_data structure to be pared down. \n\\param drop  An array of ints. If drop[7]==1, then column seven will be cut from the\noutput. A reminder: <tt>calloc(in->size2 , sizeof(int))</tt> will fill your array with zeros on allocation, and \n<tt>memset(drop, 1, in->size2 * sizeof(int))</tt> will\nquickly fill an array of ints with nonzero values.\n\\see \\ref apop_data_rm_rows\n*/\nvoid apop_data_rm_columns(apop_data *d, int *drop){\n    gsl_matrix *freeme = d->matrix;\n    d->matrix = apop_matrix_rm_columns(d->matrix, drop);\n    gsl_matrix_free(freeme); \n    apop_name_rm_columns(d->names, drop);\n}\n\n/** \\def apop_data_prune_columns(in, ...)\n  Keep only the columns of a data set that you name.\n\n\\param in The data set to prune.\n\\param ... A list of names to retain (i.e. the columns that shouldn't be pruned\nout). For example, if you have run \\ref apop_data_summarize, you have columns for several\nstatistics, but may care about only one or two; see the example.\n\nFor example:\n\\include test_pruning.c \n\n\\li I use a case-insensitive search to find your column.\n\\li If your name matches multiple columns, I'll only give you the first.\n\\li If I can't find a column matching one of your strings, I throw an error to the screen and continue.\n\\li This is a macro calling \\ref apop_data_prune_columns_base. 
It packages your list of\ncolumns into a list of strings, adds a \\c NULL string at the end, and calls that function.\n\\hideinitializer */\n \n/** Keep only the columns of a data set that you name.\n  This is the function called internally by the \\ref apop_data_prune_columns macro. In\n  most cases, you'll want to use that macro. An example of the two uses demonstrating the\n  difference:\n\n  \\code \n    apop_data_prune_columns(d, \"mean\", \"median\");\n\n    char *list[] = {\"mean\", \"median\", NULL};\n    apop_data_prune_columns_base(d, list);\n  \\endcode\n\n\\param d The data set to prune.\n\\param colnames A NULL-terminated list of names to retain. \n\\return A pointer to the input data set, now pruned.\n\\see apop_data_rm_columns\n*/\napop_data* apop_data_prune_columns_base(apop_data *d, char **colnames){\n    /* In types.h, you'll find an alias that takes the input, wraps it in the cruft that is\n    C's compound literal syntax, and appends a final \"\" to the list of strings. 
Here, I\n    find each element of the list, using that \"\" as a stopper, and then call apop_data_rm_columns.*/\n    Apop_stopif(!d, return NULL, 1, \"You're asking me to prune a NULL data set; returning.\");\n    Apop_stopif(!d->matrix, return d, 1, \"You're asking me to prune a data set with NULL matrix; returning.\");\n    int rm_list[d->names->colct];\n    int keep_count = 0;\n    char **name_step = colnames;\n    //to throw errors for typos (and slight efficiency gains), I need an array of whether\n    //each input colname has been used.\n    while (*name_step++)\n        keep_count++;\n    int used_field[keep_count];\n    memset(used_field, 0, keep_count*sizeof(int));\n\n    for (int i=0; i< d->names->colct; i++){\n        int keep = 0;\n        for (int j=0; j<keep_count; j++)\n            if (!used_field[j] && !strcasecmp(d->names->col[i], colnames[j])){\n                keep ++;\n                used_field[j]++;\n                break;\n            }\n        rm_list[i] = !keep;\n    }\n    apop_data_rm_columns(d, rm_list);\n    for (int j=0; j<keep_count; j++)\n        Apop_stopif(!used_field[j], , 1, \"You asked me to keep column \\\"%s\\\" but I couldn't find a match for it. Typo?\", colnames[j]);\n    return d;\n}\n\n/** Get a pointer to an element of an \\ref apop_data set. \n\n\\li If a \\c NULL vector or matrix (as the case may be), or the row/column you requested\n    is outside bounds, return \\c NULL.\n\\li See \\ref data_set_get \"the set/get page\" for details. \n\n\\param data The data set. Must not be \\c NULL.\n\\param row The row number of the desired element. If <tt>rowname==NULL</tt>, default is zero.\n\\param col The column number of the desired element. -1 indicates the vector. If <tt>colname==NULL</tt>, default is zero.\n\\param rowname The row name of the desired element. If <tt>NULL</tt>, use the row number.\n\\param colname The column name of the desired element. 
If <tt>NULL</tt>, use the column number.\n\\param page The case-insensitive name of the page on which the element is found. If \\c NULL, use first page.\n\n\\return A pointer to the element.\n*/\nAPOP_VAR_HEAD double * apop_data_ptr(apop_data *data, int row, int col, const char *rowname, const char *colname, const char *page){\n    apop_data * apop_varad_var(data, NULL);\n    Apop_stopif(!data, return NULL, 0, \"You sent me a NULL data set. Returning NULL pointer.\");\n    int apop_varad_var(row, 0);\n    int apop_varad_var(col, 0);\n    const char * apop_varad_var(rowname, NULL);\n    const char * apop_varad_var(colname, NULL);\n    const char * apop_varad_var(page, NULL);\n\n    if (page){\n        data = apop_data_get_page(data, page);\n        Apop_stopif(!data, return NULL, 1, \"I couldn't find a page with label '%s'. Returning NULL.\", page);\n    };\n    if (rowname){\n        row = apop_name_find(data->names, rowname, 'r');\n        Apop_stopif(row == -2, return NULL, 1, \"Couldn't find '%s' amongst the row names.\", rowname);\n    }\n    if (colname){\n        col =  apop_name_find(data->names, colname, 'c');\n        Apop_stopif(col == -2, return NULL, 1, \"Couldn't find '%s' amongst the column names.\", colname);\n    }\nAPOP_VAR_ENDHEAD\n    if (col == -1 || (col == 0 && !data->matrix && data->vector)){\n        Apop_stopif(!data->vector, return NULL, 1, \"You asked for the vector element (col=-1) but it is NULL. 
Returning NULL.\");\n        return gsl_vector_ptr(data->vector, row);\n    } else {\n        Apop_stopif(!data->matrix, return NULL, 1, \"You asked for the matrix element (%i, %i) but the matrix is NULL. Returning NULL.\", row, col);\n        return gsl_matrix_ptr(data->matrix, row,col);\n    }\n    return NULL;//the main function is blank.\n}\n\n/** Returns the data element at the given point.\n \nIn case of error (probably that you asked for a data point out of bounds), returns \\c NAN.\n See \\ref data_set_get \"the set/get page\" for details and examples.\n\n\\param data The data set. Must not be \\c NULL.\n\\param row The row number of the desired element. If <tt>rowname==NULL</tt>, default is zero.\n\\param col The column number of the desired element. -1 indicates the vector. \nIf <tt>colname==NULL</tt>, default is zero if the <tt>->matrix</tt> element is not \\c\nNULL and -1 if the <tt>->matrix</tt> element is \\c NULL and the <tt>->vector</tt> element is not.\n\n\\param rowname The row name of the desired element. If <tt>NULL</tt>, use the row number.\n\\param colname The column name of the desired element. If <tt>NULL</tt>, use the column number.\n\\param page The case-insensitive name of the page on which the element is found. If \\c NULL, use first page.\n\n\\return The value at the given location. */\nAPOP_VAR_HEAD double apop_data_get(const apop_data *data, size_t row, int col, const char *rowname, const char *colname, const char *page){\n    const apop_data * apop_varad_var(data, NULL);\n    Apop_stopif(!data, return NAN, 0, \"You sent me a NULL data set. Returning NaN.\");\n    size_t apop_varad_var(row, 0);\n    int apop_varad_var(col, 0);\n    const char * apop_varad_var(rowname, NULL);\n    const char * apop_varad_var(colname, NULL);\n    const char * apop_varad_var(page, NULL);\n    \n    if (page){\n        data = apop_data_get_page(data, page);\n        Apop_stopif(!data, return NAN, 1, \"I couldn't find a page with label '%s'. 
Returning NaN.\", page);\n    };\n    if (rowname){\n        row = apop_name_find(data->names, rowname, 'r');\n        Apop_stopif(row == -2, return NAN, 1, \"Couldn't find '%s' amongst the row names. Returning NaN.\", rowname);\n    }\n    if (colname){\n        col =  apop_name_find(data->names, colname, 'c');\n        Apop_stopif(col == -2, return NAN, 1, \"Couldn't find '%s' amongst the column names. Returning NaN.\", colname);\n    }\nAPOP_VAR_ENDHEAD\n    if (col==-1 || (col == 0 && !data->matrix && data->vector)){\n        Apop_stopif(!data->vector, return NAN, 1,  \"You asked for the vector element (col=-1) but it is NULL.\");\n        return gsl_vector_get(data->vector, row);\n    } else {\n        Apop_stopif(!data->matrix, return NAN, 1, \"You asked for the matrix element (%zu, %i) but the matrix is NULL.\", row, col);\n        return gsl_matrix_get(data->matrix, row, col);\n    }\n}\n\n/* The only hint the GSL gives that something failed is that the error-handler is called.\n   The error handling function won't let you set an output to the function. So all we\n   can do is use a global variable.\n*/\n\nstatic threadlocal int error_for_set; //see apop_internal.h\n\nvoid apop_gsl_error_for_set(const char *reason, const char *file, int line, int gsl_errno){\n    Apop_notify(1, \"%s: %s\", file, reason);\n    Apop_maybe_abort(1);\n    error_for_set = -1;\n}\n\n/**  Set a data element.\nSee \\ref data_set_get \"the set/get page\" for details and examples. \n \n  \\return 0=OK, -1=error: couldn't find row/column name, or you asked for a location outside the vector/matrix bounds.\n\n\\li  The error codes for out-of-bounds errors are thread-safe iff you have a\nC11-compliant compiler (thanks to the \\c _Thread_local keyword) or a version of GCC with the \\c __thread\nextension enabled.\n\n\\li Set weights via <tt>gsl_vector_set(your_data->weights, row, val);</tt>.\n\\li Set text elements via \\ref apop_text_set.\n\n\n\\param data The data set. 
Must not be \\c NULL.\n\\param row The row number of the desired element. If <tt>rowname==NULL</tt>, default is zero.\n\\param col The column number of the desired element. -1 indicates the vector. If <tt>colname==NULL</tt>, default is zero.\n\\param rowname The row name of the desired element. If <tt>NULL</tt>, use the row number.\n\\param colname The column name of the desired element. If <tt>NULL</tt>, use the column number.\n\\param page The case-insensitive name of the page on which the element is found. If \\c NULL, use first page.\n\\param val The value to give the point.\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD int apop_data_set(apop_data *data, size_t row, int col, const double val, const char *colname, const char *rowname, const char *page){\n    apop_data * apop_varad_var(data, NULL);\n    Apop_stopif(!data, return -1, 0, \"You sent me a NULL data set.\");\n    size_t apop_varad_var(row, 0);\n    int apop_varad_var(col, 0);\n    const double apop_varad_var(val, 0);\n    const char * apop_varad_var(rowname, NULL);\n    const char * apop_varad_var(colname, NULL);\n    const char * apop_varad_var(page, NULL);\n    \n    if (page){\n        data = apop_data_get_page((apop_data*)data, page);\n        Apop_stopif(!data, return -1, 1, \"I couldn't find a page with label '%s'. Making no changes.\", page);\n    }\n    if (rowname){\n        row = apop_name_find(data->names, rowname, 'r');\n        Apop_stopif(row == -2, return -1, 1, \"Couldn't find '%s' amongst the row names. Making no changes.\", rowname);\n    }\n    if (colname){\n        col = apop_name_find(data->names, colname, 'c');\n        Apop_stopif(col == -2, return -1, 1, \"Couldn't find '%s' amongst the column names. 
Making no changes.\", colname);\n    }\nAPOP_VAR_ENDHEAD\n    Set_gsl_handler\n    if (col==-1 || (col == 0 && !data->matrix && data->vector)){\n        Apop_stopif(!data->vector, return -1, 1, \"You're trying to set a vector element (col=-1) but the vector is NULL.\");\n        gsl_vector_set(data->vector, row, val);\n    } else {\n        Apop_stopif(!data->matrix, return -1, 1, \"You're trying to set the matrix element (%zu, %i) but the matrix is NULL.\", row, col);\n        gsl_matrix_set(data->matrix, row, col, val);\n    }\n    Unset_gsl_handler\n    return error_for_set;\n}\n\n/** A convenience function to add a named element to a data set.  Many of Apophenia's\ntesting procedures use this to easily produce a column of named parameters. It is public\nas a convenience.\n\n\\param d    The \\ref apop_data structure. Must not be \\c NULL, but may be blank (as per\nallocation via \\ref apop_data_alloc <tt>( )</tt> ).\n\\param name The name to add\n\\param val  the value to add to the set.\n\n\\li I use the position of the last non-empty row name to know where to put the value. If\nthere are two names in the data set, then I will put the new name in\nthe third name slot and the data in the third slot in the vector. If\nyou use this function from start to finish in building your list, then you'll be fine.\n\\li If the vector is too short (or \\c NULL), I will call \\ref apop_vector_realloc internally to make space.\n\\li This fits well with the defaults for \\ref apop_data_get. An example:\n\n\\code\napop_data *list = apop_data_alloc();\napop_data_add_named_elmt(list, \"height\", 165);\napop_data_add_named_elmt(list, \"weight\", 60);\n\ndouble height = apop_data_get(list, .rowname=\"height\");\n\n//or\n#define Lookup(dataset, key) apop_data_get(dataset, .rowname=#key)\nheight = Lookup(list, height);\n\\endcode\n*/\nvoid apop_data_add_named_elmt(apop_data *d, char *name, double val){\n    Apop_stopif(!d, return, 0, \"You sent me a NULL apop_data set. 
\"\n                               \"Maybe allocate with apop_data_alloc() to start.\");\n    apop_name_add(d->names, name, 'r');\n    if (!d->vector) d->vector = gsl_vector_alloc(1);\n    if (d->vector->size < d->names->rowct)\n        apop_vector_realloc(d->vector, d->names->rowct);\n    gsl_vector_set(d->vector, d->names->rowct-1, val);\n}\n\n//See apop_data_add_names in types.h.\nvoid apop_data_add_names_base(apop_data *d, const char type, char const ** names){\n    if (!d->names) d->names = apop_name_alloc();\n    for(char const** name = names; *name !=NULL; name++)\n        apop_name_add(d->names, *name, type);\n}\n\n\n/** Add a string to the text element of an \\ref apop_data set.  If you\n send me a \\c NULL string, I will write the value of  <tt>apop_opts.nan_string</tt> in the given slot.\n If there is already something in that slot, that string is freed, preventing memory leaks.\n\n\\param in   The \\ref apop_data set, that already has an allocated \\c text element.\n\\param row  The row\n\\param col  The column\n\\param fmt The text to write.\n\\param ... You can use a printf-style fmt and follow it with the usual variables to fill in.\n\n\\return 0=OK, -1=error (probably out-of-bounds)\n\n  \\li UTF-8 or ASCII text is correctly handled.\n  \\li Apophenia follows a general rule of not reallocating behind your back: if\nyour text matrix is currently of size (3,3) and you try to put an item in slot (4,4),\nthen I display an error rather than reallocating the text matrix.\n  \\li The string added is a copy (via <tt>asprintf</tt>), not a pointer to the input(s).\n  \\li If there had been a string at the grid point you are writing to,\nthe old one is freed to prevent leaks. Remember this if you had other pointers aliasing\nthat string.\n  \\li If an element is \\c NULL, write <tt>apop_opts.nan_string</tt> at that point. You\nmay prefer to use <tt>\"\"</tt> to express a blank.\n  \\li \\ref apop_text_alloc will reallocate to a new size if you need. 
For example,\nthis code will fill the diagonals of the text array with a message, resizing as it goes:\n\n\\code\napop_data *list = (something already allocated.);\nfor (int n=0; n < 10; n++){\n    apop_text_alloc(list, n+1, n+1);\n    apop_text_set(list, n, n, \"This is cell (%i, %i)\", n, n);\n}\n\\endcode\n*/\nint apop_text_set(apop_data *in, const size_t row, const size_t col, const char *fmt, ...){\n    Apop_stopif(!in, return -1, 0, \"You asked me to write text to a NULL data set.\");\n    Apop_stopif((in->textsize[0] < (int)row+1) || (in->textsize[1] < (int)col+1), return -1, 0, \"You asked me to put the text \"\n                            \" '%s' at position (%zu, %zu), but the text array has size (%zu, %zu)\\n\", \n                               fmt,             row, col,                  in->textsize[0], in->textsize[1]);\n    if (in->text[row][col] != apop_nul_string) free(in->text[row][col]);\n    if (!fmt){\n        Asprintf(&(in->text[row][col]), \"%s\", apop_opts.nan_string);\n        return 0;\n    }\n    va_list argp;\n\tva_start(argp, fmt);\n    Apop_stopif(vasprintf(&(in->text[row][col]), fmt, argp)==-1, , 0, \"Trouble writing to a string.\");\n\tva_end(argp);\n    return 0;\n}\n\n/** This allocates or resizes the \\c text element of an \\ref apop_data set. \n\n  If the \\c text element already exists, then this is effectively a \\c realloc function,\n  reshaping to the size you specify.\n\n  \\param in An \\ref apop_data set. It's OK to send in \\c NULL, in which case an apop_data set with \\c NULL \\c matrix and \\c vector elements is returned.\n  \\param row    the number of rows of text.\n  \\param col     the number of columns of text.\n  \\return       A pointer to the relevant \\ref apop_data set. 
If the input was not \\c NULL, then this is a repeat of the input pointer.\n  \\exception out->error=='a'  Allocation error.\n  */\napop_data * apop_text_alloc(apop_data *in, const size_t row, const size_t col){\n    Apop_stopif((!row && col) || (!col && row), return in, 1, \"Not allocating a %zu x %zu text grid. \"\n                                            \"Returning the input apop_data set.\", row, col);\n    if (!in) in  = apop_data_alloc();\n    if (!in->text){\n        if (row){\n            in->text = malloc(sizeof(char**) * row);\n            Apop_stopif(!in->text, in->error='a'; return in, \n                    0, \"malloc failed setting up %zu rows. Probably out of memory.\", row);\n        }\n        if (row && col)\n            for (size_t i=0; i< row; i++){\n                in->text[i] = malloc(sizeof(char*) * col);\n                Apop_stopif(!in->text[i], in->error='a'; return in, \n                        0, \"malloc failed setting up row %zu (with %zu columns). Probably out of memory.\", i, col);\n                for (size_t j=0; j< col; j++)\n                    in->text[i][j] = apop_nul_string;\n            }\n    } else { //realloc\n        size_t rows_now = in->textsize[0];\n        size_t cols_now = in->textsize[1];\n        if (rows_now > row){\n            for (int i=row; i < rows_now; i++){\n                for (int j=0; j < cols_now; j++)\n                    if (in->text[i][j] != apop_nul_string) \n                        free(in->text[i][j]);\n                free(in->text[i]);\n            }\n            in->text = realloc(in->text, sizeof(char**)*row);\n            Apop_stopif(row && !in->text, in->error='a'; return in,\n                            0, \"realloc failed shrinking down to %zu rows from %zu rows. 
\"\n                            \"There may be actual bugs eating your computer.\", row, rows_now);\n        }\n        if (rows_now < row){\n            in->text = realloc(in->text, sizeof(char**)*row);\n            Apop_stopif(!in->text, in->error='a'; return in,\n                            0, \"realloc failed setting up %zu rows. Probably out of memory.\", row);\n            for (size_t i=rows_now; i < row; i++){\n                in->text[i] = malloc(sizeof(char*) * col);\n                Apop_stopif(!in->text[i], in->error='a'; return in, \n                        0, \"malloc failed setting up row %zu (with %zu columns). Probably out of memory.\", i, col);\n                for (int j=0; j < cols_now; j++)\n                    in->text[i][j] = apop_nul_string;\n            }\n        }\n        if (cols_now > col)\n            for (int i=0; i < row; i++)\n                for (int j=col; j < cols_now; j++)\n                    if (in->text[i][j]!=apop_nul_string) \n                        free(in->text[i][j]);\n        if (cols_now != col)\n            for (int i=0; i < row; i++){\n                in->text[i] = realloc(in->text[i], sizeof(char*)*col);\n                for (int j=cols_now; j < col; j++) //happens iff cols_now < col\n                    in->text[i][j] = apop_nul_string;\n            }\n    }\n    in->textsize[0] = row;\n    in->textsize[1] = col;\n    return in;\n}\n\n/** Transpose the matrix and text elements of the input data set, including the row/column names. \n\nThe vector and weights elements of the input data set are completely ignored (but see\nalso \\ref apop_vector_to_matrix, which can convert a vector to a 1 X N matrix.) If\ncopying, these other elements won't be present; if <tt>.inplace='y'</tt>, it is up to you to\nhandle these not-transposed elements correctly.\n\n\\param in The input \\ref apop_data set. If \\c NULL, I return \\c NULL. (default: \\c NULL)\n\\param transpose_text If \\c 'y', then also transpose the text element. 
(default: \\c 'y')\n\\param inplace If \\c 'y', transpose the input in place; if \\c 'n', produce a transposed\ncopy, leaving the original untouched. Due to how <tt>gsl_matrix_transpose_memcpy</tt>\nworks, a copy will still be made, then copied to the original location.  (default: \\c 'y')\n\n\\return  If <tt>inplace=='n'</tt>, a newly alloced \\ref apop_data set, with the\nappropriately transposed matrix and/or text. The vector and weights elements will be\n\\c NULL. If <tt>transpose_text='n'</tt>, then the text element of the output set will\nalso be \\c NULL.<br> if <tt>inplace=='y'</tt>, a pointer to the original data set,\nwith matrix and (if <tt>transpose_text='y'</tt>, text) transposed and vector and weights\nleft in place untouched.\n\n\\li Row names are written to column names of the output matrix, text, or both (whichever is not empty in the input).\n\\li If only the matrix or only the text have names, then the one set of names is written to the row names of the output.\n\\li If both matrix column names and text column names are present, text column names are lost.\n\\li if you have a \\c gsl_matrix with no names or text, you may prefer to use \\c gsl_matrix_transpose_memcpy.\n\\li This function uses the \\ref designated syntax for inputs.\n*/ \nAPOP_VAR_HEAD apop_data * apop_data_transpose(apop_data *in, char transpose_text, char inplace){\n    apop_data * apop_varad_var(in, NULL);\n    Apop_stopif(!in, return NULL, 1, \"Transposing a NULL data set; returning NULL.\");\n    char apop_varad_var(transpose_text, 'y');\n    char apop_varad_var(inplace, 'y');\nAPOP_VAR_ENDHEAD\n    Apop_stopif(!in->matrix && !*in->textsize, return apop_data_alloc(), \n            1, \"input data set has neither matrix nor text elements; returning an empty data set.\");\n    apop_data *out = (inplace=='y') ? in\n                                    : apop_data_alloc(0, in->matrix ? in->matrix->size2 : 0\n                                                       , in->matrix ? 
in->matrix->size1 : 0);\n    if (inplace=='y'){\n        if (in->matrix) {\n            if (in->matrix->size1 == in->matrix->size2)\n                gsl_matrix_transpose(in->matrix);\n            else {\n                gsl_matrix *outm = gsl_matrix_alloc(in->matrix->size2, in->matrix->size1);\n                gsl_matrix_transpose_memcpy(outm, in->matrix);\n                gsl_matrix_free(in->matrix);\n                in->matrix = outm;\n            }\n        }\n        if (out->names){\n            char **tmp = out->names->col;\n            out->names->col = out->names->row;\n            out->names->row = tmp;\n            int tmpct = out->names->colct;\n            out->names->colct = out->names->rowct;\n            out->names->rowct = tmpct;\n        }\n    } else if (inplace!='y' && in->matrix){\n        if (in->matrix) gsl_matrix_transpose_memcpy(out->matrix, in->matrix);\n        apop_name_stack(out->names, in->names, 'r', 'c');\n        apop_name_stack(out->names, in->names, 'c', 'r');\n    }\n    if (transpose_text!='y' || in->textsize[0] == 0 || in->textsize[1] == 0) return out;\n    if (inplace=='y'){\n        size_t orows = in->textsize[0];\n        size_t ocols = in->textsize[1];\n        if (orows > ocols){ //extend the first ocols rows to their now-longer length\n            for (size_t i=0; i< ocols; i++){\n                in->text[i] = realloc(in->text[i], sizeof(char*)*orows);\n                Apop_stopif(!in->text[i], in->error='a'; return in, \n                        0, \"malloc failed setting up row %zu (with %zu columns). Probably out of memory.\", i, orows);\n                for (int j=ocols; j < orows; j++)\n                    in->text[i][j] = in->text[j][i] == apop_nul_string\n                                        ? 
apop_nul_string\n                                        : strdup(in->text[j][i]);\n            }\n        }\n        if (ocols > orows){ //add rows.\n            in->text = realloc(in->text, sizeof(char**)*ocols);\n            Apop_stopif(!in->text, in->error='a'; return in,\n                            0, \"realloc failed setting up %zu rows. Probably out of memory.\", ocols);\n            for (size_t i=orows; i < ocols; i++){\n                in->text[i] = malloc(sizeof(char*) * orows);\n                Apop_stopif(!in->text[i], in->error='a'; return in, \n                        0, \"malloc failed setting up row %zu (with %zu columns). Probably out of memory.\", i, orows);\n                for (int j=0; j < orows; j++)\n                    in->text[i][j] = in->text[j][i] == apop_nul_string\n                                        ? apop_nul_string\n                                        : strdup(in->text[j][i]);\n            }\n        }\n        size_t squaresize = GSL_MIN(orows, ocols);\n        for (int i=0; i< squaresize; i++) //now do the no-need-to-extend square\n            for (int j=i+1; j< squaresize; j++){\n                char *tmp = in->text[i][j];\n                in->text[i][j] = in->text[j][i];\n                in->text[j][i] = tmp;\n            }\n        in->textsize[0] = ocols;\n        in->textsize[1] = orows;\n    } else {\n        apop_text_alloc(out, in->textsize[1], in->textsize[0]);\n        for (int r=0; r< in->textsize[0]; r++)\n            for (int c=0; c< in->textsize[1]; c++)\n                if (in->text[r][c] == apop_nul_string)\n                     apop_text_blank(out, c, r);\n                else apop_text_set(out, c, r, in->text[r][c]);\n    }\n    if (in->names && in->names->textct && !in->names->colct)\n        apop_name_stack(out->names, in->names, 't', 'r');\n    return out;\n}\n\n/** This function will resize a \\c gsl_matrix to a new height or width.\n\nData in the matrix will be retained. 
If the new height or width is smaller than the old, then data in the later rows/columns will be cropped away (in a non--memory-leaking manner). If the new height or width is larger than the old, then new cells will be filled with garbage; it is your responsibility to zero out or otherwise fill new rows/columns before use.\n\n  \\li A large number of <tt>realloc</tt>s can take a noticeable amount of time. You\nare encouraged to determine the size of your data beforehand and avoid writing \\c for\nloops that reallocate the matrix at every iteration.\n  \\li The <tt>gsl_matrix</tt> is a versatile struct that can represent submatrices and\nother cuts from parent data. Resizing a subset of a parent matrix makes no sense,\nso return \\c NULL and print a warning if asked to resize a view of a matrix.\n\n\\param m The already-allocated matrix to resize.  If you give me \\c NULL, this becomes equivalent to \\c gsl_matrix_alloc\n\\param newheight, newwidth The height and width you'd like the matrix to be.\n\\return m, now resized\n */\ngsl_matrix * apop_matrix_realloc(gsl_matrix *m, size_t newheight, size_t newwidth){\n    if (!m)\n        return (newheight && newwidth) ?  
gsl_matrix_alloc(newheight, newwidth) : NULL;\n    size_t i, oldoffset=0, newoffset=0, realloced = 0;\n    Apop_stopif(m->block->data!=m->data || !m->owner || m->tda != m->size2,\n            return NULL, 0, \"I can't resize submatrices or other subviews.\");\n    m->block->size = newheight * newwidth;\n    if (m->size2 > newwidth)\n        for (i=1; i< GSL_MIN(m->size1, newheight); i++){\n            oldoffset +=m->size2;\n            newoffset +=newwidth;\n            memmove(m->data+newoffset, m->data+oldoffset, sizeof(double)*newwidth);\n        } \n    else if (m->size2 < newwidth){\n        m->block->data = m->data = realloc(m->data, sizeof(double) * m->block->size);\n        realloced = 1;\n        int height = GSL_MIN(m->size1, newheight);\n        for (i= height-1; i > 0; i--){\n            newoffset +=newwidth;\n            memmove(m->data+(height * newwidth) - newoffset, m->data+i*m->size2, sizeof(double)*m->size2);\n        }\n    }\n    m->size1 = newheight;\n    m->tda   =\n    m->size2 = newwidth;\n    if (!realloced)\n        m->block->data = m->data = realloc(m->data, sizeof(double) * m->block->size);\n    return m;\n}\n\n/** This function will resize a \\c gsl_vector to a new length.\n\nData in the vector will be retained. If the new height is\nsmaller than the old, then data at the end of the vector will be\ncropped away (in a non--memory-leaking manner). If the new height is larger than the old,\nthen new cells will be filled with garbage; it is your responsibility\nto zero out or otherwise fill them before use.\n\n  \\li A large number of <tt>realloc</tt>s can take a noticeable amount of time. You\nare thus encouraged to make an effort to determine the size of your data and do one\nallocation, rather than writing \\c for loops that resize a vector at every increment.\n  \\li The <tt>gsl_vector</tt> is a versatile struct that\ncan represent subvectors, matrix columns and other cuts from parent data. 
\nResizing a portion of a parent matrix makes no sense, so\nreturn \\c NULL and print an error if asked to resize a view.\n\n\\param v The already-allocated vector to resize.  If you give me \\c NULL, this is equivalent to \\c gsl_vector_alloc\n\\param newheight The height you'd like the vector to be.\n\\return v, now resized\n */\ngsl_vector * apop_vector_realloc(gsl_vector *v, size_t newheight){\n    if (!v) return newheight ? gsl_vector_alloc(newheight) : NULL;\n    Apop_stopif(v->block->data!=v->data || !v->owner || v->stride != 1,\n                    return NULL, 0, \"I can't resize subvectors or other views.\");\n    v->block->size = newheight;\n    v->size = newheight;\n    v->block->data = \n    v->data        = realloc(v->data, sizeof(double) * v->block->size);\n    return v;\n}\n\n/** It's good form to get a page from your data set by name, because you\n  may not know the order for the pages, and the stepping through makes\n  for dull code anyway (<tt>apop_data *page = dataset; while (page->more) page= page->more;</tt>).\n\n  \\param data The \\ref apop_data set to use. No default; if \\c NULL,\n      gives a warning if <tt>apop_opts.verbose >=1</tt> and returns \\c NULL.\n\n  \\param title The name of the page to retrieve. Default=\\c \"<Info>\", which\n      is the name of the page of additional estimation information returned\n      by estimation routines (log likelihood, status, AIC, BIC, confidence intervals, ...).\n      \n  \\param match If \\c 'c', case-insensitive match (via \\c strcasecmp); if \\c 'e', exact match, if \\c 'r' regular expression substring search (via \\ref apop_regex). Default=\\c 'c'.\n\n    \\return The page whose title matches what you gave me. 
If I don't find a match, return \\c NULL.\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD apop_data * apop_data_get_page(const apop_data * data, const char *title, const char match){\n    const apop_data * apop_varad_var(data, NULL);\n    Apop_stopif(!data, return NULL, 1, \"You requested a page from a NULL data set. Returning NULL\");\n    const char * apop_varad_var(title, \"<Info>\");\n    const char apop_varad_var(match, 'c');\n    Apop_stopif(match!='r' && match!='e' && match!='c', return NULL, 0,\n                \"match type needs to be 'r', 'e', or 'c'; you supplied %c.\", match);\nAPOP_VAR_ENDHEAD\n    while (data && (!data->names || !data->names->title ||\n                (match=='c' && strcasecmp(data->names->title, title))\n                || (match=='r' && !apop_regex(data->names->title, title))\n                || (match=='e' && strcmp(data->names->title, title))\n                ))\n        data = data->more;\n    return (apop_data *) data; //de-const.\n}\n\n/** Add a page to an \\ref apop_data set. It gets a name so you can find it later.\n\n  \\param dataset The input data set, to which a page will be added.\n  \\param newpage The page to append\n  \\param title The name of the new page.\n\n  \\return The new page.  I post a warning if I am appending a \\c NULL page or appending to a \\c NULL data set and  <tt>apop_opts.verbose >=1 </tt>.\n\n  \\li See \\ref pps for further notes.\n*/\napop_data * apop_data_add_page(apop_data * dataset, apop_data *newpage, const char *title){\n    Apop_stopif(!newpage, return NULL, 1, \"You are adding a NULL page to a data set. 
Doing nothing; returning NULL.\");\n    if (!newpage->names) newpage->names = apop_name_alloc();\n    if (title && !(newpage->names->title == title)){//has title, but is not pointing to existing title\n        free(newpage->names->title);\n        Asprintf(&newpage->names->title, \"%s\", title);\n    }\n    Apop_stopif(!dataset, return newpage, 1, \"You are adding a page to a NULL data set. Returning the new page as its own data set.\");\n    while (dataset->more)\n        dataset = dataset->more;\n    dataset->more = newpage;\n    return newpage;\n}\n\n/** Remove the first page from an \\ref apop_data set that matches a given name.\n\n\\param data The input data set, from which a page will be removed. No default. \nIf \\c NULL, maybe print a warning (see below).\n\n\\param title The case-insensitive name of the page to remove. Default: \\c \"<Info>\"\n\\param free_p If \\c 'y', then \\ref apop_data_free the page. Default: \\c 'y'.\n\n\\return If not freed, a pointer to the \\c apop_data page that I just pulled out. Thus,\n  you can use this to pull a single page from a data set. I set that page's \\c more\n  pointer to \\c NULL, to minimize any confusion about more-than-linear linked list\n  topologies. If <tt>free_p=='y'</tt> (the default) or the page is not found, return \\c NULL.\n\n  \\li I don't check the first page, so there's no concern that the head of your list of\n  pages will move. 
Again, the intent of the <tt>->more</tt> pointer in the \\ref apop_data\n  set is not to fully implement a linked list, but primarily to allow you to staple auxiliary\n  information to a main data set.\n\n  \\li If I don't find the page you want, I return NULL, and maybe print a warning; see below.\n\n  \\li For the two above cases where a warning may be printed, if the page is to be\n      returned and <tt> apop_opts.verbose >= 1 </tt>, print a warning.\n    If the page is to be freed and <tt> apop_opts.verbose >= 2 </tt>, print a warning.\n\n  \\li The remaining \\c more pointers in the \\ref apop_data set are adjusted accordingly.\n*/\nAPOP_VAR_HEAD apop_data* apop_data_rm_page(apop_data * data, const char *title, const char free_p){\n    const char *apop_varad_var(title, \"<Info>\");\n    const char apop_varad_var(free_p, 'y');\n    apop_data *apop_varad_var(data, NULL);\n    Apop_stopif(!data, return NULL, (free_p=='y'? 2: 1), \"You are removing a \"\n                               \"page from a NULL data set. Doing nothing.\");\nAPOP_VAR_ENDHEAD\n    while (data->more && strcasecmp(data->more->names->title, title))\n        data = data->more;\n    Apop_stopif(!data->more, return NULL, (free_p=='y'?2:1), \"You asked me to \"\n                \"remove '%s' but I couldn't find a page matching that.\", title);\n    if (data->more){\n        apop_data *tmp = data->more;\n        data->more = data->more->more;\n        tmp->more = NULL;\n        if (free_p=='y'){\n            apop_data_free(tmp); //free the page's contents too, as documented.\n            return NULL;\n        } //else:\n        return tmp;\n    } else return NULL;\n}\n\ntypedef int (*apop_fn_ir)(apop_data*, void*);\n\n/** Remove the rows set to one in the \\c drop vector or for which the \\c do_drop function returns one.  
\n\\param in the \\ref apop_data structure to be pared down\n\\param drop  a vector with as many elements as the max of the vector, matrix, or text\n  parts of \\c in, with a one marking those rows to be removed.\n\\param do_drop A function that returns one for rows to drop and zero for rows to not drop. A sample function:\n  \\code\n  int your_drop_function(apop_data *onerow, void *extra_param){\n    return gsl_isnan(apop_data_get(onerow)) ||\n                !strcmp(onerow->text[0][0], \"Uninteresting data point\");\n  }\n  \\endcode\n  \\ref apop_data_rm_rows will use \\ref Apop_r to get a subview of the input data set\n  of height one, and send that subview to this function (and since arguments typically\n  default to zero, you don't have to write out things like \\ref apop_data_get\n  <tt>(onerow, .row=0, .col=0)</tt>, which can help to keep things readable).\n\\param drop_parameter If your \\c do_drop function requires additional input, put it here\n  and it will be passed through.\n\n\\return Returns a pointer to the input data set, now pruned.\n\n\\li If all the rows are to be removed, then you will wind up with the same \\ref\n    apop_data set, with \\c NULL \\c vector, \\c matrix, \\c weight, and text. Therefore,\n    you may wish to check for \\c NULL elements after use. I remove rownames, but leave\n    the other names, in case you want to add new data rows.\n\\li The typical use is to provide only a list or only a function. If both are \\c\n    NULL, I return without doing anything, and print a warning if <tt>apop_opts.verbose\n    >=2</tt>. 
If you provide both, I will drop the row if either the vector has a one in\n    that row's position, or if the function returns a nonzero value.\n\\li This function uses the \\ref designated syntax for inputs.\n\\see \\ref apop_data_listwise_delete, \\ref apop_data_rm_columns\n*/  \nAPOP_VAR_HEAD apop_data* apop_data_rm_rows(apop_data *in, int *drop, apop_fn_ir do_drop, void *drop_parameter ){\n    apop_data* apop_varad_var(in, NULL);\n    Apop_stopif(!in, return in, 2, \"Input data set was NULL; no changes made.\");\n    int* apop_varad_var(drop, NULL);\n    apop_fn_ir apop_varad_var(do_drop, NULL);\n    void* apop_varad_var(drop_parameter, NULL);\n    Apop_stopif(!drop && !do_drop, return in, 0, \"You gave me neither a list of ints \"\n            \"indicating which rows to drop, nor a drop_fn I can use to test \"\n            \"each row. Returning with no changes made.\");\nAPOP_VAR_ENDHEAD\n    //First, shift columns down to the nearest not-freed row.\n    int outlength = 0;\n    Get_vmsizes(in); //vsize, msize1, maxsize\n    for (int i=0 ; i < maxsize; i++){\n        int drop_row=0;\n        if (drop && drop[i]) drop_row = 1;\n        else if (do_drop){\n            drop_row = do_drop(Apop_r(in, i), drop_parameter);\n        }\n        if (!drop_row){\n            if (outlength != i) apop_data_memcpy(Apop_r(in, outlength), Apop_r(in, i));\n            outlength++;\n        }\n    }\n    if (!outlength){\n        gsl_vector_free(in->vector);  in->vector = NULL;\n        gsl_vector_free(in->weights); in->weights = NULL;\n        gsl_matrix_free(in->matrix);  in->matrix = NULL;\n        apop_text_alloc(in, 0, 0);\n        //leave colnames intact, remove rownames below.\n    }\n\n    //now trim excess memory:\n    if (in->vector)  apop_vector_realloc(in->vector, GSL_MIN(in->vector->size, outlength));\n    if (in->weights) apop_vector_realloc(in->weights, GSL_MIN(in->weights->size, outlength));\n    if (in->matrix)  apop_matrix_realloc(in->matrix, 
GSL_MIN(in->matrix->size1, outlength), in->matrix->size2);\n    if (in->text)    apop_text_alloc(in, GSL_MIN(outlength, in->textsize[0]), in->textsize[1]);\n    if (in->names && in->names->rowct > outlength){\n        for (int k=outlength; k< in->names->rowct; k++)\n            free(in->names->row[k]);\n        in->names->rowct = outlength;\n    }\n    return in;\n}\n"
  },
  {
    "path": "apop_db.m4.c",
    "content": "/** \\file apop_db.c\tAn easy front end to SQLite. Includes a few nice\nfeatures like a variance, skew, and kurtosis aggregator for SQL. */\n/* Copyright (c) 2006--2009 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n#include \"apop_internal.h\"\n\n/** Here are where the options are initially set. See the \\ref apop_opts_type\n    documentation for details.\n\\ingroup all_public\n*/\napop_opts_type apop_opts\t= \n          { .verbose=1,\n            .output_delimiter =\"\\t\",       .input_delimiters = \"|,\\t\", \n            .db_name_column = \"row_names\", .nan_string = \"NaN\", \n            .db_engine = '\\0',             .db_user = \"\\0\", \n            .db_pass = \"\\0\",               .stop_on_warning = 'n',\n            .log_file = NULL,\n            .rng_seed = 479901,            .version = m4_apop_version };\n\n#define ERRCHECK {Apop_stopif(err, return 1, 0, \"%s: %s\",query, err); }\n#define ERRCHECK_NR {Apop_stopif(err, return NULL, 0, \"%s: %s\",query, err); }\n#define ERRCHECK_SET_ERROR(outdata) {Apop_stopif(err, if (!(outdata)) (outdata)=apop_data_alloc(); (outdata)->error='q'; sqlite3_free(err); return outdata, 0, \"%s: %s\",query, err); }\n\n#include \"apop_db_sqlite.c\" // callback_t is defined here, btw.\n\n\n#ifdef HAVE_MYSQL\n//Let mysql have these.\n#undef VERSION\n#undef PACKAGE\n#undef PACKAGE_NAME\n#undef PACKAGE_STRING\n#undef PACKAGE_TARNAME\n#undef PACKAGE_VERSION\n#undef PACKAGE_BUGREPORT\n#include \"apop_db_mysql.c\"\n#endif\n\n//if !apop_opts.db_engine, run this to assign a value.\nstatic void get_db_type(){\n    if (getenv(\"APOP_DB_ENGINE\") && (!strcasecmp(getenv(\"APOP_DB_ENGINE\"), \"mysql\") || !strcasecmp(getenv(\"APOP_DB_ENGINE\"), \"mariadb\")))\n        apop_opts.db_engine = 'm';\n    else\n        apop_opts.db_engine = 's';\n}\n\n//This macro declares the query string and fills it from the printf part of the call.\n#define Fillin(query, fmt)        \\\n    char *query;                  \\\n  
  va_list argp;                 \\\n\tva_start(argp, fmt);          \\\n\tApop_stopif(vasprintf(&query, fmt, argp)==-1, , 0, \"Trouble writing to a string.\"); \\\n\tva_end(argp);                 \\\n\tApop_notify(2, \"%s\", query);\n\n/** If you want to use a database on the hard drive instead of memory, then call this\nonce and only once before using any other database utilities.\n\nWith SQLite, if you want a disposable database which you won't use after the program\nends, don't bother with this function.\n\nThe trade-offs between an on-disk database and an in-memory db are as one would expect:\nmemory is faster, but the database is destroyed when the program exits.\n\nMySQL users: either set the environment variable APOP_DB_ENGINE=mysql or set \\c apop_opts.db_engine = 'm'.\n\nThe Apophenia package assumes you are only using a single database at a time. You\ncan use the SQL <tt>attach</tt> function to load other databases, or see <a\nhref=\"http://modelingwithdata.org/arch/00000142.htm\">this blog post</a> for further\nsuggestions and sample code.\n\nWhen you are done doing your database manipulations, call \\ref apop_db_close if writing to disk.\n\n\\param filename\nThe name of a file on the hard drive on which to store the database. 
If\n<tt>NULL</tt>, then the database will be kept in memory (in which case,\nthe other database functions will call this function for you and you\ndon't need to bother).\n\n\\li See \\ref sqlsec for more notes on using databases.\n\n\\return 0: everything OK<br>\n        1: database did not open.<br>\n        -1: MySQL was requested, but Apophenia was compiled without MySQL support.\n*/\nint apop_db_open(char const *filename){\n    if (!apop_opts.db_engine) get_db_type();\n    if (!db) //check the environment.\n#ifdef HAVE_MYSQL\n       if(!mysql_db)  \n#endif\n\n    if (apop_opts.db_engine == 'm')\n#ifdef HAVE_MYSQL\n        return apop_mysql_db_open(filename);\n#else\n        {Apop_stopif(1, return -1, 0, \"Apophenia was compiled without mysql support.\");}\n#endif\n        return apop_sqlite_db_open(filename);\n}\n\n/** \\cond doxy_ignore */\ntypedef struct {\n    char const *name;\n    int isthere;\n} tab_exists_t;\n/** \\endcond */\n\nstatic int tab_exists_callback(void *in, int argc, char **argv, char **whatever){\n    tab_exists_t *te = in;\n\tif (!strcmp(argv[argc-1], te->name))\n\t\tte->isthere=1;\n\treturn 0;\n}\n\n/** Check for the existence of a table, and maybe delete it.\n\nRecreating a table which already exists can cause errors, so it is good practice to check for existence first.  Also, this is the stylish way to delete a table, since just calling <tt>\"drop table\"</tt> will give you an error if the table doesn't exist.\n\n\\param name \tthe table name (no default)\n\\param remove 'd'\t==>delete table so it can be recreated in main.<br>\n\t\t'n'\t==>no action. Return result so program can continue. 
(default)\n\\return\n0 = table does not exist<br>\n1 = table was found, and if remove=='d', has been deleted<br>\n-1 = processing error\n\n\\li In the SQLite engine, this function considers table views to be tables.\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD int apop_table_exists(char const *name, char remove){\n    char const *apop_varad_var(name, NULL)\n    Apop_stopif(!name, return -1, 0, \"You gave me a NULL table name.\");\n    char apop_varad_var(remove, 'n')\nAPOP_VAR_END_HEAD\n    if (!apop_opts.db_engine) get_db_type();\n    if (apop_opts.db_engine == 'm')\n#ifdef HAVE_MYSQL\n        return apop_mysql_table_exists(name, remove);\n#else\n        Apop_stopif(1, return -1, 0, \"Apophenia was compiled without mysql support.\");\n#endif\n    char *err=NULL, *q2;\n    tab_exists_t te = { .name = name };\n    tab_exists_t tev = { .name = name };\n\tif (db==NULL) return 0;\n\tsqlite3_exec(db, \"select name from sqlite_master where type='table'\", tab_exists_callback, &te, &err); \n\tsqlite3_exec(db, \"select name from sqlite_master where type='view'\", tab_exists_callback, &tev, &err); \n    char query[]=\"Selecting names from sqlite_master\";//for the error messages below.\n    //Return -1 on error, as documented; ERRCHECK would return 1, which also means \"table found\".\n\tApop_stopif(err, return -1, 0, \"%s: %s\", query, err);\n\tif ((remove==1|| remove=='d') && (te.isthere||tev.isthere)){\n        if (te.isthere)\n            Asprintf(&q2, \"drop table %s;\", name);\n        else\n            Asprintf(&q2, \"drop view %s;\", name);\n\t\tsqlite3_exec(db, q2, NULL, NULL, &err); \n        free(q2);\n        Apop_stopif(err, return -1, 0, \"%s: %s\", query, err);\n    }\n\treturn (te.isthere||tev.isthere);\n}\n\n/**\nCloses the database on disk. If you opened the database with \\c apop_db_open(NULL), then this is basically optional.\n\n\\param vacuum \n'v': vacuum---do clean-up to minimize the size of the database on disk.<br>\n'q': Don't bother; just close the database. 
(default = 'q')\n\n\\return 0 on OK, nonzero on error.\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD int apop_db_close(char vacuum){\n    char apop_varad_var(vacuum, 'q')\nAPOP_VAR_END_HEAD\n    if (apop_opts.db_engine == 'm') //assume this is set by now...\n#ifdef HAVE_MYSQL\n        {apop_mysql_db_close(0);\n        return 0;}\n#else\n        {Apop_stopif(1, return -1, 0, \"Apophenia was compiled without mysql support.\");}\n#endif\n    else {\n        char *err, *query = \"db close\";//for errcheck.\n        if (vacuum==1 || vacuum=='v') {\n            sqlite3_exec(db, \"VACUUM\", NULL, NULL, &err);\n            ERRCHECK\n        }\n        sqlite3_close(db);\n    \t//ERRCHECK\n        db  = NULL;\n    }\n    return 0;\n}\n\n/** Send a query to the database that returns no data.\n\n\\li As with functions like the \\c apop_query_to_data, the query can include\nprintf-style format specifiers, such as <tt>apop_query(\"create table %s(id, name,\nage);\", tablename)</tt>.\n\n\\param fmt A <tt>printf</tt>-style SQL query.\n\\return 0 on success, 1 on failure.\n*/\nint apop_query(const char *fmt, ...){\n    char *err=NULL;\n    Fillin(query, fmt)\n    if (!apop_opts.db_engine) get_db_type();\n    if (apop_opts.db_engine == 'm')\n#ifdef HAVE_MYSQL\n        {Apop_stopif(!mysql_db, return 1, 0, \"No mySQL database is open.\");\n        return apop_mysql_query(query);}\n#else\n        Apop_stopif(1, return 1, 0, \"Apophenia was compiled without mysql support.\");\n#endif\n    else \n        {if (!db) apop_db_open(NULL);\n        sqlite3_exec(db, query, NULL,NULL, &err);\n\t    ERRCHECK\n        }\n\tfree(query);\n\treturn 0;\n}\n\n/** Dump the results of a query into an array of strings.\n\n\\return\t An \\ref apop_data structure with the <tt>text</tt> element filled.\n\n\\param fmt A <tt>printf</tt>-style SQL query.\n\n\\exception out->error=='q' The database engine was unable to run the query (e.g.,  invalid SQL syntax). 
Again, a valid query that returns zero rows is not an error, and \\c NULL is returned.\n\\exception out->error=='d' Database error.\n\n\\li If <tt>apop_opts.db_name_column</tt> matches a column of the output table, then that\n    column is used for row names, and therefore will not be included in the <tt>text</tt>.\n\\li <tt>query_output->text</tt> is always a 2-D array of strings, even if the query\n    returns a single column. In that case, use <tt>returned_tab->text[i][0]</tt> (or\n    equivalently, <tt>*returned_tab->text[i]</tt>) to refer to row <tt>i</tt>.\n\\li If an element in the database is \\c NULL, the corresponding cell in the output\n    table will be filled with the text given by \\c apop_opts.nan_string. The default\n    is \\c \"NaN\", but you can set <tt>apop_opts.nan_string = \"whatever you like\"</tt>\n    to change the text to whatever you like.\n\\li Returns \\c NULL if your query is valid but returns zero rows.\n\\li The query can include printf-style format specifiers, such as\n    <tt>apop_query_to_text(\"select name from %s where id=%i;\", tablename, id_number)</tt>.\n\nFor example, the following function will list the tables in an SQLite database (much like you\ncould do from the command line using <tt>sqlite3 dbname.db \".table\"</tt>).\n\n\\include ls_tables.c\n*/\napop_data * apop_query_to_text(const char * fmt, ...){\n    apop_data *out = NULL;\n    Fillin(query, fmt)\n    if (!apop_opts.db_engine) get_db_type();\n    if (apop_opts.db_engine == 'm'){\n#ifdef HAVE_MYSQL\n        out = apop_mysql_query_core(query, process_result_set_chars);\n#else\n        Apop_stopif(1, apop_return_data_error('d'), 0, \"Apophenia was compiled without mysql support.\");\n#endif\n    } else out = apop_sqlite_query_to_text(query);\n    free(query);\n    return out;\n}\n\n//apop_query_to_data callback.\nstatic int db_to_table(void *qinfo, int argc, char **argv, char **column){\n    Apop_stopif(!argv, return -1, apop_errorlevel, \"Got NULL data from 
SQLite.\");\n    int i, ncfound = 0;\n    callback_t *qi= qinfo;\n    if (qi->firstcall){\n        qi->firstcall--;\n        for(i=0; i<argc; i++)\n            if (apop_opts.db_name_column && !strcasecmp(column[i], apop_opts.db_name_column)){\n                qi->namecol = i;\n                ncfound = 1;\n                break;\n            }\n\t    qi->outdata = argc-ncfound ? apop_data_alloc(1, argc-ncfound) : apop_data_alloc( );\n        for(i=0; i<argc; i++)\n            if (qi->namecol != i)\n                apop_name_add(qi->outdata->names, column[i], 'c');\n    } else \n        if (qi->outdata->matrix)\n            apop_matrix_realloc(qi->outdata->matrix, qi->currentrow+1, qi->outdata->matrix->size2);\n    ncfound =0;\n    for (int jj=0;jj<argc;jj++)\n        if (jj != qi->namecol){\n            double valor = \n                !argv[jj] || !strcmp(argv[jj], \"NULL\")|| \n                (apop_opts.nan_string && !strcasecmp(apop_opts.nan_string, argv[jj]))\n                 ? GSL_NAN : atof(argv[jj]);\n            gsl_matrix_set(qi->outdata->matrix,qi->currentrow,jj-ncfound, valor);\n        } else {\n            apop_name_add(qi->outdata->names, argv[jj], 'r');\n            ncfound = 1;\n        }\n    (qi->currentrow)++;\n\treturn 0;\n}\n\n/** Queries the database and dumps the result into an \\ref apop_data set.\n\n\\param fmt A <tt>printf</tt>-style SQL query.\n\n\\return If no rows are returned, \\c NULL; else an \\ref apop_data set with the data\nin place. Most data will be in the \\c matrix element of the output. Column names are\nappropriately placed. If \\ref apop_opts_type \"apop_opts.db_name_column\" matches one\nof the fields in your query's output (default: \\c row_names), then that column will\nbe used for row names (and therefore will not appear in the \\c matrix).\n\n\\exception out->error=='q' Query error. 
A valid query that returns no rows is not an error; in that case, you get \\c NULL.\n\n\\li The query can include printf-style\n    format specifiers, such as <tt>apop_query_to_data(\"select age from %s where id=%i;\",\n    tablename, id_number)</tt>.\n\\li Blanks in the database (i.e., <tt> NULL</tt>s) and elements that match \\ref\n    apop_opts_type \"apop_opts.nan_string\" are filled with <tt>NAN</tt>s in the matrix.\n*/ \napop_data * apop_query_to_data(const char * fmt, ...){\n    Fillin(query, fmt)\n    if (!apop_opts.db_engine) get_db_type();\n    if (apop_opts.db_engine == 'm')\n#ifdef HAVE_MYSQL\n        return apop_mysql_query_core(query, process_result_set_data);\n#else\n        Apop_stopif(1, apop_return_data_error('d'), 0, \"Apophenia was compiled without mysql support.\");\n#endif\n\n    //else\n    char *err=NULL;\n    callback_t qinfo = {.firstcall = 1, .namecol=-1};\n\tif (db==NULL) apop_db_open(NULL);\n    sqlite3_exec(db, query,db_to_table,&qinfo, &err); \n    free (query);\n    ERRCHECK_SET_ERROR(qinfo.outdata)\n\treturn qinfo.outdata;\n}\n\n\n    /** \\cond doxy_ignore */\n//These used to do more, but I'll leave them as a macro anyway in case of future expansion.\n#define Store_settings  \\\n    int v = apop_opts.verbose; apop_opts.verbose=0;/*hack to prevent double-printing.*/ \\\n\n#define Restore_settings  \\\n    apop_opts.verbose=v;\n    /** \\endcond */\n\n/** Queries the database and dumps the first column of the result into a \\c gsl_vector.\n\n\\param fmt A <tt>printf</tt>-style SQL query.\n\\return\t A <tt>gsl_vector</tt> holding the first column of the returned matrix. Thus, if your query returns multiple lines, you will get no warning, and the function will return the first in the list.\n\\exception out->error=='q' Query error. 
A valid query that returns no rows is not an error; in that case, you get \\c NULL.\n\n\\li Uses \\ref apop_query_to_data internally, then throws away all but the first column\n    of the matrix.\n\\li If \\c apop_opts.db_name_column is set, then I'll ignore that column. It gets put\n    into the names of the \\ref apop_data set, and then thrown away when I look at only\n    the \\c gsl_matrix part of that set.\n\\li If the query returns zero rows of data or no columns, the function returns \\c NULL.\n\\li The query can include printf-style format specifiers, such as <tt>apop_query_to_vector(\"select age from %s where id=%i;\", tablename, id_number)</tt>.\n*/\ngsl_vector * apop_query_to_vector(const char * fmt, ...){\n    Fillin(query, fmt)\n    if (!apop_opts.db_engine) get_db_type();\n    if (apop_opts.db_engine == 'm')\n#ifdef HAVE_MYSQL\n        return apop_mysql_query_core(query, process_result_set_vector);\n#else\n        Apop_stopif(1, return NULL, 0, \"Apophenia was compiled without mysql support.\");\n#endif\n    apop_data *d=NULL;\n    gsl_vector *out;\n\tif (db==NULL) apop_db_open(NULL);\n    Store_settings\n\td\t= apop_query_to_data(\"%s\", query);\n    Restore_settings\n    Apop_stopif(!d, free(query); return NULL, 2, \"Query [%s] turned up a blank table. Returning NULL.\", query);\n    //A failed query returns a data set with ->error set and no matrix; don't dereference it.\n    Apop_stopif(d->error || !d->matrix, apop_data_free(d); free(query); return NULL, 0, \"Query [%s] failed. Returning NULL.\", query);\n    //else:\n    out = gsl_vector_alloc(d->matrix->size1);\n\tgsl_matrix_get_col(out, d->matrix, 0);\n\tapop_data_free(d);\n    free(query);\n\treturn out;\n}\n\n/** Queries the database, and dumps the result into a single double-precision floating point number.\n\n\\li This calls \\ref apop_query_to_data and returns the (0,0)th element of the returned matrix. Thus, if your query returns multiple lines, you will get no warning, and the function will return the first in the list (which is not always well-defined; maybe use an <tt>order by</tt> clause in your query if you expect multiple lines).\n\n\\li If \\c apop_opts.db_name_column is set, then I'll ignore that column. 
It gets put\n    into the names of the \\ref apop_data set, and then thrown away when I look at only\n    the \\c gsl_matrix element of that set.\n\\li If the query produces a blank table, returns \\c NAN, and if\n    <tt>apop_opts.verbose>=2</tt>, prints an error.\n\\li The query can include printf-style format specifiers, such as\n    <tt>apop_query_to_float(\"select age from %s where id=%i;\", tablename, id_number)</tt>.\n\\li If the query produces an error, returns \\c NAN, and if <tt>apop_opts.verbose>=0</tt>,\n    prints an error. If you need to distinguish between blank tables, NaNs in the data,\n    and query errors, use \\ref apop_query_to_data.\n\n\\param fmt A <tt>printf</tt>-style SQL query.\n\\return\t\tA \\c double, actually.\n*/\ndouble apop_query_to_float(const char * fmt, ...){\n    double out;\n    Fillin(query, fmt)\n    if (!apop_opts.db_engine) get_db_type();\n    if (apop_opts.db_engine == 'm'){\n#ifdef HAVE_MYSQL\n        out = apop_mysql_query_to_float(query);\n#else\n        Apop_stopif(1, return NAN, 0, \"Apophenia was compiled without mysql support.\");\n#endif\n    } else {\n        apop_data *d=NULL;\n        if (db==NULL) apop_db_open(NULL);\n        Store_settings\n        d = apop_query_to_data(\"%s\", query);\n        Restore_settings\n        Apop_stopif(!d, return GSL_NAN, 2, \"Query [%s] turned up a blank table. Returning NaN.\", query);\n        Apop_stopif(d->error, return GSL_NAN, 0, \"Query [%s] failed. Returning NaN.\", query);\n        out\t= apop_data_get(d);\n        apop_data_free(d);\n    }\n    free(query);\n\treturn out;\n}\n\n/** Query data to an \\c apop_data set, but a mix of names, vectors, matrix elements, and text.\n\nIf you are querying to a matrix and maybe a name, use \\c\napop_query_to_data (and set \\ref apop_opts_type \"apop_opts.db_name_column\" if desired). If querying only text, use \\ref apop_query_to_text. 
But\nif your data is a mix of text and numbers, use this.\n\nThe first argument is a character string consisting of the letters \\c nvmtw, one for each column of the SQL output, indicating whether the column is a name, vector, matrix column, text column, or weight vector. You can have only one \\c n, one \\c v, and one \\c w. \n\nIf the query produces more columns than there are elements in the column specification, then the remainder are dumped into the text section. If there are fewer columns produced than given in the spec, the additional elements will be allocated but not filled (i.e., they are uninitialized and will have garbage).\n\n\n\\param typelist A string consisting of the letters \\c nvmtw. For example, if your query columns should go into a text column, the vector, the weights, and two matrix columns, this would be \"tvwmm\".\n\\param fmt A <tt>printf</tt>-style SQL query.\n\\exception out->error=='d' Dimension error. Your count of matrix parts didn't match what the query returned.\n\\exception out->error=='q' Query error. A valid query that returns no rows is not an error; in that case, you get \\c NULL.\n\n\\li \\ref apop_opts_type \"apop_opts.db_name_column\" is ignored.  Use the \\c 'n' character\n    to indicate the output column with row names.\n\\li As with the other \\c apop_query_to_... 
functions, the query can include printf-style\n    format specifiers, such as <tt>apop_query_to_mixed_data(\"tv\", \"select name, age from\n\n*/\napop_data * apop_query_to_mixed_data(const char *typelist, const char * fmt, ...){\n    Fillin(query, fmt)\n    if (!apop_opts.db_engine) get_db_type();\n    if (apop_opts.db_engine == 'm')\n#ifdef HAVE_MYSQL\n        {apop_data* out = apop_mysql_mixed_query(typelist, query);\n        free(query);\n        return out;}\n#else\n        {Apop_notify(0, \"Apophenia was compiled without mysql support.\");\n        return 0;}\n#endif\n    //else\n    apop_data *out = apop_sqlite_multiquery(typelist, query);\n    free(query);\n    return out;\n}\n\n/* Convenience function for extending a string. \n asprintf(%q, \"%s and stuff\", q);\n gives you a memory leak. This takes care of that.\n */\nvoid qxprintf(char **q, char *format, ...){\n    va_list ap;\n    char *r = *q;\n    va_start(ap, format);\n    Apop_stopif(vasprintf(q, format, ap)==-1, , 0, \"Trouble writing to a string.\");\n    va_end(ap);\n    free(r);\n}\n\nstatic void add_a_number (char **q, char *comma, double v){\n    if (gsl_isnan(v))\n        qxprintf(q,\"%s%c NULL \", *q, *comma);\n    else if (isinf(v)==1)\n        qxprintf(q,\"%s%c  'inf'\", *q, *comma);\n    else if (isinf(v)==-1)\n        qxprintf(q,\"%s%c  '-inf' \", *q, *comma);\n    else\n        qxprintf(q,\"%s%c %g \",*q ,*comma, v);\n    *comma = ',';\n}\n\nstatic int run_prepared_statements(apop_data const *set, sqlite3_stmt *p_stmt){\n#if SQLITE_VERSION_NUMBER < 3003009\n     Apop_stopif(1, return -1, 0, \"Attempting to use prepared statements, but using a version of SQLite that doesn't support them.\");\n#else\n    Get_vmsizes(set) //firstcol, msize1, maxsize\n    for (size_t row=0; row < maxsize; row++){\n        size_t field =1;\n        if (set->names && set->names->rowct>row){\n            if (!strlen(set->names->row[row])) field++; //leave NULL and cleared\n            
Apop_stopif(sqlite3_bind_text(p_stmt, field++, set->names->row[row], -1, SQLITE_TRANSIENT),\n                    return -1, apop_errorlevel, \n                    \"Something wrong with the row name for line %zu, [%s].\\n\" , row, set->names->row[row]);\n        }\n        if (set->vector && set->vector->size > row)\n                Apop_stopif(sqlite3_bind_double(p_stmt, field++, apop_data_get(set, row, -1)),\n                    return -1, apop_errorlevel, \n                    \"Something wrong with the vector element on line %zu, [%g].\\n\" ,row,  apop_data_get(set, row, -1));\n        if (msize1 > row)\n            for (size_t col=0; col < msize2; col++)\n                Apop_stopif(sqlite3_bind_double(p_stmt, field++, apop_data_get(set, row, col)),\n                    return -1, apop_errorlevel, \n                    \"Something wrong with the matrix element %zu on line %zu, [%g].\\n\" ,col, row,  apop_data_get(set, row, col));\n        if (*set->textsize > row)\n            for (size_t col=0; col < set->textsize[1]; col++){\n                if (!strlen(set->text[row][col]) || (apop_opts.nan_string && !strcasecmp(apop_opts.nan_string, set->text[row][col])))\n                    {field++; continue;} //leave NULL and cleared\n                Apop_stopif(sqlite3_bind_text(p_stmt, field++, set->text[row][col], -1, SQLITE_TRANSIENT),\n                    return -1, apop_errorlevel, \n                    \"Something wrong with a text element at row %zu, col %zu [%s].\\n\" , row, col, set->text[row][col]);\n            }\n        if (set->weights && set->weights->size > row)\n                Apop_stopif(sqlite3_bind_double(p_stmt, field++, gsl_vector_get(set->weights, row)),\n                    return -1, apop_errorlevel, \n                    \"Something wrong with the weight element on line %zu, [%g].\\n\" ,row,  gsl_vector_get(set->weights, row));\n        int err = sqlite3_step(p_stmt);\n        Apop_stopif(err!=0 && err != 101 //0=ok, 101=done\n               
     , , 0, \"prepared sqlite insert query gave error code %i.\\n\", err);\n        Apop_stopif(sqlite3_reset(p_stmt), return -1, apop_errorlevel, \"SQLite error.\");\n        Apop_stopif(sqlite3_clear_bindings(p_stmt), return -1, apop_errorlevel, \"SQLite error.\"); //needed for NULLs\n    }\n    Apop_stopif(sqlite3_finalize(p_stmt)!=SQLITE_OK, return -1, apop_errorlevel, \"SQLite error.\");\n    return 0;\n#endif\n}\n\n//users are expected to call apop_data_print.\nint apop_data_to_db(const apop_data *set, const char *tabname, const char output_append){\n    Apop_stopif(!set, return -1, 1, \"you sent me a NULL data set. Database table %s will not be created.\", tabname);\n    int\ti,j; \n    char *q;\n    char comma = ' ';\n    int use_row = (apop_opts.db_name_column && strlen(apop_opts.db_name_column))  && set->names\n                && ((set->matrix && set->names->rowct == set->matrix->size1)\n                    || (set->vector && set->names->rowct == set->vector->size));\n\n    if (!apop_opts.db_engine) get_db_type();\n    if (apop_table_exists(tabname))\n        Asprintf(&q, \" \");\n    else if (apop_opts.db_engine == 'm')\n#ifdef HAVE_MYSQL\n        if (((output_append =='a' || output_append =='A') && apop_table_exists(tabname)))\n            Asprintf(&q, \" \");\n        else {\n            Asprintf(&q, \"create table %s (\", tabname);\n            if (use_row) {\n                qxprintf(&q, \"%s\\n %s varchar(1000)\", q, apop_opts.db_name_column);\n                comma = ',';\n            }\n            if (set->vector){\n                if(!set->names || !set->names->vector) \n                    qxprintf(&q, \"%s%c\\n vector double \", q, comma);\n                else\n                    qxprintf(&q, \"%s%c\\n %s double \", q,comma, set->names->vector);\n                comma = ',';\n            }\n            if (set->matrix)\n                for(i=0;i< set->matrix->size2; i++){\n                    if(!set->names || set->names->colct <= i) \n      
                  qxprintf(&q, \"%s%c\\n c%i double \", q, comma,i);\n                     else\n                        qxprintf(&q, \"%s%c\\n %s  double \", q, comma, set->names->col[i]);\n                    comma = ',';\n                }\n            for(i=0;i< set->textsize[1]; i++){\n                if (!set->names || set->names->textct <= i)\n                    qxprintf(&q, \"%s%c\\n tc%i varchar(1000) \", q, comma,i);\n                else\n                    qxprintf(&q, \"%s%c\\n %s  varchar(1000) \", q, comma, set->names->text[i]);\n                comma = ',';\n            }\n            apop_query(\"%s); \", q);\n            sprintf(q, \" \");\n        }\n#else \n        Apop_stopif(1, return -1, apop_errorlevel, \"Apophenia was compiled without mysql support.\");\n#endif\n    else {\n        if (db==NULL) apop_db_open(NULL);\n        if (((output_append =='a' || output_append =='A') && apop_table_exists(tabname)) )\n            Asprintf(&q, \" \");\n        else {\n            Asprintf(&q, \"create table %s (\", tabname);\n            if (use_row) {\n                qxprintf(&q, \"%s\\n %s\", q, apop_opts.db_name_column);\n                comma = ',';\n            }\n            if (set->vector){\n                if (!set->names || !set->names->vector) qxprintf(&q, \"%s%c\\n vector numeric\", q, comma);\n                else qxprintf(&q, \"%s%c\\n \\\"%s\\\"\", q, comma, set->names->vector);\n                comma = ',';\n            }\n            if (set->matrix)\n                for(i=0;i< set->matrix->size2; i++){\n                    if(!set->names || set->names->colct <= i) \t\n                        qxprintf(&q, \"%s%c\\n c%i numeric\", q, comma,i);\n                    else\t\t\t\n                        qxprintf(&q, \"%s%c\\n \\\"%s\\\" numeric\", q, comma, set->names->col[i]);\n                    comma = ',';\n                }\n            for(i=0; i< set->textsize[1]; i++){\n                if(!set->names || set->names->textct <= i) 
qxprintf(&q, \"%s%c\\n tc%i \", q, comma, i);\n                else qxprintf(&q, \"%s%c\\n %s \", q, comma, set->names->text[i]);\n                comma = ',';\n            }\n            if (set->weights) qxprintf(&q, \"%s%c\\n \\\"weights\\\" numeric\", q, comma);\n            qxprintf(&q,\"%s);\",q);\n            apop_query(\"%s\", q);\n            qxprintf(&q,\" \");\n        }\n    }\n\n    Get_vmsizes(set) //firstcol, msize2, maxsize\n    int col_ct = (set->names ? !!set->names->rowct : 0) + set->textsize[1] + msize2 - firstcol + !!set->weights;\n    Apop_stopif(!col_ct, return -1, 0, \"Input data set has zero columns of data (no rownames, text, matrix, vector, or weights). I can't create a table like that, sorry.\");\n    if(apop_use_sqlite_prepared_statements(col_ct)){\n        sqlite3_stmt *statement;\n        Apop_stopif(\n            apop_prepare_prepared_statements(tabname, col_ct, &statement), \n            return -1, 0, \"Trouble preparing prepared statements.\");\n        Apop_stopif(\n            run_prepared_statements(set, statement), \n            return -1, 0, \"error in insertions.\");\n    } else {\n        for(i=0; i< maxsize; i++){\n            comma = ' ';\n            qxprintf(&q, \"%s \\n insert into %s values(\",q, tabname);\n            if (use_row){\n                char *fixed= prep_string_for_sqlite(0, set->names->row[i]);\n                qxprintf(&q, \"%s %s \",q, fixed);\n                free(fixed);\n                comma = ',';\n            }\n            if (set->vector)\n               add_a_number (&q, &comma, gsl_vector_get(set->vector,i));\n            if (set->matrix)\n                for(j=0; j< set->matrix->size2; j++)\n                   add_a_number (&q, &comma, gsl_matrix_get(set->matrix,i,j));\n            for(j=0; j< set->textsize[1]; j++){\n                char *fixed= prep_string_for_sqlite(0, set->text[i][j]);\n                qxprintf(&q, \"%s%c %s \",q, comma,fixed ? 
fixed : \"''\");\n                free(fixed);\n                comma = ',';\n            }\n            if (set->weights)\n               add_a_number (&q, &comma, gsl_vector_get(set->weights,i));\n            qxprintf(&q,\"%s);\",q);\n            apop_query(\"%s\", q); \n            q[0]='\\0';\n        }\n    }\n\tfree(q);\n    return 0;\n}\n"
  },
  {
    "path": "apop_db_mysql.c",
    "content": "/** \\file apop_db_mysql.c\nThis file is included directly into \\ref apop_db.c. It is read only if APOP_USE_MYSQL is defined.*/\n\n/* Copyright (c) 2006--2007 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n#include <mysql.h>\n#include <math.h>\n\nstatic MYSQL *mysql_db; \n\n#define Areweconected(retval) Apop_stopif(!mysql_db, return retval, 0,  \\\n        \"No connection to a mySQL/mariadb database. apop_db_open() failure?\");\n\nstatic char *opt_host_name = NULL;      /* server host (default=localhost) */\nstatic unsigned int opt_port_num = 0;   /* port number (use built-in value) */\nstatic char *opt_socket_name = NULL;    /* socket name (use built-in value) */\nstatic unsigned int opt_flags = 0;      /* connection flags (none) */\n\n#define Apop_mstopif(cond, returnop, str) \\\n    Apop_stopif(cond, returnop, 0,         \\\n         str \"\\n mySQL/mariadb error %u: %s\\n\", mysql_errno (mysql_db), mysql_error (mysql_db));\n\nstatic int apop_mysql_db_open(char const *in){\n    Apop_stopif(!in, return 2, 0, \"MySQL needs a non-NULL db name.\");\n    mysql_db = mysql_init (NULL);\n    Apop_stopif(!mysql_db, return 1, 0, \"mysql_init() failed (probably out of memory)\");\n    Apop_mstopif (!mysql_real_connect (mysql_db, opt_host_name, apop_opts.db_user, apop_opts.db_pass,\n                        in, opt_port_num, opt_socket_name, CLIENT_MULTI_STATEMENTS+opt_flags),\n                mysql_close (mysql_db); return 1, \n                \"mysql_real_connect() failed\");\n    return 0;\n}\n\nstatic void apop_mysql_db_close(int ignoreme){\n    if (mysql_db) mysql_close (mysql_db);\n}\n\n/*\n    //Cut & pasted & cleaned from the mysql manual.\nstatic void process_results(void){\n    else                // mysql_store_result() returned nothing; should it have?\n        Apop_stopif(mysql_field_count(mysql_db) == 0,  , 0, \"apop_query error\");\n    //else query wasn't a select & just didn't return data.\n}\n        */\n\nstatic double 
apop_mysql_query(char *query){\n    Apop_mstopif(mysql_query(mysql_db, query), return 1, \"apop_mysql_query failed\");\n    MYSQL_RES *result = mysql_store_result(mysql_db);\n    if (result) mysql_free_result(result);\n    return 0;\n}\n\nstatic double apop_mysql_table_exists(char const *table, int delme){\n    Areweconected(GSL_NAN);\n    MYSQL_RES *res_set = mysql_list_tables(mysql_db, table);\n    //Check the result already in hand; a second mysql_list_tables call would leak a result set.\n    Apop_mstopif(!res_set, return GSL_NAN,\n          \"show tables query failed.\");\n    int is_found = mysql_num_rows(res_set);\n    mysql_free_result(res_set);\n    if (!is_found) return 0;\n\n    if (delme =='d' || delme=='D'){\n       char *a_query;\n       Asprintf(&a_query, \"drop table %s\", table);\n       int drop_failed = mysql_query (mysql_db, a_query);\n       free(a_query);\n       Apop_mstopif(drop_failed, return GSL_NAN, \n            \"table exists, but table dropping failed\");\n    }\n    return 1;\n}\n\n#define check_and_clean(do_if_failure) \\\n    Apop_mstopif( mysql_errno (conn),   \\\n         if (out) do_if_failure; return NULL, \\\n         \"mysql_fetch_row() failed\"); \\\n    return out; \\\n\nstatic int get_name_row(unsigned int *num_fields, MYSQL_FIELD *fields){\n    for(size_t i = 0; i < *num_fields; i++)\n        if (apop_opts.db_name_column && !strcasecmp(fields[i].name, apop_opts.db_name_column)){\n            (*num_fields)--;\n            return i;\n        }\n    return -1;\n}\n\nstatic void * process_result_set_data (MYSQL *conn, MYSQL_RES *res_set) {\n    MYSQL_ROW row;\n    unsigned int num_fields = mysql_num_fields(res_set);\n    unsigned int num_rows = mysql_num_rows (res_set);\n    if (!num_fields || !num_rows) return NULL;\n\n    MYSQL_FIELD *fields = mysql_fetch_fields(res_set);\n    int name_row = get_name_row(&num_fields, fields);\n\n    apop_data *out = apop_data_alloc(0, num_rows, num_fields);\n\n    for(size_t i = 0; i < num_fields+ (name_row>=0); i++)\n        if (i!=name_row) apop_name_add(out->names, fields[i].name, 'c');\n\n    for (int i=0; (row = 
mysql_fetch_row (res_set)); i++) {\n        int passed_name = 0;\n        for (size_t j = 0; j < mysql_num_fields (res_set); j++){\n            if (j==name_row){\n                apop_name_add(out->names, row[j], 'r');\n                passed_name = 1;\n                continue;\n            }\n            if (!row[j]) apop_data_set(out, i , j-passed_name, NAN);\n            else {\n                char *end = NULL;\n                double num = strtod(row[j], &end);\n                apop_data_set(out, i , j-passed_name, *end ? NAN : num);\n            }\n       }\n    }\n    check_and_clean(apop_data_free(out))\n}\n\nstatic void * process_result_set_vector (MYSQL *conn, MYSQL_RES *res_set) {\n    MYSQL_ROW row;\n    unsigned int num_fields = mysql_num_fields(res_set);\n    unsigned int num_rows = mysql_num_rows (res_set);\n    if (num_fields == 0 || num_rows == 0) return NULL;\n    gsl_vector *out = gsl_vector_alloc(num_rows);\n    for (int j=0; (row = mysql_fetch_row (res_set)); j++){\n        double valor = (!row[0] || !strcmp(row[0], \"NULL\"))\n                           ? 
GSL_NAN : atof(row[0]);\n        gsl_vector_set(out, j, valor);\n    }\n    check_and_clean(gsl_vector_free(out))\n}\n\nstatic void * process_result_set_chars (MYSQL *conn, MYSQL_RES *res_set) {\n    MYSQL_ROW row;\n    unsigned int total_cols = mysql_num_fields(res_set);\n    unsigned int total_rows = mysql_num_rows(res_set);\n\n    MYSQL_FIELD *fields = mysql_fetch_fields(res_set);\n    int name_row = get_name_row(&total_cols, fields);\n    apop_data *out = apop_text_alloc(NULL, total_rows, total_cols);\n\n    for (size_t i = 0; i < total_cols + (name_row>=0); i++)\n        if (i!=name_row) apop_name_add(out->names, fields[i].name, 't');\n\n    for (int i=0; (row = mysql_fetch_row (res_set)); i++){\n        int passed_name = 0;\n        //total_cols excludes the name column, so add it back to visit every field in the row.\n\t\tfor (size_t jj=0; jj<total_cols + (name_row>=0); jj++){\n            if (jj==name_row){\n                apop_name_add(out->names, row[jj], 'r');\n                passed_name = 1;\n                continue;\n            }\n            apop_text_set(out, i, jj-passed_name, \"%s\", (row[jj]==NULL)?  
apop_opts.nan_string : row[jj]);\n\t\t}\n    }\n    check_and_clean(;)\n}\n\nstatic void * apop_mysql_query_core(char *query, void *(*callback)(MYSQL*, MYSQL_RES*)){\n    Areweconected(NULL);\n    apop_data *output = NULL;\n    Apop_mstopif(mysql_query (mysql_db, query), return NULL, \"mysql_query() failed\");\n    MYSQL_RES *res_set = mysql_store_result (mysql_db);\n    Apop_mstopif(!res_set, \n        if (callback == process_result_set_data || callback==process_result_set_chars) apop_return_data_error('q') \n            else return NULL, \n            \"mysql_store_result() failed\");\n    if (!res_set->row_count) goto done; //just a blank table.\n    output = callback(mysql_db, res_set);\n\n    done:\n    mysql_free_result (res_set);\n    return output;\n}\n\nstatic double apop_mysql_query_to_float(char *query){\n    Areweconected(GSL_NAN);\n    Apop_mstopif(mysql_query (mysql_db, query) != 0, return GSL_NAN,\n          \"mysql_query() failed\");\n    MYSQL_RES *res_set = mysql_store_result (mysql_db);\n    Apop_mstopif(!res_set, return GSL_NAN, \"mysql_store_result() failed\");\n    if (mysql_num_rows(res_set)==0) {mysql_free_result (res_set); return GSL_NAN;} //empty result; free it before leaving.\n    MYSQL_ROW row = mysql_fetch_row (res_set);\n    Apop_mstopif(mysql_errno (mysql_db),\n        mysql_free_result (res_set); return GSL_NAN,\n        \"mysql_fetch_row() failed\");\n    double out = row[0] ? atof(row[0]) : GSL_NAN; //a SQL NULL arrives as a NULL pointer.\n    mysql_free_result (res_set);\n    return out;\n}\n\napop_data* apop_mysql_mixed_query(char const *intypes, char const *query){\n    Areweconected(NULL);\n    apop_data *out = NULL;\n    Apop_mstopif(mysql_query (mysql_db, query), return NULL, \"mysql_query() failed\");\n    MYSQL_RES *res_set = mysql_store_result(mysql_db);\n    MYSQL_ROW row;\n    Apop_mstopif(!res_set, return NULL, \"mysql_store_result() failed\");\n    if (!res_set->row_count) goto done; //just a blank table.\n\n    unsigned int total_cols = mysql_num_fields(res_set);\n    unsigned int total_rows = mysql_num_rows(res_set);\n    if (!total_cols || 
!total_rows) goto done;\n\n    apop_qt info = { };\n    count_types(&info, intypes); //in apop_db_sqlite.c\n    //intypes[5] === names, vectors, mcols, textcols, weights.\n\n    out = apop_data_alloc(info.intypes[1] ? total_rows : 0, \n                           info.intypes[2] ? total_rows : 0,  \n                           info.intypes[2]);\n\n    int requested = info.intypes[0]+info.intypes[1]+info.intypes[2]+info.intypes[3]+info.intypes[4];\n    int excess = requested - total_cols;\n    Apop_stopif(excess > 0, out->error='d' /*and continue.*/, 1, \n      \"you asked for %i columns in your list of types(%s), but your query produced %u columns. \"\n      \"The remainder will be placed in the text section. Output data set's ->error element set to 'd'.\" , requested, intypes, total_cols);\n    Apop_stopif(excess < 0, out->error='d' /*and continue.*/, 1, \n      \"you asked for %i columns in your list of types(%s), but your query produced %u columns. \"\n      \"Ignoring the last %i type(s) in your list. Output data set's ->error element set to 'd'.\" , requested, intypes, total_cols, -excess);\n\n    if (info.intypes[3]||excess>0) apop_text_alloc(out, total_rows, info.intypes[3] + ((excess > 0) ? excess : 0));\n    if (info.intypes[4]) out->weights = gsl_vector_alloc(total_rows);\n\n    MYSQL_FIELD *fields = mysql_fetch_fields(res_set);\n    for (size_t i=0; i<total_cols; i++){\n        char c = (i < requested) ? intypes[i] : 't';\n        if (c == 't'|| c=='T')\n            apop_name_add(out->names, fields[i].name, 't');\n        else if (c == 'v'|| c=='V')\n            apop_name_add(out->names, fields[i].name, 'v');\n        else if (c == 'm'|| c=='M')\n            apop_name_add(out->names, fields[i].name, 'c');\n    }\n\n    for (int i=0; (row = mysql_fetch_row (res_set)); i++) {\n        int thism=0, thist=0;\n\t\tfor (size_t j=0; j<total_cols; j++){\n            char c = (j < requested) ? 
intypes[j] : 't';\n            if (c == 'n' || c =='N')\n                apop_name_add(out->names, row[j], 'r');\n            else if (c == 't'|| c=='T')\n                apop_text_set(out, i, thist++, \"%s\", (row[j]==NULL)?  apop_opts.nan_string : row[j]);\n            else if (c == 'v'|| c=='V'){\n                double valor = (!row[j] || !strcmp(row[j], \"NULL\")) ? NAN : atof(row[j]);\n                gsl_vector_set(out->vector, i, valor);\n            } else if (c == 'w'|| c=='W'){\n                double valor = (!row[j] || !strcmp(row[j], \"NULL\")) ? NAN : atof(row[j]);\n                gsl_vector_set(out->weights, i, valor);\n            } else if (c == 'm'|| c=='M')\n                gsl_matrix_set(out->matrix, i , thism++, row[j] ? atof(row[j]): GSL_NAN);\n\t\t}\n    }\n\n    done:\n    mysql_free_result (res_set);\n    return out;\n}\n"
  },
  {
    "path": "apop_db_sqlite.c",
    "content": "/** \\file apop_db_sqlite.c\nThis file is included directly into \\ref apop_db.c.\n\nCopyright (c) 2006--2007 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  \n */\n#include <sqlite3.h>\n#include <string.h>\n\nsqlite3\t*db=NULL;\t                //There's only one SQLite database handle. Here it is.\n\n\n\n/** \\cond doxy_ignore */\ntypedef struct StdDevCtx StdDevCtx;\nstruct StdDevCtx {\n    double avg;     /* avg of terms */\n    double avg2;    /* avg of the squares of terms */\n    double avg3;    /* avg of the cube of terms */\n    double avg4;    /* avg of the fourth-power of terms */\n    int cnt;        /* Number of terms counted */\n};\n/** \\endcond */\n\nstatic void twoStep(sqlite3_context *context, int argc, sqlite3_value **argv){\n    if (argc<1) return;\n    StdDevCtx *p = sqlite3_aggregate_context(context, sizeof(*p));\n    if (p && argv[0]){\n        double x = sqlite3_value_double(argv[0]);\n        double ratio = p->cnt/(p->cnt+1.0);\n        p->cnt++;\n        p->avg\t*= ratio;\n        p->avg2\t*= ratio;\n        p->avg += x/(p->cnt +0.0);\n        p->avg2 += gsl_pow_2(x)/(p->cnt +0.0);\n    }\n}\n\nstatic void threeStep(sqlite3_context *context, int argc, sqlite3_value **argv){\n    if (argc<1) return;\n    StdDevCtx *p = sqlite3_aggregate_context(context, sizeof(*p));\n    if (p && argv[0]){\n        double x = sqlite3_value_double(argv[0]);\n        double ratio =  p->cnt/(p->cnt+1.0);\n        p->cnt++;\n        p->avg\t*= ratio;\n        p->avg2\t*= ratio;\n        p->avg3\t*= ratio;\n        p->avg += x/p->cnt;\n        p->avg2 += gsl_pow_2(x)/p->cnt;\n        p->avg3 += gsl_pow_3(x)/p->cnt;\n    }\n}\n\nstatic void fourStep(sqlite3_context *context, int argc, sqlite3_value **argv){\n    if( argc<1 ) return;\n    StdDevCtx *p = sqlite3_aggregate_context(context, sizeof(*p));\n    if (p && argv[0]){\n        double x = sqlite3_value_double(argv[0]);\n        p->cnt++;\n        p->avg = (x + p->avg * 
(p->cnt-1.))/p->cnt;\n        p->avg2 = (gsl_pow_2(x)+ p->avg2 * (p->cnt-1.))/p->cnt;\n        p->avg3 = (gsl_pow_3(x)+ p->avg3 * (p->cnt-1.))/p->cnt;\n        p->avg4 = (gsl_pow_4(x)+ p->avg4 * (p->cnt-1.))/p->cnt;\n    }\n}\n\nstatic void stdDevFinalizePop(sqlite3_context *context){\n    StdDevCtx *p = sqlite3_aggregate_context(context, sizeof(*p));\n    if (p && p->cnt>1)\n      sqlite3_result_double(context, sqrt((p->avg2 - gsl_pow_2(p->avg))));\n    else if (p->cnt == 1)\n      \tsqlite3_result_double(context, 0);\n}\n\nstatic void varFinalizePop(sqlite3_context *context){\n  StdDevCtx *p = sqlite3_aggregate_context(context, sizeof(*p));\n    if( p && p->cnt>1 )\n        sqlite3_result_double(context, (p->avg2 - gsl_pow_2(p->avg)));\n    else if (p->cnt == 1)\n    \tsqlite3_result_double(context, 0);\n}\n\nstatic void stdDevFinalize(sqlite3_context *context){\n    StdDevCtx *p = sqlite3_aggregate_context(context, sizeof(*p));\n    if( p && p->cnt>1 ){\n      double rCnt = p->cnt;\n      sqlite3_result_double(context,\n         sqrt((p->avg2 - gsl_pow_2(p->avg))*rCnt/(rCnt-1.0)));\n    } else if (p->cnt == 1)\n      \tsqlite3_result_double(context, 0);\n}\n\nstatic void varFinalize(sqlite3_context *context){\n    StdDevCtx *p = sqlite3_aggregate_context(context, sizeof(*p));\n    if( p && p->cnt>1 ){\n      double rCnt = p->cnt;\n      sqlite3_result_double(context,\n         (p->avg2 - gsl_pow_2(p->avg))*rCnt/(rCnt-1.0));\n    } else if (p->cnt == 1)\n      \tsqlite3_result_double(context, 0);\n}\n\nstatic void skewFinalize(sqlite3_context *context){\n    StdDevCtx *p = sqlite3_aggregate_context(context, sizeof(*p));\n    if( p && p->cnt>1 ){\n      double rCnt = p->cnt;\n      sqlite3_result_double(context,\n         (p->avg3*rCnt - 3*p->avg2*p->avg*rCnt \n                        + 2*rCnt * gsl_pow_3(p->avg)) * rCnt/((rCnt-1.0)*(rCnt-2.0)));\n    } else if (p->cnt == 1)\n      \tsqlite3_result_double(context, 0);\n}\n\nstatic void kurtFinalize(sqlite3_context 
*context){\n    StdDevCtx *p = sqlite3_aggregate_context(context, sizeof(*p));\n    if( p && p->cnt>1 ){\n      double n = p->cnt;\n      double kurtovern = p->avg4 - 4*p->avg3*p->avg\n                        + 6 * p->avg2*gsl_pow_2(p->avg)\n                        - 3* gsl_pow_4(p->avg);\n      double var = p->avg2 - gsl_pow_2(p->avg);\n      long double coeff0= n*n/(gsl_pow_3(n)*(gsl_pow_2(n)-3*n+3));\n      long double coeff1= n*gsl_pow_2(n-1)+ (6*n-9);\n      long double coeff2= n*(6*n-9);\n      sqlite3_result_double(context, coeff0*(coeff1 * kurtovern + coeff2 * gsl_pow_2(var)));\n    } else if (p->cnt == 1)\n      sqlite3_result_double(context, 0);\n}\n\nstatic void powFn(sqlite3_context *context, int argc, sqlite3_value **argv){\n    double base = sqlite3_value_double(argv[0]);\n    double exp  = sqlite3_value_double(argv[1]);\n    sqlite3_result_double(context, pow(base, exp));\n}\n\nstatic void rngFn(sqlite3_context *context, int argc, sqlite3_value **argv){\n    Staticdef(gsl_rng *, rng, apop_rng_alloc(apop_opts.rng_seed++));\n    //sqlite3_result_double(context, gsl_rng_uniform(rng));\n    sqlite3_result_double(context, gsl_rng_uniform(apop_rng_get_thread(-1)));\n}\n\n#define sqfn(name) static void name##Fn(sqlite3_context *context, int argc, sqlite3_value **argv){ \\\n    sqlite3_result_double(context, name(sqlite3_value_double(argv[0]))); }\n\nsqfn(sqrt) sqfn(exp) sqfn(log) sqfn(log10) sqfn(sin) \nsqfn(cos) sqfn(tan) sqfn(asin) sqfn(acos) sqfn(atan)\n\n\nstatic int apop_sqlite_db_open(char const *filename){\n    int status = sqlite3_open(filename ? filename : \":memory:\", &db);\n    Apop_stopif(status, db=NULL; return status,\n            0, \"The database %s didn't open.\", filename ? 
filename : \"in memory\");\n\tsqlite3_create_function(db, \"stddev\", 1, SQLITE_ANY, NULL, NULL, &twoStep, &stdDevFinalize);\n\tsqlite3_create_function(db, \"std\", 1, SQLITE_ANY, NULL, NULL, &twoStep, &stdDevFinalizePop);\n\tsqlite3_create_function(db, \"stddev_samp\", 1, SQLITE_ANY, NULL, NULL, &twoStep, &stdDevFinalize);\n\tsqlite3_create_function(db, \"stddev_pop\", 1, SQLITE_ANY, NULL, NULL, &twoStep, &stdDevFinalizePop);\n\tsqlite3_create_function(db, \"var\", 1, SQLITE_ANY, NULL, NULL, &twoStep, &varFinalize);\n\tsqlite3_create_function(db, \"var_samp\", 1, SQLITE_ANY, NULL, NULL, &twoStep, &varFinalize);\n\tsqlite3_create_function(db, \"var_pop\", 1, SQLITE_ANY, NULL, NULL, &twoStep, &varFinalizePop);\n\tsqlite3_create_function(db, \"variance\", 1, SQLITE_ANY, NULL, NULL, &twoStep, &varFinalizePop);\n\tsqlite3_create_function(db, \"skew\", 1, SQLITE_ANY, NULL, NULL, &threeStep, &skewFinalize);\n\tsqlite3_create_function(db, \"kurt\", 1, SQLITE_ANY, NULL, NULL, &fourStep, &kurtFinalize);\n\tsqlite3_create_function(db, \"kurtosis\", 1, SQLITE_ANY, NULL, NULL, &fourStep, &kurtFinalize);\n\tsqlite3_create_function(db, \"ln\", 1, SQLITE_ANY, NULL, &logFn, NULL, NULL);\n\tsqlite3_create_function(db, \"ran\", 0, SQLITE_ANY, NULL, &rngFn, NULL, NULL);\n\tsqlite3_create_function(db, \"pow\", 2, SQLITE_ANY, NULL, &powFn, NULL, NULL);\n\n#define sqlink(name) sqlite3_create_function(db, #name , 1, SQLITE_ANY, NULL, &name##Fn, NULL, NULL);\n    sqlink(sqrt) sqlink(exp) sqlink(sin) sqlink(cos)\n    sqlink(tan) sqlink(asin) sqlink(acos) sqlink(atan) sqlink(log) sqlink(log10)\n\tapop_query(\"pragma short_column_names\");\n    return 0;\n}\n\n/** \\cond doxy_ignore */\ntypedef struct {    //for the apop_query_to_... 
functions.\n    int       firstcall, namecol;\n    size_t    currentrow;\n    apop_data *outdata;\n} callback_t;\n/** \\endcond */\n\n//This is the callback for apop_query_to_text.\nstatic int db_to_chars(void *qinfo,int argc, char **argv, char **column){\n    callback_t *qi= qinfo;\n    apop_data* d  = qi->outdata; //alias. Allocated in calling fn.\n    int\taddnames = 0, ncshift=0;\n    if (!d->names->textct) addnames++;\n    if (qi->firstcall){\n        qi->firstcall = 0;\n        for(int i=0; i<argc; i++)\n            if (apop_opts.db_name_column && !strcasecmp(column[i], apop_opts.db_name_column)){\n                qi->namecol = i;\n                break;\n            }\n    }\n    int rows = d->textsize[0];\n    int cols = argc - (qi->namecol >= 0);\n    apop_text_alloc(d, rows+1, cols);//doesn't move d.\n    for (size_t jj=0; jj<argc; jj++)\n        if (jj == qi->namecol){\n            apop_name_add(d->names, argv[jj], 'r'); \n            ncshift ++;\n        } else {\n            apop_text_set(d, rows, jj-ncshift, (argv[jj]==NULL)? apop_opts.nan_string: argv[jj]);\n            //Asprintf(&(d->text[rows][jj-ncshift]), \"%s\", (argv[jj]==NULL)? 
\"NaN\": argv[jj]);\n            if(addnames)\n                apop_name_add(d->names, column[jj], 't'); \n        }\n    return 0;\n}\n\napop_data * apop_sqlite_query_to_text(char *query){\n    char *err = NULL;\n    callback_t qinfo = {.outdata=apop_data_alloc(), .namecol=-1, .firstcall=1};\n    if (db==NULL) apop_db_open(NULL);\n    sqlite3_exec(db, query, db_to_chars, &qinfo, &err); ERRCHECK_SET_ERROR(qinfo.outdata)\n    if (qinfo.outdata->textsize[0]==0){\n        apop_data_free(qinfo.outdata);\n        return NULL;\n    }\n    return qinfo.outdata;\n}\n\n/** \\cond doxy_ignore */\ntypedef struct {\n    apop_data  *d;\n    int        intypes[5];//names, vectors, mcols, textcols, weights.\n    int        current, thisrow, error_thrown;\n    const char *instring;\n} apop_qt;\n/** \\endcond */\n\nstatic void count_types(apop_qt *in, const char *intypes){\n    int i = 0;\n    char c;\n    in->instring = intypes;\n    while ((c=intypes[i++]))\n        if (c=='n'||c=='N')      in->intypes[0]++;\n        else if (c=='v'||c=='V') in->intypes[1]++;\n        else if (c=='m'||c=='M') in->intypes[2]++;\n        else if (c=='t'||c=='T') in->intypes[3]++;\n        else if (c=='w'||c=='W') in->intypes[4]++;\n    if (in->intypes[0]>1)\n        Apop_notify(1, \"You asked apop_query_to_mixed data for multiple row names. I'll ignore all but the last one.\");\n    if (in->intypes[1]>1)\n        Apop_notify(1, \"You asked apop_query_to_mixed for multiple vectors. I'll ignore all but the last one.\");\n    if (in->intypes[4]>1)\n        Apop_notify(1, \"You asked apop_query_to_mixed for multiple weighting vectors. I'll ignore all but the last one.\");\n}\n\nstatic int multiquery_callback(void *instruct, int argc, char **argv, char **column){\n    apop_qt *in = instruct;\n    char c;\n    int thistcol    = 0, \n        thismcol    = 0,\n        colct       = 0,\n        i, addnames = 0;\n    in->thisrow ++;\n    if (!in->d) {\n        in->d = in->intypes[2]\n                ? 
apop_data_alloc(in->intypes[1], 1, in->intypes[2])\n                : apop_data_alloc(in->intypes[1]);\n        if (in->intypes[4])\n            in->d->weights  = gsl_vector_alloc(1);\n        if (in->intypes[3]){\n            in->d->textsize[0]  = 1;\n            in->d->textsize[1]  = in->intypes[3];\n            in->d->text         = malloc(sizeof(char***));\n        }\n    }\n    if (!(in->d->names->colct + in->d->names->textct + (in->d->names->vector!=NULL)))\n        addnames++;\n    if (in->d->textsize[1]){\n        in->d->textsize[0]         = in->thisrow;\n        in->d->text                = realloc(in->d->text, sizeof(char ***)*in->thisrow);\n        in->d->text[in->thisrow-1] = malloc(sizeof(char**) * in->d->textsize[1]);\n    }\n    if (in->intypes[2])\n        apop_matrix_realloc(in->d->matrix, in->thisrow, in->intypes[2]);\n    for (i=in->current=0; i< argc; i++){\n        c   = in->instring[in->current++];\n        if (c=='n'||c=='N'){\n            apop_name_add(in->d->names, (argv[i]? argv[i] : \"NaN\")  , 'r'); \n            if(addnames)\n                apop_name_add(in->d->names, column[i], 'h'); \n        } else if (c=='v'||c=='V'){\n            apop_vector_realloc(in->d->vector, in->thisrow);\n            apop_data_set(in->d, in->thisrow-1, -1, \n                                    argv[i] ? atof(argv[i]) : GSL_NAN);\n            if(addnames)\n                apop_name_add(in->d->names, column[i], 'v'); \n        } else if (c=='m'||c=='M'){\n            apop_data_set(in->d, in->thisrow-1, thismcol++, \n                                    argv[i] ? atof(argv[i]) : GSL_NAN);\n            if(addnames)\n                apop_name_add(in->d->names, column[i], 'c'); \n        } else if (c=='t'||c=='T'){\n            Asprintf(&(in->d->text[in->thisrow-1][thistcol++]), \"%s\", \n\t\t\t                        argv[i] ? 
argv[i] : \"NaN\");\n            if(addnames)\n                apop_name_add(in->d->names, column[i], 't'); \n        } else if (c=='w'||c=='W'){\n            apop_vector_realloc(in->d->weights, in->thisrow);\n            gsl_vector_set(in->d->weights, in->thisrow-1, \n                                    argv[i] ? atof(argv[i]) : GSL_NAN);\n        }\n        colct++;\n    }\n    int requested = in->intypes[0]+in->intypes[1]+in->intypes[2]+in->intypes[3]+in->intypes[4];\n      Apop_stopif(colct != requested, in->error_thrown='d'; return 1, 1, \n      \"you asked for %i columns in your list of types(%s), but your query produced %i columns. \"\n      \"The remainder will be placed in the text section. Output data set's ->error element set to 'd'.\" , requested, in->instring, colct);\n    return 0;\n}\n\napop_data *apop_sqlite_multiquery(const char *intypes, char *query){\n    Apop_stopif(!intypes, apop_return_data_error('t'), 0, \"You gave me NULL for the list of input types. I can't work with that.\");\n    Apop_stopif(!query, apop_return_data_error('q'), 0, \"You gave me a NULL query. I can't work with that.\");\n    char *err = NULL;\n    apop_qt info = { };\n    count_types(&info, intypes);\n\tif (!db) apop_db_open(NULL);\n    sqlite3_exec(db, query, multiquery_callback, &info, &err); \n    Apop_stopif(info.error_thrown, if (!info.d) info.d = apop_data_alloc(); info.d->error='d'; return info.d,\n            0, \"dimension error\");\n    ERRCHECK_SET_ERROR(info.d)\n\treturn info.d;\n}\n"
  },
  {
    "path": "apop_fexact.c",
    "content": "/** \\file apop_fexact.c\n\n   Fisher's exact test for contingency tables \n\n   This file primarily consists of an algorithm from the ACM, fully\n   documented below. The C code below was cut and pasted from the\n   R project. Thanks, guys.\n\nUn-R-ifying modifications Copyright (c) 2006--2009 by Ben Klemens.\nLicensed under the GPLv2; see COPYING.\n\nR version credits:\nfexact.f -- translated by f2c (version 19971204).\\\\\nRun through a slightly modified version of MM's f2c-clean.\\\\\nHeavily hand-edited by KH and MM.\n */\n\n#include \"apop_internal.h\"\n#include <gsl/gsl_sf.h>\n#include <gsl/gsl_math.h>\n#include <stdbool.h>\n\n/* These are the R-specific items. */\ntypedef enum { FALSE = 0, TRUE /*, MAYBE */ } Rboolean;\nint imax2(int a, int b){return (a>b) ? a : b;}\nint imin2(int a, int b){return (a<b) ? a : b;}\nfloat fmax2(float a, float b){return (a>b) ? a : b;}\nfloat fmin2(float a, float b){return (a<b) ? a : b;}\n/* end R-to-Apophenia additions */\n\nstatic void f2xact(int nrow, int ncol, int *table, int ldtabl,\n\t\t   double *expect, double *percnt, double *emin,\n\t\t   double *prt, double *pre, double *fact, int *ico, int *iro,\n\t\t   int *kyy, int *idif, int *irn, int *key,\n\t\t   int *ldkey, int *ipoin, double *stp, int *ldstp,\n\t\t   int *ifrq, double *LP, double *SP, double *tm,\n\t\t   int *key2, int *iwk, double *rwk);\nstatic double f3xact(int nrow, int *irow, int ncol, int *icol, int ntot,\n\t\t     double *fact, int *ico, int *iro,\n\t\t     int *it, int *lb, int *nr, int *nt, int *nu,\n\t\t     int *itc, int *ist, double *stv, double *alen,\n\t\t     const double *tol);\nstatic double f4xact(int nrow, int *irow, int ncol, int *icol, double dspt,\n\t\t     double *fact, int *icstk, int *ncstk,\n\t\t     int *lstk, int *mstk, int *nstk, int *nrstk, int *irstk,\n\t\t     double *ystk, const double *tol);\nstatic void f5xact(double *pastp, const double *tol, int *kval, int *key,\n\t\t   int *ldkey, int *ipoin, double 
*stp, int *ldstp,\n\t\t   int *ifrq, int *npoin, int *nr, int *nl, int *ifreq,\n\t\t   int *itop, Rboolean psh);\nstatic Rboolean f6xact(int nrow, int *irow, int *kyy,\n\t\t       int *key, int *ldkey, int *last, int *ipn);\nstatic void f7xact(int nrow, int *imax, int *idif, int *k, int *ks,\n\t\t   int *iflag);\nstatic void f8xact(int *irow, int is, int i1, int izero, int *new);\nstatic double f9xact(int n, int ntot, int *ir, double *fact);\nstatic Rboolean f10act(int nrow, int *irow, int ncol, int *icol, double *val,\n\t\t       double *fact, int *nd, int *ne, int *m);\nstatic void f11act(int *irow, int i1, int i2, int *new);\nstatic int iwork(int iwkmax, int *iwkpt, int number, int itype);\n\nstatic void isort(int *n, int *ix);\nstatic double gammds(double *y, double *p, int *ifault);\n\nthreadlocal bool has_error;\nvoid prterr(int icode, const char *mes) {\n    has_error++;\n    Apop_notify(1, \"FEXACT error %d.\\n%s\", icode, mes);\n}\n\n/* The interface to the original code, which apop_test_fisher_exact calls: */\nstatic void fexact(int *nrow, int *ncol, int *table, int *ldtabl,\n       double *expect, double *percnt, double *emin, double *prt,\n       double *pre, /* new in C : */ int *workspace,\n       /* new arg, was const = 30*/int *mult) {\n\n/*\n  ALGORITHM 643, COLLECTED ALGORITHMS FROM ACM.\n  THIS WORK PUBLISHED IN TRANSACTIONS ON MATHEMATICAL SOFTWARE,\n  VOL. 19, NO. 4, DECEMBER, 1993, PP. 
484-488.\n  -----------------------------------------------------------------------\n  Name:\t      FEXACT\n  Purpose:    Computes Fisher's exact test probabilities and a hybrid\n\t      approximation to Fisher exact test probabilities for a\n\t      contingency table using the network algorithm.\n\n  Arguments:\n    NROW    - The number of rows in the table.\t\t\t(Input)\n    NCOL    - The number of columns in the table.\t\t(Input)\n    TABLE   - NROW by NCOL matrix containing the contingency\n              table.\t\t\t\t\t\t(Input)\n    LDTABL  - Leading dimension of TABLE exactly as specified\n              in the dimension statement in the calling\n\t      program.\t\t\t\t\t\t(Input)\n    EXPECT  - Expected value used in the hybrid algorithm for\n\t      deciding when to use asymptotic theory\n\t      probabilities.\t\t\t\t\t(Input)\n\t      If EXPECT <= 0.0 then asymptotic theory probabilities\n\t      are not used and Fisher exact test probabilities are\n\t      computed.\t Otherwise, if PERCNT or more of the cells in\n\t      the remaining table have estimated expected values of\n\t      EXPECT or more, with no remaining cell having expected\n\t      value less than EMIN, then asymptotic chi-squared\n\t      probabilities are used.  
See the algorithm section of the\n\t      manual document for details.\n\t      Use EXPECT = 5.0 to obtain the 'Cochran' condition.\n    PERCNT  - Percentage of remaining cells that must have\n              estimated expected values greater than EXPECT\n\t      before asymptotic probabilities can be used.\t(Input)\n\t      See argument EXPECT for details.\n\t      Use PERCNT = 80.0 to obtain the 'Cochran' condition.\n    EMIN    - Minimum cell estimated expected value allowed for\n\t      asymptotic chi-squared probabilities to be used.\t(Input)\n\t      See argument EXPECT for details.\n\t      Use EMIN = 1.0 to obtain the 'Cochran' condition.\n    PRT     - Probability of the observed table for fixed\n              marginal totals.\t\t\t\t\t(Output)\n    PRE     - Table p-value.\t\t\t\t\t(Output)\n\t      PRE is the probability of a more extreme table,\n\t      where `extreme' is in a probabilistic sense.\n\t      If EXPECT < 0 then the Fisher exact probability\n\t      is returned.  Otherwise, an approximation to the\n\t      Fisher exact probability is computed based upon\n\t      asymptotic chi-squared probabilities for ``large''\n\t      table expected values.  The user defines ``large''\n\t      through the arguments EXPECT, PERCNT, and EMIN.\n\n  Remarks:\n  1. For many problems one megabyte or more of workspace can be\n     required.\tIf the environment supports it, the user should begin\n     by increasing the workspace used to 200,000 units.\n  2. In FEXACT, LDSTP = MULT*LDKEY.  The proportion of table space used\n     by STP may be changed by changing the line MULT = 30 below to\n     another value. --> MULT is now an __argument__ of the function\n  3. 
FEXACT may be converted to single precision by setting IREAL = 3,\n     and converting all DOUBLE PRECISION specifications (except the\n     specifications for RWRK, IWRK, and DWRK) to REAL.\tThis will\n     require changing the names and specifications of the intrinsic\n     functions ALOG, AMAX1, AMIN1, EXP, and REAL.  In addition, the\n     machine specific constants will need to be changed, and the name\n     DWRK will need to be changed to RWRK in the call to F2XACT.\n  4. Machine specific constants are specified and documented in F2XACT.\n     A missing value code is specified in both FEXACT and F2XACT.\n  5. Although not a restriction, it is not generally practical to call\n     this routine with large tables which are not sparse and in\n     which the 'hybrid' algorithm has little effect.  For example,\n     although it is feasible to compute exact probabilities for the\n     table\n\t    1 8 5 4 4 2 2\n\t    5 3 3 4 3 1 0\n\t   10 1 4 0 0 0 0,\n     computing exact probabilities for a similar table which has been\n     enlarged by the addition of an extra row (or column) may not be\n     feasible.\n  -----------------------------------------------------------------------\n  */\n\n    /* To increase the length of the table of past path lengths relative\n       to the length of the hash table, increase MULT.\n    */\n\n    /* AMISS is a missing value indicator which is returned when the\n       probability is not defined.\n    */\n    const double amiss = GSL_NAN;\n    /*\n      Set IREAL = 4 for DOUBLE PRECISION\n      Set IREAL = 3 for SINGLE PRECISION\n    */\n#define i_real 4\n#define i_int  2\n\n    /* System generated locals */\n    int ikh;\n    /* Local variables */\n    int nco, nro, ntot, numb, iiwk, irwk;\n    int i, j, k, kk, ldkey, ldstp, i1, i2, i3, i4, i5, i6, i7, i8, i9, i10;\n    int i3a, i3b, i3c, i9a, iwkmax, iwkpt;\n\n    /* Workspace Allocation  */\n    double *equiv;\n    iwkmax = 2 * (int) (*workspace / 2);\n    equiv = (double *) 
calloc(iwkmax / 2, sizeof(double));\n\n#define dwrk (equiv)\n#define iwrk ((int *)equiv)\n#define rwrk ((float *)equiv)\n\n    iwkpt = 0;\n\n    Apop_stopif(*nrow > *ldtabl, free(equiv); has_error=true; return, 0, \"NROW must be less than or equal to LDTABL.\");\n\n    ntot = 0;\n    for (i = 0; i < *nrow; ++i) {\n        for (j = 0; j < *ncol; ++j) {\n            if (table[i + j * *ldtabl] < 0)\n                prterr(2, \"All elements of TABLE must be non-negative.\");\n            ntot += table[i + j * *ldtabl];\n        }\n    }\n    if (ntot == 0) {\n        prterr(3, \"All elements of TABLE are zero.\\n\"\n               \"PRT and PRE are set to missing values.\");\n        *pre = *prt = amiss;\n        free(equiv);\n        return;\n    }\n\n    /* nco := max(*nrow, *ncol)\n     * nro := min(*nrow, *ncol) */\n    if(*ncol > *nrow) {\n        nco = *ncol;\n        nro = *nrow;\n    } else {\n        nco = *nrow;\n        nro = *ncol;\n    }\n    k = *nrow + *ncol + 1;\n    kk = k * nco;\n\n    ikh = ntot + 1;\n    i1  = iwork(iwkmax, &iwkpt, ikh, i_real);\n    i2  = iwork(iwkmax, &iwkpt, nco, i_int);\n    i3  = iwork(iwkmax, &iwkpt, nco, i_int);\n    i3a = iwork(iwkmax, &iwkpt, nco, i_int);\n    i3b = iwork(iwkmax, &iwkpt, nro, i_int);\n    i3c = iwork(iwkmax, &iwkpt, nro, i_int);\n    ikh = imax2(k * 5 + (kk << 1), nco * 7 + 800);\n    iiwk= iwork(iwkmax, &iwkpt, ikh, i_int);\n    ikh = imax2(nco + 401, k);\n    irwk= iwork(iwkmax, &iwkpt, ikh, i_real);\n\n    /* NOTE:\n       What follows below splits the remaining amount iwkmax - iwkpt of\n       (int) workspace into hash tables as follows.\n           type  size       index\n\t   INT   2 * ldkey  i4 i5 i11\n\t   REAL  2 * ldkey  i8 i9 i10\n\t   REAL  2 * ldstp  i6\n\t   INT   6 * ldstp  i7\n       Hence, we need ldkey times\n           3 * 2 + 3 * 2 * s + 2 * mult * s + 6 * mult\n       chunks of integer memory, where s = sizeof(REAL) / sizeof(INT).\n       If doubles are used and are twice as long as ints, this gives\n    
       18 + 10 * mult\n       so that the value of ldkey can be obtained by dividing available\n       (int) workspace by this number.\n\n       In fact, because iwork() can actually use s * n + s - 1 int chunks\n       when allocating a REAL, we use ldkey = available / numb - 1.\n\n       FIXME:\n       Can we always assume that sizeof(double) / sizeof(int) is 2?\n       */\n\n    if (i_real == 4) \t\t/* Double precision reals */\n        numb = 18 + 10 * *mult;\n    else\t\t\t/* Single precision reals */\n        numb = (*mult << 3) + 12;\n    ldkey = (iwkmax - iwkpt) / numb - 1;\n    ldstp = *mult * ldkey;\n    ikh = ldkey << 1;\ti4  = iwork(iwkmax, &iwkpt, ikh, i_int);\n    ikh = ldkey << 1;\ti5  = iwork(iwkmax, &iwkpt, ikh, i_int);\n    ikh = ldstp << 1;\ti6  = iwork(iwkmax, &iwkpt, ikh, i_real);\n    ikh = ldstp * 6;\ti7  = iwork(iwkmax, &iwkpt, ikh, i_int);\n    ikh = ldkey << 1;\ti8  = iwork(iwkmax, &iwkpt, ikh, i_real);\n    ikh = ldkey << 1;\ti9  = iwork(iwkmax, &iwkpt, ikh, i_real);\n    ikh = ldkey << 1;\ti9a = iwork(iwkmax, &iwkpt, ikh, i_real);\n    ikh = ldkey << 1;\ti10 = iwork(iwkmax, &iwkpt, ikh, i_int);\n\n    /* To convert to double precision, change RWRK to DWRK in the next CALL.\n     */\n    f2xact(*nrow,\n\t   *ncol,\n\t   table,\n\t   *ldtabl,\n\t   expect,\n\t   percnt,\n\t   emin,\n\t   prt,\n\t   pre,\n\t   dwrk + i1,\n\t   iwrk + i2,\n\t   iwrk + i3,\n\t   iwrk + i3a,\n\t   iwrk + i3b,\n\t   iwrk + i3c,\n\t   iwrk + i4,\n\t   &ldkey,\n\t   iwrk + i5,\n\t   dwrk + i6,\n\t   &ldstp,\n\t   iwrk + i7,\n\t   dwrk + i8,\n\t   dwrk + i9,\n\t   dwrk + i9a,\n\t   iwrk + i10,\n\t   iwrk + iiwk,\n\t   dwrk + irwk);\n\n    free(equiv);\n}\n\n#undef rwrk\n#undef iwrk\n#undef dwrk\n\nstatic void f2xact(int nrow, int ncol, int *table, int ldtabl,\n       double *expect, double *percnt, double *emin, double *prt,\n       double *pre, double *fact, int *ico, int *iro, int *kyy,\n       int *idif, int *irn, int *key, int *ldkey, int *ipoin,\n       double 
*stp, int *ldstp, int *ifrq, double *LP, double *SP,\n       double *tm, int *key2, int *iwk, double *rwk) {\n/* -----------------------------------------------------------------------\n  Name:\t\tF2XACT\n  Purpose:\tComputes Fisher's exact test for a contingency table,\n\t\troutine with workspace variables specified.\n  -----------------------------------------------------------------------\n  */\n    const int imax = INT_MAX;/* the largest representable int on the machine.*/\n\n    /* AMISS is a missing value indicator which is returned when the\n       probability is not defined. */\n    const double amiss = GSL_NAN;\n\n    /* TOL is chosen as the square root of the smallest relative spacing. */\n    const static double tol = 3.45254e-7;\n\n    const char* ch_err_5 =\n\t\"The hash table key cannot be computed because the largest key\\n\"\n\t\"is larger than the largest representable int.\\n\"\n\t\"The algorithm cannot proceed.\\n\"\n\t\"Reduce the workspace size or use another algorithm.\";\n\n    /* Local variables -- changed from \"static\"\n     *  (*does* change results very slightly on i386 linux) */\n    int i, ii, j, k, n,\n\tiflag,ifreq, ikkey, ikstp, ikstp2, ipn, ipo, itop, itp = 0,\n\tjkey, jstp, jstp2, jstp3, jstp4, k1, kb, kd, ks, kval = 0, kmax, last,\n\tncell, ntot, nco, nro, nro2, nrb,\n\ti31, i32, i33, i34, i35, i36, i37, i38, i39,\n\ti41, i42, i43, i44, i45, i46, i47, i48, i310, i311;\n\n    double dspt, d1,dd,df,ddf, drn,dro, obs, obs2, obs3, pastp,pv, tmp=0.;\n\n#ifndef USING_R\n    double d2;\n    int ifault;\n#endif\n    Rboolean nr_gt_nc, maybe_chisq, chisq = FALSE/* -Wall */, psh;\n\n    /* Parameter adjustments */\n    table -= ldtabl + 1;\n    --ico;\n    --iro;\n    --kyy;\n    --idif;\n    --irn;\n\n    --key;\n    --ipoin;\n    --stp;\n    --ifrq;\n    --LP;\n    --SP;\n    --tm;\n    --key2;\n    --iwk;\n    --rwk;\n\n    /* Check table dimensions */\n    Apop_stopif(nrow > ldtabl, has_error=true; return, 0, \"NROW must be less than 
or equal to LDTABL.\");\n    Apop_stopif(ncol <= 1, has_error=true; return, 0, \"NCOL must be at least 2\");\n\n    /* Initialize KEY array */\n    for (i = 1; i <= *ldkey << 1; ++i) {\n        key[i] = -9999;\n        key2[i] = -9999;\n    }\n\n    nr_gt_nc =  nrow > ncol;\n    /* nco := max(nrow, ncol) : */\n    if (nr_gt_nc) nco = nrow;\n    else          nco = ncol;\n    /* Compute row marginals and total */\n    ntot = 0;\n    for (i = 1; i <= nrow; ++i) {\n        iro[i] = 0;\n        for (j = 1; j <= ncol; ++j) {\n            Apop_stopif(table[i + j * ldtabl] < 0., has_error=true; return,\n                    0, \"All elements of TABLE must be non-negative.\");\n            iro[i] += table[i + j * ldtabl];\n        }\n        ntot += iro[i];\n    }\n\n    if (ntot == 0) {\n        prterr(3, \"All elements of TABLE are zero.\\n\"\n               \"PRT and PRE are set to missing values.\");\n        *pre = *prt = amiss;\n        return;\n    }\n\n    /* Column marginals */\n    for (i = 1; i <= ncol; ++i) {\n        ico[i] = 0;\n        for (j = 1; j <= nrow; ++j)\n            ico[i] += table[j + i * ldtabl];\n    }\n\n    /* sort marginals */\n    isort(&nrow, &iro[1]);\n    isort(&ncol, &ico[1]);\n\n    /*\tDetermine row and column marginals.\n\tDefine max(nrow,ncol) =: nco >= nro := min(nrow,ncol)\n\tnco is defined above\n\n\tSwap marginals if necessary to\tico[1:nco] & iro[1:nro]\n     */\n    if (nr_gt_nc) {\n        nro = ncol;\n        /* Swap marginals */\n        for (i = 1; i <= nco; ++i) {\n            ii = iro[i];\n            if (i <= nro) iro[i] = ico[i];\n            ico[i] = ii;\n        }\n    } else\n        nro = nrow;\n\n    /* Get multipliers for stack */\n    kyy[1] = 1;\n    for (i = 1; i < nro; ++i) {\n        /* Hash table multipliers */\n        if (iro[i] + 1 <= imax / kyy[i]) {\n            kyy[i + 1] = kyy[i] * (iro[i] + 1);\n            j /= kyy[i];\n        }\n        else {\n            prterr(5, ch_err_5);\n            return;\n 
       }\n    }\n\n    /* Check for Maximum product : */\n    /* original code: if (iro[nro - 1] + 1 > imax / kyy[nro - 1]) */\n    if (iro[nro] + 1 > imax / kyy[nro]) {\n        prterr(501, ch_err_5);\n        return;\n    }\n\n    /* Compute log factorials */\n    fact[0] = 0.;\n    fact[1] = 0.;\n    if(ntot >= 2) fact[2] = log(2.);\n    /* MM: old code assuming log() to be SLOW */\n    for (i = 3; i <= ntot; i += 2) {\n        fact[i] = fact[i - 1] + log((double) i);\n        j = i + 1;\n        if (j <= ntot)\n            fact[j] = fact[i] + fact[2] + fact[j / 2] - fact[j / 2 - 1];\n    }\n    /* Compute obs := observed path length */\n    obs = tol;\n    ntot = 0;\n    for (j = 1; j <= nco; ++j) {\n        dd = 0.;\n        if (nr_gt_nc) {\n            for (i = 1; i <= nro; ++i) {\n            dd += fact[table[j + i * ldtabl]];\n            ntot +=    table[j + i * ldtabl];\n            }\n        } else {\n            for (i = 1, ii = j * ldtabl + 1; i <= nro; i++, ii++) {\n            dd += fact[table[ii]];\n            ntot +=    table[ii];\n            }\n        }\n        obs += fact[ico[j]] - dd;\n    }\n\n    /* Denominator of observed table: DRO */\n    dro = f9xact(nro, ntot, &iro[1], fact);\n    /* improve: the following \"easily\" underflows to zero -- return \"log()\" */\n    *prt = exp(obs - dro);\n    *pre = 0.;\n    itop = 0;\n    maybe_chisq = (*expect > 0.);\n\n    /* Initialize pointers for workspace */\n    /* f3xact */\n    i31 = 1;\n    i32 = i31 + nco;\n    i33 = i32 + nco;\n    i34 = i33 + nco;\n    i35 = i34 + nco;\n    i36 = i35 + nco;\n    i37 = i36 + nco;\n    i38 = i37 + nco;\n    i39 = i38 + 400;\n    i310 = 1;\n    i311 = 1 + 400;\n    /* f4xact */\n    i = nrow + ncol + 1;\n    i41 = 1;\n    i42 = i41 + i;\n    i43 = i42 + i;\n    i44 = i43 + i;\n    i45 = i44 + i;\n    i46 = i45 + i;\n    i47 = i46 + i * nco;\n    i48 = 1;\n\n    /* Initialize pointers */\n    k = nco;\n    last = *ldkey + 1;\n    jkey = *ldkey + 1;\n    jstp 
= *ldstp + 1;\n    jstp2 = *ldstp * 3 + 1;\n    jstp3 = (*ldstp << 2) + 1;\n    jstp4 = *ldstp * 5 + 1;\n    ikkey = 0;\n    ikstp = 0;\n    ikstp2 = *ldstp << 1;\n    ipo = 1;\n    ipoin[1] = 1;\n    stp[1] = 0.;\n    ifrq[1] = 1;\n    ifrq[ikstp2 + 1] = -1;\n\nOuter_Loop:\n    kb = nco - k + 1;\n    ks = 0;\n    n = ico[kb];\n    kd = nro + 1;\n    kmax = nro;\n    /* IDIF is the difference in going to the daughter */\n    for (i = 1; i <= nro; ++i)\n        idif[i] = 0;\n\n    /* Generate the first daughter */\n    do {\n        --kd;\n        ntot = imin2(n, iro[kd]);\n        idif[kd] = ntot;\n        if (idif[kmax] == 0)\n            --kmax;\n        n -= ntot;\n    } while (n > 0 && kd != 1);\n\n    if (n != 0) /* i.e. kd == 1 */\n\tgoto L310;\n\n    k1 = k - 1;\n    n = ico[kb];\n    ntot = 0;\n    for (i = kb + 1; i <= nco; ++i)\n        ntot += ico[i];\n\nL150:\n    /* Arc to daughter length=ICO[KB] */\n    for (i = 1; i <= nro; ++i)\n        irn[i] = iro[i] - idif[i];\n\n    if (k1 > 1) {\n        /* Sort irn */\n        if (nro == 2) {\n            if (irn[1] > irn[2]) {\n            ii = irn[1]; irn[1] = irn[2]; irn[2] = ii;\n            }\n        } else\n            isort(&nro, &irn[1]);\n\n        /* Adjust start for zero */\n        for (i = 1; i <= nro; ++i) {\n            if (irn[i] != 0)\n            break;\n        }\n        nrb = i;\n    } else \n        nrb = 1;\n    nro2 = nro - nrb + 1;\n\n    /* Some table values */\n    ddf = f9xact(nro,  n,    &idif[1],  fact);\n    drn = f9xact(nro2, ntot, &irn[nrb], fact) - dro + ddf;\n    /* Get hash value */\n    if (k1 > 1) {\n        kval = irn[1];\n        /* Note that with the corrected check at error \"501\",\n         * we won't have overflow in  kval  below : */\n        for (i = 2; i <= nro; ++i)\n            kval += irn[i] * kyy[i];\n\n        /* Get hash table entry */\n        i = kval % (*ldkey << 1) + 1;\n        /* Search for unused location */\n        for (itp = i; itp <= *ldkey << 
1; ++itp) {\n            ii = key2[itp];\n            if (ii == kval) {\n                goto L240;\n            } else if (ii < 0) {\n                key2[itp] = kval;\n                LP[itp] = 1.;\n                SP[itp] = 1.;\n                goto L240;\n            }\n        }\n\n        for (itp = 1; itp <= i - 1; ++itp) {\n            ii = key2[itp];\n            if (ii == kval) \n                goto L240;\n            else if (ii < 0) {\n                key2[itp] = kval;\n                LP[itp] = 1.;\n                goto L240;\n            }\n        }\n\n        /* KH\n           prterr(6, \"LDKEY is too small.\\n\"\n           \"It is not possible to give the value of LDKEY required,\\n\"\n           \"but you could try doubling LDKEY (and possibly LDSTP).\");\n           */\n        prterr(6, \"LDKEY is too small for this problem.\\n\"\n               \"Try increasing the size of the workspace.\");\n    }\n\nL240:\n    psh = TRUE;\n    /* Recover pastp */\n    ipn = ipoin[ipo + ikkey];\n    pastp = stp[ipn + ikstp];\n    ifreq = ifrq[ipn + ikstp];\n    /* Compute shortest and longest path */\n    if (k1 > 1) {\n        obs2 = obs - fact[ico[kb + 1]] - fact[ico[kb + 2]] - ddf;\n        for (i = 3; i <= k1; ++i)\n            obs2 -= fact[ico[kb + i]];\n\n        if (LP[itp] > 0.) {\n            dspt = obs - obs2 - ddf;\n            /* Compute longest path */\n            LP[itp] = f3xact(nro2, &irn[nrb], k1, &ico[kb + 1], ntot, fact,\n                      &iwk[i31], &iwk[i32], &iwk[i33], &iwk[i34],\n                      &iwk[i35], &iwk[i36], &iwk[i37], &iwk[i38],\n                      &iwk[i39], &rwk[i310], &rwk[i311], &tol);\n            if(LP[itp] > 0.) {/* can this happen? 
*/\n                printf(\"___ LP[itp=%d] = %g > 0\\n\", itp, LP[itp]);\n                LP[itp] = 0.;\n            }\n\n            /* Compute shortest path -- using  dspt  as offset */\n            SP[itp] = f4xact(nro2, &irn[nrb], k1, &ico[kb + 1], dspt, fact,\n                      &iwk[i47], &iwk[i41], &iwk[i42], &iwk[i43],\n                      &iwk[i44], &iwk[i45], &iwk[i46], &rwk[i48], &tol);\n            /* SP[itp] = fmin2(0., SP[itp] - dspt);*/\n            if(SP[itp] > 0.) { /* can this happen? */\n                printf(\"___ SP[itp=%d] = %g > 0\\n\", itp, SP[itp]);\n                SP[itp] = 0.;\n            }\n\n            /* Use chi-squared approximation? */\n            if (maybe_chisq && (irn[nrb] * ico[kb + 1]) > ntot * *emin) {\n                ncell = 0.;\n                for (i = 0; i < nro2; ++i)\n                    for (j = 1; j <= k1; ++j)\n                        if (irn[nrb + i] * ico[kb + j] >= ntot * *expect)\n                            ncell++;\n\n                if (ncell * 100 >= k1 * nro2 * *percnt) {\n                    tmp = 0.;\n                    for (i = 0; i < nro2; ++i)\n                        tmp += (fact[irn[nrb + i]] -\n                            fact[irn[nrb + i] - 1]);\n                    tmp *= k1 - 1;\n                    for (j = 1; j <= k1; ++j)\n                    tmp += (nro2 - 1) * (fact[ico[kb + j]] -\n                                 fact[ico[kb + j] - 1]);\n                    df = (double) ((nro2 - 1) * (k1 - 1));\n                    tmp += df * 1.83787706640934548356065947281;\n                    tmp -= (nro2 * k1 - 1) * (fact[ntot] - fact[ntot - 1]);\n                    tm[itp] = (obs - dro) * -2. - tmp;\n                } else {\n                    /* tm[itp] set to a flag value */\n                    tm[itp] = -9876.;\n                }\n            } else\n                tm[itp] = -9876.;\n        }\n        obs3 = obs2 - LP[itp];\n        obs2 -= SP[itp];\n        if (tm[itp] == -9876.) 
\n            chisq = FALSE;\n        else {\n            chisq = TRUE;\n            tmp = tm[itp];\n        }\n    } else {\n        obs2 = obs - drn - dro;\n        obs3 = obs2;\n    }\n\nL300:\n    /* Process node with new PASTP */\n    if (pastp <= obs3)  /* Update pre */\n        *pre += (double) ifreq * exp(pastp + drn);\n    else if (pastp < obs2) {\n        if (chisq) {\n            df = (double) ((nro2 - 1) * (k1 - 1));\n            d1 = fmax2(0., tmp + (pastp + drn) * 2.) / 2.;\n            d2 = df / 2.;\n            pv = 1. - gammds(&d1, &d2, &ifault);\n            *pre += (double) ifreq * exp(pastp + drn) * pv;\n        } else {\n            /* Put daughter on queue */\n            d1 = pastp + ddf;\n            f5xact(&d1, &tol, &kval, &key[jkey], ldkey, &ipoin[jkey],\n               &stp[jstp], ldstp, &ifrq[jstp], &ifrq[jstp2],\n               &ifrq[jstp3], &ifrq[jstp4], &ifreq, &itop, psh);\n            psh = FALSE;\n        }\n    }\n    /* Get next PASTP on chain */\n    ipn = ifrq[ipn + ikstp2];\n    if (ipn > 0) {\n        pastp = stp[ipn + ikstp];\n        ifreq = ifrq[ipn + ikstp];\n        goto L300;\n    }\n    /* Generate a new daughter node */\n    f7xact(kmax, &iro[1], &idif[1], &kd, &ks, &iflag);\n    if (iflag != 1)\n\tgoto L150;\n\n\nL310:\n    /* Go get a new mother from stage K */\n    do {\n        if(!f6xact(nro, &iro[1], &kyy[1], &key[ikkey + 1], ldkey, &last, &ipo))\n            /* Update pointers */\n            goto Outer_Loop;\n\n        /* else : no additional nodes to process */\n        --k;\n        itop = 0;\n        ikkey = jkey - 1;\n        ikstp = jstp - 1;\n        ikstp2 = jstp2 - 1;\n        jkey = *ldkey - jkey + 2;\n        jstp = *ldstp - jstp + 2;\n        jstp2 = (*ldstp << 1) + jstp;\n        for (i = 1; i <= *ldkey << 1; ++i)\n            key2[i] = -9999;\n    } while (k >= 2);\n}/* f2xact() */\n\nstatic double f3xact(int nrow, int *irow, int ncol, int *icol,\n       int ntot, double *fact, int *ico, int 
*iro, int *it,\n       int *lb, int *nr, int *nt, int *nu, int *itc, int *ist,\n       double *stv, double *alen, const double *tol) {\n/* -----------------------------------------------------------------------\n  Name:\t      F3XACT\n  Purpose:    Computes the longest path length for a given table.\n\n  Arguments:\n    NROW    - The number of rows in the table.\t\t\t(Input)\n    IROW    - Vector of length NROW containing the row sums\n              for the table.\t\t\t\t\t(Input)\n    NCOL    - The number of columns in the table.\t\t(Input)\n    ICOL    - Vector of length K containing the column sums\n              for the table.\t\t\t\t\t(Input)\n    NTOT    - The total count in the table.\t\t\t(Input)\n    FACT    - Vector containing the logarithms of factorials.\t(Input)\n    ICO     - Work vector of length MAX(NROW,NCOL).\n    IRO     - Work vector of length MAX(NROW,NCOL).\n    IT\t    - Work vector of length MAX(NROW,NCOL).\n    LB\t    - Work vector of length MAX(NROW,NCOL).\n    NR\t    - Work vector of length MAX(NROW,NCOL).\n    NT\t    - Work vector of length MAX(NROW,NCOL).\n    NU\t    - Work vector of length MAX(NROW,NCOL).\n    ITC     - Work vector of length 400.\n    IST     - Work vector of length 400.\n    STV     - Work vector of length 400.\n    ALEN    - Work vector of length MAX(NROW,NCOL).\n    TOL     - Tolerance.\t\t\t\t\t(Input)\n\n  Return Value :\n    LP     - The longest path for the table.\t\t\t(Output)\n  -----------------------------------------------------------------------\n  */\n\n    const int ldst = 200;/* half stack size */\n    /* Initialized data */\n    static int nst = 0;\n    static int nitc = 0;\n\n    int i, k;\n    int n11, n12, ii, nn, ks, ic1, ic2, nc1, nn1;\n    int nr1, nco, nct, ipn, irl, key, lev, itp, nro, nrt, kyy, nc1s;\n    double LP, v, val, vmn;\n    Rboolean xmin;\n\n    --stv;\n    --ist;\n    --itc;\n    --nu;\n    --nt;\n    --nr;\n    --lb;\n    --it;\n    --iro;\n    --ico;\n    --icol;\n    
--irow;\n\n    if (nrow <= 1) {\t/* nrow is 1 */\n        LP = 0.;\n        if (nrow > 0) \n            for (i = 1; i <= ncol; ++i)\n                LP -= fact[icol[i]];\n        return LP;\n    }\n\n    if (ncol <= 1) {\t/* ncol is 1 */\n        LP = 0.;\n        if (ncol > 0) {\n            for (i = 1; i <= nrow; ++i)\n            LP -= fact[irow[i]];\n        }\n        return LP;\n    }\n\n    /* 2 by 2 table */\n    if (nrow * ncol == 4) {\n        n11 = (irow[1] + 1) * (icol[1] + 1) / (ntot + 2);\n        n12 = irow[1] - n11;\n        return -(fact[n11] + fact[n12] +\n             fact[icol[1] - n11] + fact[icol[2] - n12]);\n    }\n\n    /* ELSE:  larger than 2 x 2 : */\n\n    /* Test for optimal table */\n    val = 0.;\n    if (irow[nrow] <= irow[1] + ncol) \n        xmin = f10act(nrow, &irow[1], ncol, &icol[1], &val, fact,\n\t\t      &lb[1], &nu[1], &nr[1]);\n    else xmin = FALSE;\n    if (! xmin &&  icol[ncol] <= icol[1] + nrow) \n        xmin = f10act(ncol, &icol[1], nrow, &irow[1], &val, fact,\n\t\t      &lb[1], &nu[1], &nr[1]);\n    if (xmin)\n        return  - val;\n\n    /* Setup for dynamic programming */\n\n    for (i = 0; i <= ncol; ++i)\n        alen[i] = 0.;\n    for (i = 1; i <= 2*ldst; ++i)\n        ist[i] = -1;\n\n    nn = ntot;\n    /* Minimize ncol */\n    if (nrow >= ncol) {\n        nro = nrow;\n        nco = ncol;\n        ico[1] = icol[1];\n        nt[1] = nn - ico[1];\n        for (i = 2; i <= ncol; ++i) {\n            ico[i] = icol[i];\n            nt[i] = nt[i - 1] - ico[i];\n        }\n        for (i = 1; i <= nrow; ++i)\n            iro[i] = irow[i];\n    } else {\n        nro = ncol;\n        nco = nrow;\n        ico[1] = irow[1];\n        nt[1] = nn - ico[1];\n        for (i = 2; i <= nrow; ++i) {\n            ico[i] = irow[i];\n            nt[i] = nt[i - 1] - ico[i];\n        }\n        for (i = 1; i <= ncol; ++i)\n            iro[i] = icol[i];\n    }\n\n    nc1s = nco - 1;\n    kyy = ico[nco] + 1;\n    /* Initialize pointers 
*/\n    vmn = 1e100;/* to contain min(v..) */\n    irl = 1;\n    ks = 0;\n    k = ldst;\n\nLnewNode: /* Setup to generate new node */\n\n    lev = 1;\n    nr1 = nro - 1;\n    nrt = iro[irl];\n    nct = ico[1];\n    lb[1] = (int) ((((double) nrt + 1) * (nct + 1)) /\n\t\t    (double) (nn + nr1 * nc1s + 1) - *tol) - 1;\n    nu[1] = (int) ((((double) nrt + nc1s) * (nct + nr1)) /\n\t\t    (double) (nn + nr1 + nc1s)) - lb[1] + 1;\n    nr[1] = nrt - lb[1];\n\nLoopNode: /* Generate a node */\n    --nu[lev];\n    if (nu[lev] == 0) {\n\tif (lev == 1)\n\t    goto L200;\n\n\t--lev;\n\tgoto LoopNode;\n    }\n    ++lb[lev];\n    --nr[lev];\n\n    while(1) {\n        alen[lev] = alen[lev - 1] + fact[lb[lev]];\n        if (lev >= nc1s)\n            break;\n\n        nn1 = nt[lev];\n        nrt = nr[lev];\n        ++lev;\n        nc1 = nco - lev;\n        nct = ico[lev];\n        lb[lev] = (int) ((((double) nrt + 1) * (nct + 1)) /\n                  (double) (nn1 + nr1 * nc1 + 1) - *tol);\n        nu[lev] = (int) ((((double) nrt + nc1) * (nct + nr1)) /\n                  (double) (nn1 + nr1 + nc1) - lb[lev] + 1);\n        nr[lev] = nrt - lb[lev];\n    }\n    alen[nco] = alen[lev] + fact[nr[lev]];\n    lb[nco] = nr[lev];\n\n    v = val + alen[nco];\n\n    if (nro == 2) { /* Only 1 row left */\n        v += fact[ico[1] - lb[1]] + fact[ico[2] - lb[2]];\n        for (i = 3; i <= nco; ++i)\n            v += fact[ico[i] - lb[i]];\n\n        if (v < vmn)\n            vmn = v;\n    } else if (nro == 3 && nco == 2) { /* 3 rows and 2 columns */\n\tnn1 = nn - iro[irl] + 2;\n\tic1 = ico[1] - lb[1];\n\tic2 = ico[2] - lb[2];\n\tn11 = (iro[irl + 1] + 1) * (ic1 + 1) / nn1;\n\tn12 = iro[irl + 1] - n11;\n\tv += fact[n11] + fact[n12] + fact[ic1 - n11] + fact[ic2 - n12];\n\tif (v < vmn)\n\t    vmn = v;\n\n    } else { /* Column marginals are new node */\n\n\tfor (i = 1; i <= nco; ++i)\n\t    it[i] = imax2(ico[i] - lb[i], 0);\n\t\n\t/* Sort column marginals it[] : */\n\tif (nco == 2) {\n\t    if (it[1] 
> it[2]) { /* swap */\n\t\tii = it[1]; it[1] = it[2]; it[2] = ii;\n\t    }\n\t} else\n\t    isort(&nco, &it[1]);\n\n\t/* Compute hash value */\n\tkey = it[1] * kyy + it[2];\n\tfor (i = 3; i <= nco; ++i) {\n\t    key = it[i] + key * kyy;\n\t}\n\tif (key < -1){\n        if (apop_opts.verbose)\n\t        printf(\"Bug in FEXACT: gave negative key.\\n\");\n        return -1;\n    }\n\t/* Table index */\n\tipn = key % ldst + 1;\n\t/* Find empty position */\n\tfor (itp = ipn, ii = ks + ipn; itp <= ldst; ++itp, ++ii) {\n\t    if (ist[ii] < 0) {\n\t\tgoto L180;\n\t    } else if (ist[ii] == key) {\n\t\tgoto L190;\n\t    }\n\t}\n\n\tfor (itp = 1, ii = ks + 1; itp <= ipn - 1; ++itp, ++ii) {\n\t    if (ist[ii] < 0) {\n\t\tgoto L180;\n\t    } else if (ist[ii] == key) {\n\t\tgoto L190;\n\t    }\n\t}\n\n\t/* this happens less, now that we check for negative key above: */\n\tApop_stopif(1, has_error=true; return GSL_NAN, 0, \"Stack length exceeded in f3xact. This problem should not occur.\");\n\nL180: /* Push onto stack */\n\tist[ii] = key;\n\tstv[ii] = v;\n\t++nst;\n\tii = nst + ks;\n\titc[ii] = itp;\n\tgoto LoopNode;\n\nL190: /* Marginals already on stack */\n\tstv[ii] = fmin2(v, stv[ii]);\n    }\n    goto LoopNode;\n\n\nL200: /* Pop item from stack */\n    if (nitc > 0) {\n\t/* Stack index */\n\titp = itc[nitc + k] + k;\n\t--nitc;\n\tval = stv[itp];\n\tkey = ist[itp];\n\tist[itp] = -1;\n\t/* Compute marginals */\n\tfor (i = nco; i >= 2; --i) {\n\t    ico[i] = key % kyy;\n\t    key /= kyy;\n\t}\n\tico[1] = key;\n\t/* Set up nt array */\n\tnt[1] = nn - ico[1];\n\tfor (i = 2; i <= nco; ++i)\n\t    nt[i] = nt[i - 1] - ico[i];\n\n\t/* Test for optimality (L90) */\n\tif (iro[nro] <= iro[irl] + nco) {\n\t    xmin = f10act(nro, &iro[irl], nco, &ico[1], &val, fact,\n\t\t\t  &lb[1], &nu[1], &nr[1]);\n\t} else xmin = FALSE;\n\n\tif (!xmin && ico[nco] <= ico[1] + nro)\n\t    xmin = f10act(nco, &ico[1], nro, &iro[irl], &val, fact,\n\t\t\t  &lb[1], &nu[1], &nr[1]);\n\tif (xmin) {\n\t    if 
(vmn > val)\n\t\tvmn = val;\n\t    goto L200;\n\t}\n\telse goto LnewNode;\n\n    } else if (nro > 2 && nst > 0) {\n        /* Go to next level */\n        nitc = nst;\n        nst = 0;\n        k = ks;\n        ks = ldst - ks;\n        nn -= iro[irl];\n        ++irl;\n        --nro;\n        goto L200;\n    }\n    return  - vmn;\n}\n\nstatic double f4xact(int nrow, int *irow, int ncol, int *icol, double dspt,\n       double *fact, int *icstk, int *ncstk, int *lstk, int *mstk,\n       int *nstk, int *nrstk, int *irstk, double *ystk, const double *tol) {\n/* -----------------------------------------------------------------------\n  Name:\t      F4XACT\n  Purpose:    Computes the shortest path length for a given table.\n\n  Arguments:\n     NROW   - The number of rows in the table.\t(Input)\n     IROW   - Vector of length NROW containing the row sums for the\n\t      table.  (Input)\n     NCOL   - The number of columns in the table.  (Input)\n     ICOL   - Vector of length K containing the column sums for the\n\t      table.  (Input)\n     DSPT   - \"offset\"  for SP computation\n     FACT   - Vector containing the logarithms of factorials.  
(Input)\n     ICSTK  - NCOL by NROW+NCOL+1 work array.\n     NCSTK  - Work vector of length NROW+NCOL+1.\n     LSTK   - Work vector of length NROW+NCOL+1.\n     MSTK   - Work vector of length NROW+NCOL+1.\n     NSTK   - Work vector of length NROW+NCOL+1.\n     NRSTK  - Work vector of length NROW+NCOL+1.\n     IRSTK  - NROW by MAX(NROW,NCOL) work array.\n     YSTK   - Work vector of length NROW+NCOL+1.\n     TOL    - Tolerance.\t\t\t\t\t(Input)\n\n  Return Value :\n\n    SP\t    - The shortest path for the table.\t\t\t(Output)\n  ----------------------------------------------------------------------- */\n\n    int i, j, k, l, m, n, ic1, ir1, ict, irt, istk, nco, nro;\n    double y, amx, SP;\n\n    /* Take care of the easy cases first */\n    if (nrow == 1) {\n        SP = 0.;\n        for (i = 0; i < ncol; ++i)\n            SP -= fact[icol[i]];\n        return SP;\n    }\n    if (ncol == 1) {\n        SP = 0.;\n        for (i = 0; i < nrow; ++i)\n            SP -= fact[irow[i]];\n        return SP;\n    }\n    if (nrow * ncol == 4) {\n        if (irow[1] <= icol[1])\n            return -(fact[irow[1]] + fact[icol[1]] + fact[icol[1] - irow[1]]);\n        else\n            return -(fact[icol[1]] + fact[irow[1]] + fact[irow[1] - icol[1]]);\n    }\n\n    /* Parameter adjustments */\n    irstk -= nrow + 1;\n    icstk -= ncol + 1;\n\n    --nrstk;\n    --ncstk;\n    --lstk;\n    --mstk;\n    --nstk;\n    --ystk;\n\n    /* initialization before loop */\n    for (i = 1; i <= nrow; ++i)\n        irstk[i + nrow] = irow[nrow - i];\n\n    for (j = 1; j <= ncol; ++j)\n        icstk[j + ncol] = icol[ncol - j];\n\n    nro = nrow;\n    nco = ncol;\n    nrstk[1] = nro;\n    ncstk[1] = nco;\n    ystk[1] = 0.;\n    y = 0.;\n    istk = 1;\n    l = 1;\n    amx = 0.;\n    SP = dspt;\n\n    /* First LOOP */\n    do {\n\tir1 = irstk[istk * nrow + 1];\n\tic1 = icstk[istk * ncol + 1];\n\tif (ir1 > ic1) {\n\t    if (nro >= nco) {\n            m = nco - 1;\tn = 2;\n\t    } else {\n            m 
= nro;\tn = 1;\n\t    }\n\t} else if (ir1 < ic1) {\n\t    if (nro <= nco) {\n            m = nro - 1;\tn = 1;\n\t    } else {\n            m = nco;\tn = 2;\n\t    }\n\t} else {\n\t    if (nro <= nco) {\n            m = nro - 1;\tn = 1;\n\t    } else {\n            m = nco - 1;\tn = 2;\n\t    }\n\t}\n\n    L60:\n\tif (n == 1) {\n\t    i = l; j = 1;\n\t} else {\n\t    i = 1; j = l;\n\t}\n\n\tirt = irstk[i + istk * nrow];\n\tict = icstk[j + istk * ncol];\n\ty += fact[imin2(irt, ict)];\n\tif (irt == ict) {\n\t    --nro;\n\t    --nco;\n\t    f11act(&irstk[istk * nrow + 1], i, nro,\n\t\t   &irstk[(istk + 1) * nrow + 1]);\n\t    f11act(&icstk[istk * ncol + 1], j, nco,\n\t\t   &icstk[(istk + 1) * ncol + 1]);\n\t} else if (irt > ict) {\n\t    --nco;\n\t    f11act(&icstk[istk * ncol + 1], j, nco,\n\t\t   &icstk[(istk + 1) * ncol + 1]);\n\t    f8xact(&irstk[istk * nrow + 1], irt - ict, i, nro,\n\t\t   &irstk[(istk + 1) * nrow + 1]);\n\t} else {\n\t    --nro;\n\t    f11act(&irstk[istk * nrow + 1], i, nro,\n\t\t   &irstk[(istk + 1) * nrow + 1]);\n\t    f8xact(&icstk[istk * ncol + 1], ict - irt, j, nco,\n\t\t   &icstk[(istk + 1) * ncol + 1]);\n\t}\n\n\tif (nro == 1) {\n\t    for (k = 1; k <= nco; ++k)\n            y += fact[icstk[k + (istk + 1) * ncol]];\n\t    break;\n\t}\n\tif (nco == 1) {\n\t    for (k = 1; k <= nro; ++k)\n            y += fact[irstk[k + (istk + 1) * nrow]];\n\t    break;\n\t}\n\n\tlstk[istk] = l;\n\tmstk[istk] = m;\n\tnstk[istk] = n;\n\t++istk;\n\tnrstk[istk] = nro;\n\tncstk[istk] = nco;\n\tystk[istk] = y;\n\tl = 1;\n    } while(1);/* end do */\n\n    if (y > amx) {\n        amx = y;\n        if (SP - amx <= *tol)\n            return -dspt;\n    }\n\n    do {\n\t--istk;\n\tif (istk == 0) {\n\t    SP -= amx;\n\t    if (SP - amx <= *tol)\n\t\treturn -dspt;\n\t    else\n\t\treturn SP - dspt;\n\t}\n\tl = lstk[istk] + 1;\n\n\tfor(;; ++l) {\n\t    if (l > mstk[istk])\tbreak;\n\n\t    n = nstk[istk];\n\t    nro = nrstk[istk];\n\t    nco = ncstk[istk];\n\t    y = 
ystk[istk];\n\t    if (n == 1) {\n\t\tif (irstk[l\t+ istk * nrow] <\n\t\t    irstk[l - 1 + istk * nrow])\tgoto L60;\n\t    }\n\t    else if (n == 2) {\n\t\tif (icstk[l\t+ istk * ncol] <\n\t\t    icstk[l - 1 + istk * ncol])\tgoto L60;\n\t    }\n\t}\n    } while(1);\n}\n\n\nvoid f5xact(double *pastp, const double *tol, int *kval, int *key, int *ldkey,\n       int *ipoin, double *stp, int *ldstp, int *ifrq, int *npoin,\n       int *nr, int *nl, int *ifreq, int *itop, Rboolean psh) {\n/* -----------------------------------------------------------------------\n  Name:\t      F5XACT aka \"PUT\"\n  Purpose:    Put node on stack in network algorithm.\n\n  Arguments:\n     PASTP  - The past path length.\t\t\t\t(Input)\n     TOL    - Tolerance for equivalence of past path lengths.  \t(Input)\n     KVAL   - Key value.  \t\t\t\t\t(Input)\n     KEY    - Vector of length LDKEY containing the key values.\t(in/out)\n     LDKEY  - Length of vector KEY.  \t\t\t\t(Input)\n     IPOIN  - Vector of length LDKEY pointing to the\n\t      linked list of past path lengths.  \t\t(in/out)\n     STP    - Vector of length LDSTP containing the\n\t      linked lists of past path lengths.  \t\t(in/out)\n     LDSTP  - Length of vector STP.  \t\t\t\t(Input)\n     IFRQ   - Vector of length LDSTP containing the past path\n\t      frequencies.  \t\t\t\t\t(in/out)\n     NPOIN  - Vector of length LDSTP containing the pointers to\n\t      the next past path length.  \t\t\t(in/out)\n     NR\t    - Vector of length LDSTP containing the right object\n\t      pointers in the tree of past path lengths.        (in/out)\n     NL\t    - Vector of length LDSTP containing the left object\n\t      pointers in the tree of past path lengths.        (in/out)\n     IFREQ  - Frequency of the current path length.             (Input)\n     ITOP   - Pointer to the top of STP.  \t\t\t(Input)\n     PSH    - Logical.\t\t \t\t\t\t(Input)\n\t      If PSH is true, the past path length is found in the\n\t      table KEY.  
Otherwise the location of the past path\n\t      length is assumed known and to have been found in\n\t      a previous call. ==>>>>> USING \"static\" variables\n  ----------------------------------------------------------------------- */\n\n    static int itmp, ird, ipn, itp; /* << *need* static, see PSH above */\n    double test1, test2;\n\n    --nl;\n    --nr;\n    --npoin;\n    --ifrq;\n    --stp;\n\n    /* Function Body */\n    if (psh) {\n\t/* Convert KVAL to int in range 1, ..., LDKEY. */\n\tird = *kval % *ldkey;\n\t/* Search for an unused location */\n\tfor (itp = ird; itp < *ldkey; ++itp) {\n\t    if (key[itp] == *kval)\n\t\tgoto L40;\n\n\t    if (key[itp] < 0)\n\t\tgoto L30;\n\t}\n\tfor (itp = 0; itp < ird; ++itp) {\n\t    if (key[itp] == *kval)\n\t\tgoto L40;\n\n\t    if (key[itp] < 0)\n\t\tgoto L30;\n\t}\n\t/* Return if KEY array is full */\n\t/* KH\n\t  prterr(6, \"LDKEY is too small for this problem.\\n\"\n\t  \"It is not possible to estimate the value of LDKEY \"\n\t  \"required,\\n\"\n\t  \"but twice the current value may be sufficient.\");\n\t  */\n\tprterr(6, \"LDKEY is too small for this problem.\\n\"\n\t       \"Try increasing the size of the workspace.\");\n\n\nL30: /* Update KEY */\n\n\tkey[itp] = *kval;\n\t++(*itop);\n\tipoin[itp] = *itop;\n\t/* Return if STP array full */\n\tif (*itop > *ldstp) {\n\t    /* KH\n\t       prterr(7, \"LDSTP is too small for this problem.\\n\"\n\t       \"It is not possible to estimate the value of LDSTP \"\n\t       \"required,\\n\"\n\t       \"but twice the current value may be sufficient.\");\n\t       */\n\t    prterr(7, \"LDSTP is too small for this problem.\\n\"\n\t\t   \"Try increasing the size of the workspace.\");\n\t}\n\t/* Update STP, etc. 
*/\n\tnpoin[*itop] = -1;\n\tnr   [*itop] = -1;\n\tnl   [*itop] = -1;\n\tstp  [*itop] = *pastp;\n\tifrq [*itop] = *ifreq;\n\treturn;\n    }\n\nL40: /* Find location, if any, of pastp */\n\n    ipn = ipoin[itp];\n    test1 = *pastp - *tol;\n    test2 = *pastp + *tol;\n\n    do {\n\tif (stp[ipn] < test1)\n\t    ipn = nl[ipn];\n\telse if (stp[ipn] > test2)\n\t    ipn = nr[ipn];\n\telse {\n\t    ifrq[ipn] += *ifreq;\n\t    return;\n\t}\n    } while (ipn > 0);\n\n    /* Return if STP array full */\n    ++(*itop);\n    if (*itop > *ldstp) {\n\t/*\n\t  prterr(7, \"LDSTP is too small for this problem.\\n\"\n\t  \"It is not possible to estimate the value of LDSTP \"\n\t  \"required,\\n\"\n\t  \"but twice the current value may be sufficient.\");\n\t  */\n\tprterr(7, \"LDSTP is too small for this problem.\\n\"\n\t       \"Try increasing the size of the workspace.\");\n\treturn;\n    }\n\n    /* Find location to add value */\n    ipn = ipoin[itp];\n    itmp = ipn;\n\nL60:\n    if (stp[ipn] < test1) {\n\titmp = ipn;\n\tipn = nl[ipn];\n\tif (ipn > 0)\n\t    goto L60;\n\t/* else */\n\tnl[itmp] = *itop;\n    }\n    else if (stp[ipn] > test2) {\n\titmp = ipn;\n\tipn = nr[ipn];\n\tif (ipn > 0)\n\t    goto L60;\n\t/* else */\n\tnr[itmp] = *itop;\n    }\n    /* Update STP, etc. 
*/\n    npoin[*itop] = npoin[itmp];\n    npoin[itmp] = *itop;\n    stp\t [*itop] = *pastp;\n    ifrq [*itop] = *ifreq;\n    nl\t [*itop] = -1;\n    nr\t [*itop] = -1;\n}\n\n\nRboolean f6xact(int nrow, int *irow, int *kyy, int *key, int *ldkey, int *last, int *ipn) {\n/* -----------------------------------------------------------------------\n  Name:\t      F6XACT  aka \"GET\"\n  Purpose:    Pop a node off the stack.\n\n  Arguments:\n    NROW    - The number of rows in the table.\t\t\t(Input)\n    IROW    - Vector of length nrow containing the row sums on\n              output.\t\t\t\t\t\t(Output)\n    KYY     - Constant multipliers used in forming the hash\n              table key.\t\t\t\t\t(Input)\n    KEY     - Vector of length LDKEY containing the hash table\n              keys.\t\t\t\t\t\t(In/out)\n    LDKEY   - Length of vector KEY.\t\t\t\t(Input)\n    LAST    - Index of the last key popped off the stack.\t(In/out)\n    IPN     - Pointer to the linked list of past path lengths.\t(Output)\n\n  Return value :\n    TRUE if there are no additional nodes to process.           
(Output)\n  ----------------------------------------------------------------------- */\n    int kval, j;\n\n    --key;\n\nL10:\n    ++(*last);\n    if (*last <= *ldkey) {\n\tif (key[*last] < 0)\n\t    goto L10;\n\n\t/* Get KVAL from the stack */\n\tkval = key[*last];\n\tkey[*last] = -9999;\n\tfor (j = nrow-1; j > 0; j--) {\n\t    irow[j] = kval / kyy[j];\n\t    kval -= irow[j] * kyy[j];\n\t}\n\tirow[0] = kval;\n\t*ipn = *last;\n\treturn FALSE;\n    } else {\n\t*last = 0;\n\treturn TRUE;\n    }\n}\n\n\nvoid f7xact(int nrow, int *imax, int *idif, int *k, int *ks, int *iflag) {\n/* -----------------------------------------------------------------------\n  Name:\t      F7XACT\n  Purpose:    Generate the new nodes for given marginal totals.\n\n  Arguments:\n    NROW    - The number of rows in the table.\t\t\t(Input)\n    IMAX    - The row marginal totals.\t\t\t\t(Input)\n    IDIF    - The column counts for the new column.\t\t(in/out)\n    K\t    - Indicator for the row to decrement.\t\t(in/out)\n    KS\t    - Indicator for the row to increment.\t\t(in/out)\n    IFLAG   - Status indicator.\t\t\t\t\t(Output)\n\t      If IFLAG is zero, a new table was generated.  
For\n\t      IFLAG = 1, no additional tables could be generated.\n  ----------------------------------------------------------------------- */\n    int i, m, kk, mm;\n\n    /* Parameter adjustments */\n    --idif;\n    --imax;\n\n    /* Function Body */\n    *iflag = 0;\n    /* Find node which can be incremented, ks */\n    if (*ks == 0)\n\tdo {\n\t    ++(*ks);\n\t} while (idif[*ks] == imax[*ks]);\n\n    /* Find node to decrement (>ks) */\n    if (idif[*k] > 0 && *k > *ks) {\n\t--idif[*k];\n\tdo {\n\t    --(*k);\n\t} while(imax[*k] == 0);\n\n\tm = *k;\n\n\t/* Find node to increment (>=ks) */\n\twhile (idif[m] >= imax[m]) {\n\t    --m;\n\t}\n\t++idif[m];\n\t/* Change ks */\n\tif (m == *ks && idif[m] == imax[m])\n\t    *ks = *k;\n    }\n    else {\n Loop:\n\t/* Check for finish */\n\tfor (kk = *k + 1; kk <= nrow; ++kk) {\n\t    if (idif[kk] > 0) {\n\t\tgoto L70;\n\t    }\n\t}\n\t*iflag = 1;\n\treturn;\n\n L70:\n\t/* Reallocate counts */\n\tmm = 1;\n\tfor (i = 1; i <= *k; ++i) {\n\t    mm += idif[i];\n\t    idif[i] = 0;\n\t}\n\t*k = kk;\n\n\tdo {\n\t    --(*k);\n\t    m = imin2(mm, imax[*k]);\n\t    idif[*k] = m;\n\t    mm -= m;\n\t} while (mm > 0 && *k != 1);\n\n\t/* Check that all counts reallocated */\n\tif (mm > 0) {\n\t    if (kk != nrow) {\n\t\t*k = kk;\n\t\tgoto Loop;\n\t    }\n\t    *iflag = 1;\n\t    return;\n\t}\n\t/* Get ks */\n\t--idif[kk];\n\t*ks = 0;\n\tdo {\n\t    ++(*ks);\n\t    if (*ks > *k) {\n\t\treturn;\n\t    }\n\t} while (idif[*ks] >= imax[*ks]);\n    }\n}\n\n\nvoid f8xact(int *irow, int is, int i1, int izero, int *new) {\n/* -----------------------------------------------------------------------\n  Name:\t      F8XACT\n  Purpose:    Routine for reducing a vector when there is a zero\n\t      element.\n  Arguments:\n     IROW   - Vector containing the row counts.\t\t\t(Input)\n     IS\t    - Indicator.\t\t\t\t\t(Input)\n     I1\t    - Indicator.\t\t\t\t\t(Input)\n     IZERO  - Position of the zero.\t\t\t\t(Input)\n     NEW    - Vector of new row 
counts.\t\t\t\t(Output)\n  ----------------------------------------------------------------------- */\n\n    int i;\n\n    /* Parameter adjustments */\n    --new;\n    --irow;\n\n    /* Function Body */\n    for (i = 1; i < i1; ++i)\n        new[i] = irow[i];\n\n    for (i = i1; i <= izero - 1; ++i) {\n        if (is >= irow[i + 1]) break;\n        new[i] = irow[i + 1];\n    }\n\n    new[i] = is;\n\n    for(;;) {\n        ++i;\n        if (i > izero) return;\n        new[i] = irow[i];\n    }\n}\n\nstatic double f9xact(int n, int ntot, int *ir, double *fact) {\n/* -----------------------------------------------------------------------\n  Name:\t      F9XACT\n  Purpose:    Computes the log of a multinomial coefficient.\n\n  Arguments:\n     N\t    - Length of IR.\t\t\t\t\t(Input)\n     NTOT   - Number for factorial in numerator.\t\t(Input)\n     IR\t    - Vector of length N containing the numbers for\n              the denominator of the factorial.\t\t\t(Input)\n     FACT   - Table of log factorials.\t\t\t\t(Input)\n  Returns:\n     \t    - The log of the multinomial coefficient.\t\t(Output)\n  ----------------------------------------------------------------------- */\n    double d = fact[ntot];\n    for (int k = 0; k < n; k++)\n        d -= fact[ir[k]];\n    return d;\n}\n\n\nRboolean f10act(int nrow, int *irow, int ncol, int *icol, double *val,\n       double *fact, int *nd, int *ne, int *m) {\n/* -----------------------------------------------------------------------\n  Name:\t    F10ACT\n  Purpose:  Computes the shortest path length for special tables.\n\n  Arguments:\n     NROW   - The number of rows in the table.\t\t\t(Input)\n     IROW   - Vector of length NROW containing the row totals.\t(Input)\n     NCOL   - The number of columns in the table.  \t\t(Input)\n     ICOL   - Vector of length NCOL containing the column totals.(Input)\n     VAL    - The shortest path.  \t\t\t\t(Input/Output)\n     FACT   - Vector containing the logarithms of factorials.   
(Input)\n     ND\t    - Workspace vector of length NROW.\t\t\t(Input)\n     NE\t    - Workspace vector of length NCOL.\t\t\t(Input)\n     M\t    - Workspace vector of length NCOL.\t\t\t(Input)\n\n  Returns (VAL and):\n     XMIN   - Set to true if shortest path obtained.  \t\t(Output)\n  ----------------------------------------------------------------------- */\n    int i, is, ix;\n\n    for (i = 0; i < nrow - 1; ++i)\n        nd[i] = 0;\n\n    is = icol[0] / nrow;\n    ix = icol[0] - nrow * is;\n    ne[0] = is;\n    m[0] = ix;\n    if (ix != 0) ++nd[ix-1];\n\n    for (i = 1; i < ncol; ++i) {\n        ix = icol[i] / nrow;\n        ne[i] = ix;\n        is += ix;\n        ix = icol[i] - nrow * ix;\n        m[i] = ix;\n        if (ix != 0) ++nd[ix-1];\n    }\n\n    for (i = nrow - 3; i >= 0; --i)\n        nd[i] += nd[i + 1];\n\n    ix = 0;\n    for (i = nrow; i >= 2; --i) {\n        ix += is + nd[nrow - i] - irow[i-1];\n        if (ix < 0) return FALSE;\n    }\n\n    for (i = 0; i < ncol; ++i) {\n        ix = ne[i];\n        is = m[i];\n        *val +=  is * fact[ix + 1] + (nrow - is) * fact[ix];\n    }\n    return TRUE;\n}\n\nvoid f11act(int *irow, int i1, int i2, int *new) {\n/* -----------------------------------------------------------------------\n  Name:\t      F11ACT\n  Purpose:    Routine for revising row totals.\n\n  Arguments:\n     IROW   - Vector containing the row totals.\t(Input)\n     I1\t    - Indicator.\t\t\t(Input)\n     I2\t    - Indicator.  
\t\t\t(Input)\n     NEW    - Vector containing the row totals.\t(Output)\n  ----------------------------------------------------------------------- */\n    int i;\n    for (i = 0;  i < (i1 - 1); ++i)\tnew[i] = irow[i];\n    for (i = i1; i <= i2; ++i)\t      new[i-1] = irow[i];\n}\n\nstatic int iwork(int iwkmax, int *iwkpt, int number, int itype) {\n/* -----------------------------------------------------------------------\n  Name:\t      iwork\n  Purpose:    Routine for allocating workspace.\n\n  Arguments:\n     iwkmax - Maximum (int) amount of workspace.\t\t(Input)\n     iwkpt  - Amount of (int) workspace currently allocated.\t(in/out)\n     number - Number of elements of workspace desired.\t\t(Input)\n     itype  - Workspace type.\t\t\t\t\t(Input)\n\t      ITYPE  TYPE\n\t\t2    integer\n\t\t3    float\n\t\t4    double\n     iwork(): Index in rwrk, dwrk, or iwrk of the beginning of\n              the first free element in the workspace array.\t(Output)\n  ----------------------------------------------------------------------- */\n    int i = *iwkpt;\n    if (itype == 2 || itype == 3)\n        *iwkpt += number;\n    else { /* double */\n        if (i % 2 != 0) ++i;\n        *iwkpt += (number << 1);\n        i /= 2;\n    }\n    Apop_stopif(*iwkpt >iwkmax, has_error=true;return i, 0, \"Out of workspace: %i > %i\", *iwkpt, iwkmax);\n    return i;\n}\n\n#ifndef USING_R\n\nvoid isort(int *n, int *ix) {\n/* -----------------------------------------------------------------------\n  Name:\t      ISORT\n  Purpose:    Shell sort for an int vector.\n\n  Arguments:\n     N\t    - Length of vector IX.\t(Input)\n     IX\t    - Vector to be sorted.\t(in/out)\n  ----------------------------------------------------------------------- */\n    static int ikey, i, j, m, il[10], kl, it, iu[10], ku;\n\n    /* Parameter adjustments */\n    --ix;\n\n    /* Function Body */\n    m = 1;\n    i = 1;\n    j = *n;\n\nL10:\n    if (i >= j) \n        goto L40;\n    kl = i;\n    ku = j;\n    ikey 
= i;\n    ++j;\n    /* Find element in first half */\nL20:\n    ++i;\n    if (i < j) \n        if (ix[ikey] > ix[i]) \n            goto L20;\n    /* Find element in second half */\nL30:\n    --j;\n    if (ix[j] > ix[ikey]) {\n        goto L30;\n    }\n    /* Interchange */\n    if (i < j) {\n        it = ix[i];\n        ix[i] = ix[j];\n        ix[j] = it;\n        goto L20;\n    }\n    it = ix[ikey];\n    ix[ikey] = ix[j];\n    ix[j] = it;\n    /* Save upper and lower subscripts of the array yet to be sorted */\n    if (m < 11) {\n        if (j - kl < ku - j) {\n            il[m - 1] = j + 1;\n            iu[m - 1] = ku;\n            i = kl;\n            --j;\n        } else {\n            il[m - 1] = kl;\n            iu[m - 1] = j - 1;\n            i = j + 1;\n            j = ku;\n        }\n        ++m;\n        goto L10;\n    } else Apop_stopif(1, return, 0, \"This should never occur.\");\n    /* Use another segment */\nL40:\n    --m;\n    if (m == 0) \n        return;\n    i = il[m - 1];\n    j = iu[m - 1];\n    goto L10;\n}\n\nstatic double gammds(double *y, double *p, int *ifault) {\n/* -----------------------------------------------------------------------\n  Name:\t      GAMMDS\n  Purpose:    Cumulative distribution for the gamma distribution.\n  Usage:      PGAMMA (Q, ALPHA,IFAULT)\n  Arguments:\n     Q\t    - Value at which the distribution is desired.  (Input)\n     ALPHA  - Parameter in the gamma distribution.  (Input)\n     IFAULT - Error indicator.\t(Output)\n\t       IFAULT  DEFINITION\n\t\t 0     No error\n\t\t 1     An argument is misspecified.\n\t\t 2     A numerical error has occurred.\n     PGAMMA - The cdf for the gamma distribution with parameter alpha\n\t      evaluated at Q.  (Output)\n  -----------------------------------------------------------------------\n\n  Algorithm AS 147 APPL. Statist. (1980) VOL. 29, P. 
113\n\n  Computes the incomplete gamma integral for positive parameters Y, P\n  using an infinite series.\n  */\n\n    static double a, c, f, g;\n\n    /* Checks for the admissibility of arguments and value of F */\n    *ifault = 1;\n    g = 0.;\n    if (*y <= 0. || *p <= 0.)\n        return g;\n    *ifault = 2;\n\n    /*\n      ALOGAM is natural log of gamma function; no need to test ifail, as\n      an error is impossible\n\n      BK edit: using gsl_sf_lngamma instead. It has more methods--> maybe slower; more precise.\n\n      */\n\n    a = *p + 1.;\n    f = exp(*p * log(*y) - gsl_sf_lngamma(a) - *y);\n    if (f == 0.) \n        return g;\n    *ifault = 0;\n\n    /* Series begins */\n    c = 1.;\n    g = 1.;\n    a = *p;\n    do {\n        a += 1.;\n        c *= (*y / a);\n        g += c;\n    } while (c > 1e-6 * g);\n\n    g *= f;\n    return g;\n}\n\n/** Convert from an \\ref apop_data set to a table of integers.\n\nNot too necessary, but I needed it for the Fisher exact test.\n*/\nstatic int *apop_data_to_int_array(apop_data *intab){\n  int rowct = intab->matrix->size1,\n      colct = intab->matrix->size2,\n      *out  = malloc(sizeof(int)*(rowct* colct));\n    for (int i=0; i< rowct; i++)\n        for (int j=0; j< colct; j++)\n            out[j*rowct + i] = (int) gsl_matrix_get(intab->matrix, i, j);\n    return out;\n}\n\n/** Run the Fisher exact test on an input contingency table.\n\n\\return     An \\ref apop_data set with two rows:<br>\n    \"probability of table\": Probability of the observed table for fixed marginal totals.\t<br>\n    \"p value\":  Table p-value.\tThe probability of a more extreme table,\n\t      where `extreme' is in a probabilistic sense.\n\n\\li If there are processing errors, these values will be NaN.\n\n\\exception out->error=='p' Processing error in the test.\n\nFor example: \n\n\\include test_fisher.c\n*/\napop_data *apop_test_fisher_exact(apop_data *intab){\n    double  prt, pre,\n            expect  = -1,\n            percent = 
80,\n            emin    = 1;\n    int     *intified = apop_data_to_int_array(intab),\n            workspace = 200000,\n            mult      = 30,\n            rowct     = intab->matrix->size1,\n            colct     = intab->matrix->size2;\n    has_error=0;\nOMP_critical (fexact) //f3xact and f5exact use static vars for some state-keeping.\n    fexact(&rowct, \n       &colct,\n       intified,\n       &rowct,\n       // Cochran condition for asym.chisq. decision:\n       &expect,\n       &percent,\n       &emin,\n       &prt,\n       &pre,\n       &workspace,\n       &mult);\n    free(intified);\n    apop_data *out = apop_data_alloc();\n    Asprintf(&out->names->title, \"Fisher Exact test\");\n    apop_data_add_named_elmt(out, \"probability of table\", prt);\n    apop_data_add_named_elmt(out, \"p value\", pre);\n    Apop_stopif(has_error, out->error='p'; return out, 0, \"processing error; don't trust the results.\");\n    return out;\n}\n#endif /* not USING_R */\n"
  },
  {
    "path": "apop_hist.m4.c",
    "content": "/** \\file apop_hist.c */\n/* Functions that work with PMFs and histograms.\n\nCopyright (c) 2006--2007, 2010, 2013 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  \n (Except psmirnov2x, Copyright R Project, but also licensed under the GPL.)\n*/\n\n#include \"apop_internal.h\"\n#include <gsl/gsl_rng.h>\n#include <gsl/gsl_sort_vector.h>\n#include <stdbool.h>\n\n/** Make random draws from an \\ref apop_model, and bin them using a binspec in the style\n of \\ref apop_data_to_bins. If you have a data set that used the same binspec, you now have synced histograms, which you can plot or sensibly test hypotheses about.\n\n\n\\param binspec A description of the bins in which to place the draws; see \\ref apop_data_to_bins. (default: as in \\ref apop_data_to_bins.)\n\\param model The model to be drawn from. Because this function works via random draws, the model needs to have a \n\\c draw method. (No default)\n\\param draws The number of random draws to make. (arbitrary default = 10,000)\n\\param bin_count If no bin spec, the number of bins to use (default: as per \\ref apop_data_to_bins, \\f$\\sqrt(N)\\f$)\n\n\\return An \\ref apop_pmf model, with a new binned data set attached (which you may\nhave to <tt>apop_data_free(output_model->data)</tt> to prevent memory leaks). 
The\nweights on the data set are normalized to sum to one.\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD apop_model *apop_model_to_pmf(apop_model *model, apop_data *binspec, long int draws, int bin_count){\n    apop_model* apop_varad_var(model, NULL);\n    Apop_assert(model && model->draw, \"The second argument needs to be an apop_model with a 'draw' function \"\n                              \"that I can use to make random draws.\");\n    apop_data* apop_varad_var(binspec, NULL);\n    int apop_varad_var(bin_count, 0);\n    long int apop_varad_var(draws, 1e4);\nAPOP_VAR_ENDHEAD\n    Get_vmsizes(binspec);\n    apop_data *outd = apop_model_draws(model, draws);\n    apop_data *outbinned = apop_data_to_bins(outd, binspec, .bin_count=bin_count);\n    apop_data_free(outd);\n    apop_vector_normalize(outbinned->weights);\n    return apop_estimate(outbinned, apop_pmf);\n} \n\n/** Test the goodness-of-fit between two \\ref apop_pmf models. \n\nLet \\f$o_i\\f$ be the \\f$i\\f$th observed bin and \\f$e_i\\f$ the expected value of that\nbin; then under typical assumptions, \\f$\\sum_{i=1}^N (o_i-e_i)^2/e_i \\sim \\chi^2_{N-1}\\f$.\n\nIf you send two histograms, I assume that the histograms are synced: for PMFs,\nyou've used \\ref apop_data_to_bins to generate two histograms using the same binspec,\nor you've used \\ref apop_data_pmf_compress to guarantee that each observation value\nappears exactly once in each data set.\n\nIn any case, all values in the \\c observed set must appear in the \\c\nexpected set with nonzero weight; otherwise this will return a \\f$\\chi^2\\f$ statistic\nof \\c GSL_POSINF, indicating that it is impossible for the \\c observed data to have\nbeen drawn from the \\c expected distribution.\n\n\\li If an observation row has weight zero, I skip it. 
If <tt>apop_opts.verbose >=1</tt>, I will show a warning.\n*/\napop_data *apop_histograms_test_goodness_of_fit(apop_model *observed, apop_model *expected){\n    int df = observed->data->weights->size;\n    double diff = 0;\n    for (int i=0; i< observed->data->weights->size; i++){\n        double obs_val = gsl_vector_get(observed->data->weights, i);\n        double exp_val = apop_p(Apop_r(observed->data, i), expected);\n        if (exp_val == 0){\n            diff = GSL_POSINF; \n            break;\n        }\n        if (obs_val==0){\n            Apop_notify(1, \"element %i of the observed data has weight zero. Skipping it.\", i);\n            df --;\n        } else \n            diff += gsl_pow_2(obs_val - exp_val)/exp_val;\n    }\n    //Data gathered. Now output\n    apop_data   *out    = apop_data_alloc();\n    double      toptail = gsl_cdf_chisq_Q(diff, df-1);\n    Asprintf(&out->names->title, \"Goodness-of-fit test via Chi-squared statistic\");\n    apop_data_add_named_elmt(out, \"Chi squared statistic\", diff);\n    apop_data_add_named_elmt(out, \"df\", df-1);\n    apop_data_add_named_elmt(out, \"p value\",  toptail); \n    apop_data_add_named_elmt(out, \"confidence\", 1 - toptail);\n    return out;\n}\n\n/*Everything from here to psmirnov2x (inclusive) is cut/pasted/trivially modified from the R project. The copyright is theirs. 
*/\nstatic void m_multiply(long double *A, long double *B, long double *C, int m) {\n    /* Auxiliary routine used by K().\n       Matrix multiplication.\n    */\n    for (int i = 0; i < m; i++)\n        for (int j = 0; j < m; j++) {\n            long double s = 0;\n            for (int k = 0; k < m; k++)\n                s += A[i * m + k] * B[k * m + j];\n            C[i * m + j] = s;\n        }\n}\n\nstatic void m_power(long double *A, int eA, long double *V, int *eV, int m, int n) {\n    /* Auxiliary routine used by K().\n       Matrix power.\n    */\n    long double *B;\n    int eB, i;\n\n    if (n == 1) {\n        for (i = 0; i < m * m; i++)\n            V[i] = A[i];\n        *eV = eA;\n        return;\n    }\n    m_power(A, eA, V, eV, m, n / 2);\n    B = calloc(m * m, sizeof(long double));\n    m_multiply(V, V, B, m);\n    eB = 2 * (*eV);\n    if ((n % 2) == 0) {\n        for (i = 0; i < m * m; i++)\n            V[i] = B[i];\n        *eV = eB;\n    } else {\n        m_multiply(A, B, V, m);\n        *eV = eA + eB;\n    }\n    if (V[(m / 2) * m + (m / 2)] > 1e140) {\n        for (i = 0; i < m * m; i++)\n            V[i] = V[i] * 1e-140;\n        *eV += 140;\n    }\n    free(B);\n}\n\n/* The two-sided one-sample 'exact' distribution */\nstatic double kolmogorov_2x(int n, double d) {\n    /* Compute Kolmogorov's distribution.\n       Code published in\n\t George Marsaglia and Wai Wan Tsang and Jingbo Wang (2003),\n\t \"Evaluating Kolmogorov's distribution\".\n\t Journal of Statistical Software, Volume 8, 2003, Issue 18.\n\t URL: http://www.jstatsoft.org/v08/i18/.\n    */\n\n   int k, m, i, j, g, eH, eQ;\n   long double h, s, *H, *Q;\n\n   /* The faster right-tail approximation is omitted here.\n      s = d*d*n; \n      if(s > 7.24 || (s > 3.76 && n > 99)) \n          return 1-2*exp(-(2.000071+.331/sqrt(n)+1.409/n)*s);\n   */\n   k = (n * d) + 1;\n   m = 2 * k - 1;\n   h = k - n * d;\n   H = calloc(m * m, sizeof(long double));\n   Q = calloc(m * m, sizeof(long 
double));\n   for(i = 0; i < m; i++)\n       for(j = 0; j < m; j++)\n           if (i - j + 1 < 0) H[i * m + j] = 0;\n           else               H[i * m + j] = 1;\n   for(i = 0; i < m; i++) {\n       H[i * m] -= pow(h, i + 1);\n       H[(m - 1) * m + i] -= pow(h, (m - i));\n   }\n   H[(m - 1) * m] += ((2 * h - 1 > 0) ? pow(2 * h - 1, m) : 0);\n   for(i = 0; i < m; i++)\n       for(j=0; j < m; j++)\n           if(i - j + 1 > 0)\n               for(g = 1; g <= i - j + 1; g++)\n                   H[i * m + j] /= g;\n   eH = 0;\n   m_power(H, eH, Q, &eQ, m, n);\n   s = Q[(k - 1) * m + k - 1];\n   for(i = 1; i <= n; i++) {\n       s = s * i / n;\n       if(s < 1e-140) {\n           s *= 1e140;\n           eQ -= 140;\n       }\n   }\n   s *= pow(10., eQ);\n   free(H);\n   free(Q);\n   return(s);\n}\n\nstatic double psmirnov2x(double x, int m, int n) {\n    if(m > n) {\n        int tmp = n; n = m; m = tmp;\n    }\n    double md = m;\n    double nd = n;\n        // q has 0.5/mn added to ensure that rounding error doesn't\n        // turn an equality into an inequality, eg abs(1/2-4/5)>3/10\n    long double q = (.5+floor(x * md * nd - 1e-7)) / (md * nd);\n    long double u[n+1];\n\n    for(int j = 0; j <= n; j++) \n        u[j] = ((j / nd) > q) ? 0 : 1;\n    for(int i = 1; i <= m; i++) {\n        long double w = i/(i + nd);\n        u[0] = (i / md) > q \n                ? 0\n                : w * u[0];\n        for(int j = 1; j <= n; j++) \n            u[j] = fabs(i / md - j / nd) > q\n                    ? 0\n                    : w * u[j] + u[j - 1];\n    }\n    return u[n];\n}\n\n/** Run the Kolmogorov-Smirnov test to determine whether two distributions are identical.\n\n\\param m1 A sorted PMF model. I.e., a model estimated via something like \n<tt>apop_model *m1 = apop_estimate(apop_data_sort(input_data), apop_pmf);</tt>\n\n\\param m2  Another \\ref apop_model. 
If it is a PMF, then I will use a two-sample test,\nwhich is different from the one-sample test used if this is not a PMF.\n\n\\return An \\ref apop_data set including the \\f$p\\f$-value from the Kolmogorov-Smirnov\ntest that the two distributions are equal.\n\n\\exception out->error='m'  Model error: \\c m1 is not an \\ref apop_pmf. I verify this\nby checking whether <tt>m1->cdf == apop_pmf->cdf</tt>.\n\n\\li If you are using a \\ref apop_pmf model, the data set(s) must be sorted before\nyou set up the model, as per the example below. See \\ref apop_data_sort and the\ndiscussion of CDFs in the \\ref apop_pmf documentation. If you don't do this, the test\nwill almost certainly reject the null hypothesis that \\c m1 and \\c m2 are identical.\nA future version of Apophenia may implement a mechanism to allow this function to test\nfor sorted data, but it currently can't.\n\nHere is an example, which tests whether a set of draws from a Normal(0, 1) matches a\nsequence of Normal distributions with increasing mean.\n\n\\include ks_tests.c\n*/\napop_data *apop_test_kolmogorov(apop_model *m1, apop_model *m2){\n    Apop_stopif(m1->cdf != apop_pmf->cdf, apop_return_data_error('m'), \n            0, \"First model has to be a PMF. I check whether m1->cdf == apop_pmf->cdf.\");\n    bool m2_is_pmf = (m2->cdf == apop_pmf->cdf);\n\n    int maxsize1, maxsize2;\n    {Get_vmsizes(m1->data); maxsize1 = maxsize;} //copy one of the macro's variables \n    {Get_vmsizes(m2->data); maxsize2 = maxsize;} //to the full function's scope.\n    double largest_diff = GSL_NEGINF;\n    double sum = 0;\n    for (size_t i=0; i< maxsize1; i++){\n        apop_data *arow = Apop_r(m1->data, i);\n        sum += m1->data->weights ? 
gsl_vector_get(m1->data->weights, i) : 1./maxsize1;\n        largest_diff = GSL_MAX(largest_diff, fabs(sum-apop_cdf(arow, m2)));\n    }\n    if (m2_is_pmf){\n        double sum = 0;\n        for (size_t i=0; i< maxsize2; i++){     //There could be matched data rows to m1, so there is redundancy.\n            apop_data *arow = Apop_r(m2->data, i);   // Feel free to submit a smarter version.\n            sum += m2->data->weights ? gsl_vector_get(m2->data->weights, i) : 1./maxsize2;\n            largest_diff = GSL_MAX(largest_diff, fabs(sum-apop_cdf(arow, m1))); //compare m2's empirical CDF to m1's CDF\n        }\n    }\n    apop_data *out = apop_data_alloc();\n    Asprintf(&out->names->title, \"Kolmogorov-Smirnov test\");\n    apop_data_add_named_elmt(out, \"max distance\", largest_diff);\n    double ps = m2_is_pmf ? psmirnov2x(largest_diff, maxsize1, maxsize2)\n                          : kolmogorov_2x(maxsize1, largest_diff);\n    apop_data_add_named_elmt(out, \"p value, 2 tail\", 1-ps);\n    apop_data_add_named_elmt(out, \"confidence, 2 tail\", ps);\n    return out;\n}\n\n/** Create a histogram from data by putting data into bins of fixed width. Your input\n\\ref apop_data set may be multidimensional, and may include both vector and matrix\nparts, and the bins output will have corresponding dimension.\n\n\\param indata The input data that will be binned, one observation per row. This is\n    copied and the copy will be modified. (No default)\n\\param binspec This is an \\ref apop_data set with the same number of columns as \\c indata. \n    If you want a fixed size for the bins, then the first row of the bin spec is the\n    bin width for each column.  
This allows you to specify a width for each dimension,\n    or specify the same size for all with something like:\n\\code\napop_data *binspec = apop_data_copy(Apop_r(indata, 0));\ngsl_matrix_set_all(binspec->matrix, 10); //bins of size 10 for all dim.s\napop_data_to_bins(indata, binspec);\n\\endcode\n    The presumption is that the first bin starts at zero in all cases. You can add a second\n    row to the spec to give the offset for each dimension. (default: NULL)\n\\param bin_count If you don't provide a bin spec, I'll provide this many evenly-sized bins to cover the data set. (Default: \\f$\\sqrt{N}\\f$)\n\\param close_top_bin Normally, a bin covers the range from the point equal to its\n    minimum to points strictly less than the minimum plus the width.  if \\c 'y', then\n    the top bin includes points less than or equal to the upper bound. This solves the\n    problem of displaying histograms where the top bin is just one point. (default:\n    \\c 'y' if \\c binspec==NULL, else \\c 'n')\n\n\\return A pointer to an \\ref apop_data set with the same dimension as your input data.\nEach cell is an integer giving the bin number into which the cell falls.\n\n\\li If no binspec and no binlist, then a grid with offset equal to the min of the\n    column, and bin size such that it takes \\f$\\sqrt{N}\\f$ bins to cover the range to the\n    max element.\n\\li The text segment is not binned. The \\c more pointer, if any, is not followed.\n\\li If there are weights, they are copied to the output via \\ref apop_vector_copy.\n\\li Given \\c NULL input, return \\c NULL output. Print a warning if <tt>apop_opts.verbose >= 2</tt>.\n\nIff you didn't give me a binspec, then I attach one to the output set as a page named\n\\c \\<binspec\\>. 
This means that you can snap a second data set to the same grid using\n\\code\napop_data_to_bins(first_set, NULL);\napop_data_to_bins(second_set, apop_data_get_page(first_set, \"<binspec>\"));\n\\endcode\n\\li The output has exactly as many rows as the input. Because many rows will be identical\nafter binning, it may be fruitful to run it through \\ref apop_data_pmf_compress to\nproduce a short list with one total weight per bin.\n\nHere is a sample program highlighting \\ref apop_data_to_bins and \\ref apop_data_pmf_compress .\n\n\\include binning.c\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD apop_data *apop_data_to_bins(apop_data const *indata, apop_data const *binspec, int bin_count, char close_top_bin){\n    apop_data const *apop_varad_var(indata, NULL);\n    Apop_assert_c(indata, NULL, 2, \"NULL input data set, so returning NULL output data set.\");\n    apop_data const *apop_varad_var(binspec, NULL);\n    char apop_varad_var(close_top_bin, binspec==NULL ? 'y' : 'n');\n    int apop_varad_var(bin_count, 0);\nAPOP_VAR_ENDHEAD\n    Get_vmsizes(indata); //firstcol, vsize, msize1, msize2\n    double binwidth, offset, max=0;\n    apop_data *out = apop_data_alloc(vsize, msize1, msize2);\n    apop_data const *bs = binspec ? binspec\n                    : apop_data_add_page(out, \n                        apop_data_alloc(vsize? 2: 0, msize1? 2: 0, indata->matrix ? msize2: 0),\n                        \"<binspec>\");\n    for (int j= firstcol; j< msize2; j++){\n        gsl_vector *onecol = Apop_cv(out, j);\n        gsl_vector *datacol = Apop_cv(indata, j);\n        if (binspec){\n           binwidth = apop_data_get(binspec, 0, j);\n           offset = ((binspec->vector && binspec->vector->size==2 )\n                   ||(binspec->matrix && binspec->matrix->size1==2)) ? 
apop_data_get(binspec, 1, j) : 0;\n        } else {\n            gsl_vector *abin = Apop_cv(bs, j);\n            max = gsl_vector_max(datacol);\n            gsl_vector_set(abin, 1, offset = gsl_vector_min(datacol));\n            gsl_vector_set(abin, 0, binwidth = (max - offset)/(bin_count ? bin_count : sqrt(datacol->size)));\n        }\n        for (int i=0; i< onecol->size; i++){\n            double val = gsl_vector_get(datacol, i);\n            double adjust = (close_top_bin=='y' && val == max && val!=offset) ? 2*GSL_DBL_EPSILON : 0;\n            gsl_vector_set(onecol, i, (floor((val-offset-adjust)/binwidth))*binwidth+offset);\n        }\n    }\n    if (indata->weights) out->weights = apop_vector_copy(indata->weights);\n    return out;\n}\n\n/** Return a new vector that is the moving average of the input vector.\n\n\\param v The input vector, unsmoothed\n\\param bandwidth An integer \\f$\\geq 1\\f$ giving the number of elements to be averaged to produce one number.\n\\return A smoothed vector of size <tt>v->size - (bandwidth/2)*2</tt>.\n */\ngsl_vector *apop_vector_moving_average(gsl_vector *v, size_t bandwidth){\n    Apop_stopif(!v, return NULL, 0, \"You asked me to smooth a NULL vector; returning NULL.\");\n    Apop_stopif(!bandwidth, return apop_vector_copy(v), 0, \"Bandwidth must be >=1. Returning a copy of original vector with no smoothing.\");\n    int halfspan = bandwidth/2;\n    //v->size is an unsigned size_t, so compare directly; a subtraction would wrap around instead of going negative.\n    Apop_stopif(v->size <= (size_t)halfspan*2, return NULL, 0, \"Bandwidth wider than the vector. Returning NULL.\");\n    gsl_vector *vout = gsl_vector_calloc(v->size - halfspan*2);\n    for(size_t i=0; i < vout->size; i ++){\n        double *item = gsl_vector_ptr(vout, i);\n        for (int j=-halfspan; j < halfspan+1; j ++)\n            *item += gsl_vector_get(v, j+ i+ halfspan);\n        *item /= halfspan*2 +1;\n    }\n    return vout;\n}\n"
  },
  {
    "path": "apop_internal.h",
    "content": "/* These are functions used here and there to write Apophenia. They're\n not incredibly useful, or even very good form, so they're not public. Cut\n & paste `em into your own code if you'd like.\n */\n\n/* Many Apop functions try to treat the vector and matrix equally, which\n requires knowing which exists and what the sizes are. */\n#define Get_vmsizes(d) \\\n    int firstcol = d && (d)->vector ? -1 : 0; \\\n    int vsize = d && (d)->vector ? (d)->vector->size : 0; \\\n    int wsize = d && (d)->weights ? (d)->weights->size : 0; \\\n    int msize1 = d && (d)->matrix ? (d)->matrix->size1 : 0; \\\n    int msize2 = d && (d)->matrix ? (d)->matrix->size2 : 0; \\\n    int tsize = vsize + msize1*msize2; \\\n    int maxsize = GSL_MAX(vsize, GSL_MAX(msize1, d?d->textsize[0]:0));\\\n    (void)(tsize||wsize||firstcol||maxsize) /*prevent unused variable complaints */;\n\n// Define a static variable, and initialize on first use.\n#define Staticdef(type, name, def) static type (name) = NULL; if (!(name)) (name) = (def);\n\n// Check for NULL and complain if so.\n#define Nullcheck(in, errval) Apop_assert_c(in, errval, apop_errorlevel, \"%s is NULL.\", #in);\n#define Nullcheck_m(in, errval) Apop_assert_c(in, errval, apop_errorlevel, \"%s is a NULL model.\", #in);\n#define Nullcheck_mp(in, errval) Nullcheck_m(in, errval); Apop_assert_c((in)->parameters, errval, apop_errorlevel, \"%s is a model with NULL parameters. Please set the parameters and try again.\", #in);\n#define Nullcheck_d(in, errval) Apop_assert_c(in, errval, apop_errorlevel, \"%s is a NULL data set.\", #in);\n//And because I do them all so often:\n#define Nullcheck_mpd(data, model, errval) Nullcheck_m(model, errval); Nullcheck_p(model, errval); Nullcheck_d(data, errval);\n//deprecated:\n#define Nullcheck_p(in, errval) Nullcheck_mp(in, errval);\n\n//in apop_conversions.c Extend a string.\nvoid xprintf(char **q, char *format, ...);\n#define XN(in) ((in) ? (in) : \"\")\n\n//For a pedantic compiler. 
Continues on error, because there's not much else to do: the computer is clearly broken.\n#define Asprintf(...) Apop_stopif(asprintf(__VA_ARGS__)==-1, , 0, \"Error printing to a string.\")\n\n#include <sqlite3.h>\n#include <stddef.h>\nint apop_use_sqlite_prepared_statements(size_t col_ct);\nint apop_prepare_prepared_statements(char const *tabname, size_t col_ct, sqlite3_stmt **statement);\nchar *prep_string_for_sqlite(int prepped_statements, char const *astring);//apop_conversions.c\nvoid apop_gsl_error(char const *reason, char const *file, int line, int gsl_errno); //apop_linear_algebra.c\n\n//For when we're forced to use a global variable.\n#undef threadlocal\n#if __STDC_VERSION__ > 201100L\n    #define threadlocal _Thread_local\n#elif defined(__APPLE__) \n    #define threadlocal\n#elif defined(__GNUC__) && !defined(threadlocal)\n    #define threadlocal __thread\n#else\n    #define threadlocal\n#endif\n\n#ifdef _OPENMP\n#define PRAGMA(x) _Pragma(#x)\n#define OMP_critical(tag) PRAGMA(omp critical ( tag ))\n#define OMP_for(...) _Pragma(\"omp parallel for\") for(__VA_ARGS__)\n#define OMP_for_reduce(red, ...) PRAGMA(omp parallel for reduction( red )) for(__VA_ARGS__)\n#else\n#define OMP_critical(tag)\n#define OMP_for(...) for(__VA_ARGS__)\n#define OMP_for_reduce(red, ...) for(__VA_ARGS__)\n#endif\n\n#include \"config.h\"\n#ifndef HAVE___ATTRIBUTE__\n#define __attribute__(...)\n#endif\n\n#ifndef HAVE_ASPRINTF\n#include <stdarg.h>\n\n//asprintf, vararg, &c\nextern int asprintf (char **res, const char *format, ...)\n       __attribute__ ((__format__ (__printf__, 2, 3)));\nextern int vasprintf (char **res, const char *format, va_list args)\n       __attribute__ ((__format__ (__printf__, 2, 0)));\n#endif\n\n#include \"apop.h\"\nvoid add_info_criteria(apop_data *d, apop_model *m, apop_model *est, double ll, int param_ct); //In apop_mle.c\n\napop_model *maybe_prep(apop_data *d, apop_model *m, _Bool *is_a_copy); //in apop_mcmc, for apop_update.\n"
  },
  {
    "path": "apop_linear_algebra.m4.c",
    "content": "/** \\file apop_linear_algebra.c\tAssorted things to do with matrices,\nsuch as take determinants or do singular value decompositions.  Includes\nmany convenience functions that don't actually do math but add/delete\ncolumns, check bounds, et cetera.\n*/ \n/* Copyright (c) 2006--2007, 2012 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n#include \"apop_internal.h\"\n\nvoid apop_gsl_error(const char *reason, const char *file, int line, int gsl_errno){\n    Apop_notify(1, \"%s: %s\", file, reason);\n    Apop_maybe_abort(1);\n}\n#define Checkgsl(...) if (__VA_ARGS__) {goto done;}\n#define Check_gsl_with_out(...) if (__VA_ARGS__) {out->error='m'; goto done;}\n#define Check_gsl_with_outmp(...) if (__VA_ARGS__) {gsl_matrix_free(*out); *out=NULL; goto done;}\n#define Set_gsl_handler gsl_error_handler_t *prior_handler = gsl_set_error_handler(apop_gsl_error);\n#define Unset_gsl_handler gsl_set_error_handler(prior_handler);\n\n/**\nCalculate the determinant of a matrix, its inverse, or both, via LU decomposition. The \\c in matrix is not destroyed in the process.\n\n\\see apop_matrix_determinant,  apop_matrix_inverse\n\n\\param in The matrix to be inverted/determined. \n\\param out If you want an inverse, this is where to place the matrix to be filled with the inverse. Will be allocated by the function. \n\n\\param calc_det \n0: Do not calculate the determinant.<br>\n1: Do.\n\n\\param calc_inv\n0: Do not calculate the inverse.<br>\n1: Do.\n\n\\return If <tt>calc_det != 0</tt>, then return the determinant. Otherwise, return \\c NaN.  If <tt>calc_inv!=0</tt>, \nthen \\c *out is pointed to the matrix inverse. 
In case of difficulty, I will set <tt>*out=NULL</tt> and return \\c NaN.\n*/\n\ndouble apop_det_and_inv(const gsl_matrix *in, gsl_matrix **out, int calc_det, int calc_inv) {\n    Set_gsl_handler\n    Apop_stopif(in->size1 != in->size2, *out=NULL; return GSL_NAN, 0, \"You asked me to invert a %zu X %zu matrix, \"\n            \"but inversion requires a square matrix.\", in->size1, in->size2);\n    int sign;\n    double the_determinant = GSL_NAN;\n\tgsl_matrix *invert_me = gsl_matrix_alloc(in->size1, in->size1);\n\tgsl_permutation * perm = gsl_permutation_alloc(in->size1);\n\tgsl_matrix_memcpy (invert_me, in);\n\tCheckgsl(gsl_linalg_LU_decomp(invert_me, perm, &sign))\n\tif (calc_inv){\n\t\t*out = gsl_matrix_alloc(in->size1, in->size1); //square.\n\t\tCheck_gsl_with_outmp(gsl_linalg_LU_invert(invert_me, perm, *out))\n    }\n\tif (calc_det)\n\t\tthe_determinant\t= gsl_linalg_LU_det(invert_me, sign);\n    done:\n\tgsl_matrix_free(invert_me);\n\tgsl_permutation_free(perm);\n    Unset_gsl_handler\n\treturn the_determinant;\n}\n\n/**\nInverts a matrix. The \\c in matrix is not destroyed in the process.\nYou may want to call \\ref apop_matrix_determinant first to check that your input is invertible, or use \\ref apop_det_and_inv to do both at once.\n\n\\param in The matrix to be inverted.\n\\return Its inverse.\n*/\ngsl_matrix * apop_matrix_inverse(const gsl_matrix *in) {\n    gsl_matrix *out = NULL;\n    apop_det_and_inv(in, &out, 0, 1);\n    return out;\n}\n\n/**\nFind the determinant of a matrix. 
The \\c in matrix is not destroyed in the process.\n\nSee also \\ref apop_matrix_inverse ,  or \\ref apop_det_and_inv to do both at once.\n\n\\param in The matrix to be determined.\n\\return     The determinant.\n*/\ndouble apop_matrix_determinant(const gsl_matrix *in) {\n    return apop_det_and_inv(in, NULL, 1, 0);\n}\n\n/** Principal component analysis: hand in a matrix and (optionally) a number of desired dimensions, and I'll return a data set where each column of the matrix is an eigenvector. The columns are sorted, so column zero has the greatest weight. The vector element of the data set gives the weights.\n\nYou may also specify the number of elements your principal component space should have. If\nthis is equal to the rank of the space in which the input data lives, then the sum of\nweights will be one. If the number of dimensions desired is less than that (probably so you can\nprepare a plot), then the weights will be accordingly smaller, giving you an indication\nof how much variation these dimensions explain.\n\n\\param data The input matrix.  I modify it in place so that each column has\nmean zero. (No default. If \\c NULL, return \\c NULL and print a warning iff\n<tt>apop_opts.verbose >= 1</tt>.)\n\n\\param dimensions_we_want The singular value decomposition will return this many of the eigenvectors with the largest eigenvalues. (default: the size of the covariance matrix, i.e. <tt>data->size2</tt>)\n\n\\return  Returns an \\ref apop_data set whose matrix is the principal component\nspace. Each column of the returned matrix will be another eigenvector; the columns\nwill be ordered by the eigenvalues.\n\nThe data set's vector will be the largest eigenvalues, scaled by the total of all eigenvalues (including those that were thrown out). 
The sum of these returned values will give you the percentage of variance explained by the factor analysis.\n\n\\exception out->error=='a'  Allocation error.\n*/\nAPOP_VAR_HEAD apop_data * apop_matrix_pca(gsl_matrix *data, int const dimensions_we_want) {\n    gsl_matrix * apop_varad_var(data, NULL);\n    Apop_stopif(!data, return NULL, 1, \"NULL data input\");\n    int const apop_varad_var(dimensions_we_want, data->size2);\nAPOP_VAR_ENDHEAD\n    Set_gsl_handler\n    apop_data *pc_space\t= apop_data_alloc(0, data->size2, dimensions_we_want);\n    Apop_stopif(pc_space->error, return pc_space, 0, \"Allocation error.\");\n\tpc_space->vector = gsl_vector_alloc(dimensions_we_want);\n    Apop_stopif(!pc_space->vector, pc_space->error='a'; return pc_space, \n                0, \"Allocation error setting up a %i vector.\", dimensions_we_want);\n    gsl_matrix *eigenvectors = gsl_matrix_alloc(data->size2, data->size2);\n    gsl_vector *dummy_v \t = gsl_vector_alloc(data->size2);\n    gsl_vector *all_evalues  = gsl_vector_alloc(data->size2);\n    gsl_matrix *square  \t = gsl_matrix_calloc(data->size2, data->size2);\n    Apop_stopif(!eigenvectors || !dummy_v || !all_evalues || !square, pc_space->error='a'; return pc_space, \n                0, \"Allocation error setting up workspace for %zu dimensions.\", data->size2);\n    double eigentotals\t= 0;\n    for (int i=0; i< data->size2; i++)\n        apop_vector_normalize(Apop_mcv(data, i), NULL, 'm');\n\n\tCheckgsl(gsl_blas_dgemm(CblasTrans,CblasNoTrans, 1, data, data, 0, square))\n\tCheckgsl(gsl_linalg_SV_decomp(square, eigenvectors, all_evalues, dummy_v))\n\tfor (int i=0; i< all_evalues->size; i++)\n\t\teigentotals\t+= gsl_vector_get(all_evalues, i);\n\tfor (int i=0; i<dimensions_we_want; i++){\n\t\tgsl_vector *v = Apop_cv(&(apop_data){.matrix=eigenvectors}, i);\n\t\tgsl_matrix_set_col(pc_space->matrix, i, v);\n\t\tgsl_vector_set(pc_space->vector, i, gsl_vector_get(all_evalues, i)/eigentotals);\n\t}\n    
done:\n\tgsl_vector_free(dummy_v); \tgsl_vector_free(all_evalues);\n\tgsl_matrix_free(square); \tgsl_matrix_free(eigenvectors);\n    Unset_gsl_handler\n    return pc_space;\n}\n\nstatic void l10(double *d){ *d = log10(*d); }\nstatic void ln(double *d){ *d = log(*d); }\nstatic void ex(double *d){ *d = exp(*d); }\n\n/** Replace every vector element \\f$v_i\\f$ with log\\f$_{10}(v_i)\\f$.\n\\li If the input vector is \\c NULL, do nothing. \n*/\nvoid apop_vector_log10(gsl_vector *v){\n    if (!v) return;\n    apop_vector_apply(v, l10);\n}\n\n/** Replace every vector element \\f$v_i\\f$ with ln\\f$(v_i)\\f$.\n\\li If the input vector is \\c NULL, do nothing. \n*/\nvoid apop_vector_log(gsl_vector *v){\n    if (!v) return;\n    apop_vector_apply(v, ln);\n}\n\n/** Replace every vector element \\f$v_i\\f$ with exp\\f$(v_i)\\f$.\n\\li If the input vector is \\c NULL, do nothing. \n*/\nvoid apop_vector_exp(gsl_vector *v){\n    if (!v) return;\n    apop_vector_apply(v, ex);\n}\n\n/** Put the first vector on top of the second vector.\n\n\\param  v1  the upper vector (default=\\c NULL, in which case this copies \\c v2)\n\\param  v2  the second vector (default=\\c NULL, in which case nothing is added)\n\\param  inplace If \\c 'y', use \\ref apop_vector_realloc to modify \\c v1 in place;\n    see the caveats on that function. Otherwise, allocate a new vector, leaving \\c v1\n    undisturbed. 
(default=\\c 'n')\n\\return     the stacked data, either in a new vector or a pointer to \\c v1.\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD gsl_vector *apop_vector_stack(gsl_vector *v1, gsl_vector const * v2, char inplace){\n    gsl_vector * apop_varad_var(v1, NULL);\n    gsl_vector const * apop_varad_var(v2, NULL);\n    char apop_varad_var(inplace, 'n');\nAPOP_VAR_ENDHEAD\n    gsl_vector *out;\n    gsl_vector t;\n    if (!v1  && v2){\n        out = gsl_vector_alloc(v2->size);\n        gsl_vector_memcpy(out, v2);\n        return out;\n    } else if (!v2  && v1){\n        if (inplace == 'y')\n            return v1;\n        out = gsl_vector_alloc(v1->size);\n        gsl_vector_memcpy(out, v1);\n        return out;\n    } else if (!v1 && !v2)\n        return NULL;\n    //else:\n    size_t v1size = v1->size; //save in case of reallocing.\n    if (inplace == 'y' )\n        out = apop_vector_realloc(v1, v1->size+v2->size);\n    else {\n        out = gsl_vector_alloc(v1->size + v2->size);\n        t   = gsl_vector_subvector(out, 0, v1size).vector;\n        gsl_vector_memcpy(&t, v1);\n    }\n    t   = gsl_vector_subvector(out, v1size, v2->size).vector;\n    gsl_vector_memcpy(&t, v2);\n    return out;\n}\n\n/** Put the first matrix either on top of or to the left of the second matrix.\nBy default, this returns a new matrix, meaning that at the end of this function, until you \\c gsl_matrix_free() the original matrices, you will be taking up twice as much memory. Plan accordingly.\n\n\\param  m1  the upper/leftmost matrix (default: \\c NULL, in which case this copies \\c m2)\n\\param  m2  the second matrix (default: \\c NULL, in which case \\c m1 is returned if \\c inplace is \\c 'y', and a copy of \\c m1 otherwise)\n\\param  posn    If \\c 'r', stack rows on top of other rows. If \\c 'c' stack  columns next to columns. (default: \\c 'r')\n\\param  inplace If \\c 'y', use \\ref apop_matrix_realloc to modify \\c m1 in place; see the caveats on that function. 
Otherwise, allocate a new matrix, leaving \\c m1 undisturbed. (default: \\c 'n')\n\\return     the stacked data, either in a new matrix or a pointer to \\c m1.\n\nFor example, here is a function to merge four matrices into a single two-part-by-two-part matrix. The original matrices are unchanged.\n\\code\ngsl_matrix *apop_stack_two_by_two(gsl_matrix *ul, gsl_matrix *ur, gsl_matrix *dl, gsl_matrix *dr){\n  gsl_matrix *output, *t;\n    output = apop_matrix_stack(ul, ur, 'c');\n    t = apop_matrix_stack(dl, dr, 'c');\n    apop_matrix_stack(output, t, 'r', .inplace='y');\n    gsl_matrix_free(t);\n    return output;\n}\n\\endcode\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD gsl_matrix *apop_matrix_stack(gsl_matrix *m1, gsl_matrix const * m2, char posn, char inplace){\n    gsl_matrix *apop_varad_var(m1, NULL);\n    gsl_matrix const *apop_varad_var(m2, NULL);\n    char apop_varad_var(posn, 'r');\n    char apop_varad_var(inplace, 'n');\nAPOP_VAR_ENDHEAD\n    gsl_matrix      *out;\n    gsl_vector_view tmp_vector;\n    if (!m1 && m2){\n        out = gsl_matrix_alloc(m2->size1, m2->size2);\n        gsl_matrix_memcpy(out, m2);\n        return out;\n    } else if (!m2 && m1) {\n        if (inplace =='y')\n            return m1;\n        out = gsl_matrix_alloc(m1->size1, m1->size2);\n        gsl_matrix_memcpy(out, m1);\n        return out;\n    } else if (!m2  && !m1) \n        return NULL;\n\n    if (posn == 'r'){\n        Apop_stopif(m1->size2 != m2->size2, return NULL, 0, \"When stacking matrices on top of each other, they have to have the same number of columns, but  m1->size2==%zu and m2->size2==%zu. 
Returning NULL.\", m1->size2, m2->size2);\n        int m1size = m1->size1;\n        if (inplace =='y')\n            out = apop_matrix_realloc(m1, m1->size1 + m2->size1, m1->size2);\n        else {\n            out     = gsl_matrix_alloc(m1->size1 + m2->size1, m1->size2);\n            for (int i=0; i< m1size; i++){\n                    tmp_vector  = gsl_matrix_row(m1, i);\n                    gsl_matrix_set_row(out, i, &(tmp_vector.vector));\n            }\n        }\n        for (int i=m1size; i< m1size + m2->size1; i++){\n            gsl_vector_const_view tmp_vector = gsl_matrix_const_row(m2, i- m1size);\n            gsl_matrix_set_row(out, i, &(tmp_vector.vector));\n        }\n        return out;\n    } else {\n        Apop_stopif(m1->size1 != m2->size1, return NULL, 0, \"When stacking matrices side by side, \"\n                \"they have to have the same number of rows, but m1->size1==%zu and m2->size1==%zu. Returning NULL.\"\n                , m1->size1, m2->size1);\n        int m1size = m1->size2;\n        if (inplace =='y')\n            out = apop_matrix_realloc(m1, m1->size1, m1->size2 + m2->size2);\n        else {\n            out     = gsl_matrix_alloc(m1->size1, m1->size2 + m2->size2);\n            for (int i=0; i< m1size; i++)\n                gsl_matrix_set_col(out, i, Apop_mcv(m1, i));\n        }\n        for (int i=0; i< m2->size2; i++)\n            gsl_matrix_set_col(out, i+ m1size, Apop_mcv((gsl_matrix*)m2, i));\n        return out;\n    } \n}\n\n/** Test that all elements of a vector are within bounds, so you can preempt a procedure\nthat is about to break on infinite or too-large values.\n\n\\param in  A <tt>gsl_vector</tt>\n\\param max An upper and lower bound to the elements of the vector. (default: INFINITY)\n\\return  1 if everything is bounded: not Inf, -Inf, or NaN, and \\f$-\\max < x < \\max\\f$;<br> 0 otherwise. \n \n\\li A \\c NULL vector has no unbounded elements, so \\c NULL input returns 1. 
You get a warning if <tt>apop_opts.verbose >=2</tt>.\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD int apop_vector_bounded(const gsl_vector *in, long double max){\n    const gsl_vector * apop_varad_var(in, NULL)\n    Apop_stopif(!in, return 1, 2, \"You sent in a NULL vector; returning 1.\");\n    long double apop_varad_var(max, INFINITY)\nAPOP_VAR_END_HEAD\n    for (size_t i=0; i< in->size; i++){\n        double x = gsl_vector_get(in, i);\n        if (!gsl_finite(x) || x> max || x< -max)\n            return 0;\n    }\n    return 1;\n}\n\n\nstatic gsl_vector* dot_for_apop_dot(const gsl_matrix *m, const gsl_vector *v, \n                             const CBLAS_TRANSPOSE_t flip){\n    #define Check_gslv(...) if (__VA_ARGS__) {gsl_vector_free(out); out=NULL;}\n    gsl_vector *out = (flip ==CblasNoTrans)\n                        ? gsl_vector_calloc(m->size1)\n                        : gsl_vector_calloc(m->size2);\n    Check_gslv(gsl_blas_dgemv (flip, 1.0, m, v, 0.0, out))\n    return out;\n}\n\n/** A convenience function for dot products, which requires less prep and typing than the <tt>gsl_cblas_dgexx</tt> functions.\n\nIt makes use of the semi-overloading of the \\ref apop_data structure. \\c d1 may be a vector or a matrix, and the same for \\c d2, so this function can do vector dot matrix, matrix dot matrix, and so on. If \\c d1 includes both a vector and a matrix, then later parameters will indicate which to use.\n\n\\param d1 the left part of \\f$ d1 \\cdot d2\\f$\n\\param d2 the right part of \\f$ d1 \\cdot d2\\f$\n\\param form1 't' or 'p': transpose or prime \\c d1->matrix, or, if \\c d1->matrix is \\c NULL, read \\c d1->vector as a row vector.<br>\n                    'n' or 0: use matrix if present; no transpose. (the default)<br>\n                    'v': ignore the matrix and use the vector.\n\n\\param form2 As above, with \\c d2.\n\\return     an \\ref apop_data set. 
If two matrices come in, the vector element is \\c NULL and the \n            matrix has the dot product; if either or both are vectors,\n            the vector has the output and the matrix is \\c NULL.\n\n\\exception out->error='a'  Allocation error.\n\\exception out->error='d'  dimension-matching error.\n\\exception out->error='m'  GSL math error.\n\\exception NULL If you ask me to take the dot product of NULL, I return NULL.\n\n\\li Some systems auto-transpose non-conforming matrices. You input a \\f$3 \\times 5\\f$ and\na \\f$3 \\times 5\\f$ matrix, and the system assumes that you meant to transpose the second,\nproducing a \\f$(3 \\times 5) \\cdot (5 \\times 3) \\rightarrow (3 \\times 3)\\f$ output. Apophenia\ndoes not do this. First, it's ambiguous whether the output should be \\f$3 \\times 3\\f$\nor \\f$5 \\times 5\\f$. Second, your next run might have three observations, and two \\f$3 \\times 3\\f$ \nmatrices don't require transposition; auto-transposition thus creates situations where\nbugs can pop up on only some iterations of a loop.\n\\li For a vector \\f$\\cdot\\f$ a matrix, the vector is always treated as a row vector,\nmeaning that a \\f$(3\\times 1)\\f$ dot a \\f$(3\\times 4)\\f$ matrix is correct, and produces a\n\\f$(1 \\times 4)\\f$ vector.  For a matrix \\f$\\cdot\\f$ a vector, the vector is always treated\nas a column vector. Requests for transposing the vector are ignored in both cases.  \n\\li As a corollary to the above rule, a vector dot a vector always produces a scalar,\n which will be put in the zeroth element of the output vector;\nsee the example. 
\n\\li If you want to multiply an \\f$N \\times 1\\f$ vector \\f$\\cdot\\f$ a \\f$1 \\times N\\f$\nvector to produce an \\f$N \\times N\\f$ matrix, then use \\ref apop_vector_to_matrix to turn\nyour vectors into matrices; see the example.\n\\li A note for readers of <em>Modeling with Data</em>: the awkward instructions on using\nthis function on p 130 are now obsolete, thanks to the designated initializer syntax\nfor function calls. Notably, in the case where <tt>d1</tt> is a vector and <tt>d2</tt>\na matrix, then <tt>apop_dot(d1,d2,'t')</tt> won't work, because <tt>'t'</tt> now refers\nto <tt>d1</tt>. Instead use <tt>apop_dot(d1,d2,.form2='t')</tt> or  <tt>apop_dot(d1,d2,0,\n't')</tt>\n\\li This function uses the \\ref designated syntax for inputs.\n\nSample code:\n\\include dot_products.c\n*/\nAPOP_VAR_HEAD apop_data * apop_dot(const apop_data *d1, const apop_data *d2, char form1, char form2){\n    const apop_data * apop_varad_var(d1, NULL)\n    const apop_data * apop_varad_var(d2, NULL)\n    Apop_stopif(!d1, return NULL, 1, \"d1 is NULL; returning NULL\");\n    Apop_stopif(!d2, return NULL, 1, \"d2 is NULL; returning NULL\");\n    char apop_varad_var(form1, 0)\n    char apop_varad_var(form2, 0)\nAPOP_VAR_ENDHEAD\n    Set_gsl_handler\n    int         uselm, userm;\n    gsl_matrix  *lm = d1->matrix, \n                *rm = d2->matrix;\n    gsl_vector  *lv = d1->vector, \n                *rv = d2->vector;\n\n    if (d1->matrix && form1 != 'v') uselm = 1;\n    else if (d1->vector)            uselm = 0;\n    else {\n        Apop_stopif(form1 == 'v', return NULL, 0,\n                    \"You asked for a vector from the left data set, but \"\n                    \"its vector==NULL. Returning NULL.\");\n        Apop_stopif(1, return NULL, 0, \"The left data set has neither non-NULL \"\n                                  \"matrix nor vector. 
Returning NULL.\");\n    }\n    if (d2->matrix && form2 != 'v') userm = 1;\n    else if (d2->vector)            userm = 0;\n    else {\n        Apop_stopif(form2 == 'v', return NULL, 0, \n                    \"You asked for a vector from the right data set, but \"\n                    \"its vector==NULL. Returning NULL.\");\n        Apop_stopif(1, return NULL, 0, \"The right data set has neither non-NULL \"\n                                  \"matrix nor vector. Returning NULL.\");\n    }\n    apop_data *out = apop_data_alloc();\n    #define Dimcheck(lr, lc, rr, rc) Apop_stopif((lc)!=(rr), out->error='d'; goto done,\\\n        0, \"mismatched dimensions: %zuX%zu dot %zuX%zu. %s\", (lr), (lc), (rr), (rc),\\\n        ((lr)==(rr)) ? \" Maybe transpose the first?\" \\\n        : ((rc)==(lc)) ? \" Maybe transpose the second?\" : \"\");\n\n    CBLAS_TRANSPOSE_t lt, rt;\n    lt  = (form1 == 'p' || form1 == 't' || form1 == 1) \n            ? CblasTrans: CblasNoTrans;\n    rt  = (form2 == 'p' || form2 == 't' || form2 == 1) \n            ? CblasTrans: CblasNoTrans;\n    if (uselm && userm){\n        Dimcheck((lt== CblasNoTrans) ? lm->size1:lm->size2,\n                 (lt== CblasNoTrans) ? lm->size2:lm->size1,\n                 (rt== CblasNoTrans) ? rm->size1:rm->size2,\n                 (rt== CblasNoTrans) ? rm->size2:rm->size1)\n        gsl_matrix *outm = gsl_matrix_calloc((lt== CblasTrans)? lm->size2: lm->size1, \n                                             (rt== CblasTrans)? rm->size1: rm->size2);\n        Check_gsl_with_out(gsl_blas_dgemm (lt,rt, 1, lm, rm, 0, outm))\n        out->matrix = outm;\n    } else if (!uselm && userm){\n        Dimcheck((size_t)1, lv->size,\n                 (rt== CblasNoTrans) ? rm->size1:rm->size2,\n                 (rt== CblasNoTrans) ? 
rm->size2:rm->size1)\n        //dgemv is always matrix first, then vector, so reverse from vm to mv:\n        // if output vector has dimension matrix->size2, send CblasTrans\n        // if output vector has dimension matrix->size1, send CblasNoTrans\n        out->vector = dot_for_apop_dot(rm, lv\n                        , (rt == CblasNoTrans) ? CblasTrans : CblasNoTrans);\n        Apop_stopif(!out->vector, out->error='m'; goto done, 0, \"GSL-level math error\");\n    } else if (uselm && !userm){\n        Dimcheck((lt== CblasNoTrans) ? lm->size1:lm->size2,\n                 (lt== CblasNoTrans) ? lm->size2:lm->size1,\n                  rv->size , (size_t)1)\n        out->vector = dot_for_apop_dot(lm, rv , lt);\n        Apop_stopif(!out->vector, out->error='m'; goto done, 0, \"GSL-level math error\");\n    } else if (!uselm && !userm){ \n        double outd;\n        Check_gsl_with_out(gsl_blas_ddot(lv, rv, &outd))\n        out->vector = gsl_vector_alloc(1);\n        gsl_vector_set(out->vector, 0, outd);\n    }\n\n    //If using the vector, there's no meaningful name to assign.\n    if (d1->names && uselm){\n        if (lt == CblasTrans) apop_name_stack(out->names, d1->names, 'r', 'c');\n        else                  apop_name_stack(out->names, d1->names, 'r');\n    }\n    if (d2->names && userm){\n        if (rt == CblasTrans) apop_name_stack(out->names, d2->names, 'c', 'r');\n        else                  apop_name_stack(out->names, d2->names, 'c');\n    }\n\ndone:\n    Unset_gsl_handler\n    return out;\n}\n"
  },
  {
    "path": "apop_linear_constraint.m4.c",
    "content": "/** \\file apop_linear_constraint.c \n  \\c apop_linear_constraint finds a point that meets a set of linear constraints. This takes a lot of machinery, so it gets its own file.\n\nCopyright (c) 2007, 2009 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  \n*/\n#include \"apop_internal.h\"\n\nstatic double magnitude(gsl_vector *v){\n    double out;\n    gsl_blas_ddot(v, v, &out);\n    return out;\n}\n\nstatic void find_nearest_point(gsl_vector *V, double k, gsl_vector *B, gsl_vector *out){\n    /* Find X such that BX =K and there is an S such that X + SB=V. */\n    double S=0; //S = (BV-K)/B'B.\n    gsl_blas_ddot(B, V, &S);\n    S   -= k;\nassert(!gsl_isnan(S));\n    S   /= magnitude(B);\nassert(!gsl_isnan(S));\n    gsl_vector_memcpy(out, B); //X = -SB +V\n    gsl_vector_scale(out, -S);\n    gsl_vector_add(out, V);\nassert(!gsl_isnan(gsl_vector_get(out,0)));\n}\n\nstatic int binds(gsl_vector const *v, double k, gsl_vector const *b, double margin){\n    double d;\n    gsl_blas_ddot(v, b, &d);\n    return d < k + margin;\n}\n\nstatic double trig_bit(gsl_vector *dimv, gsl_vector *otherv, double off_by){\n    double theta, costheta, dot, out;\n    gsl_blas_ddot(dimv, otherv, &dot);\n    costheta = dot/(magnitude(dimv)*magnitude(otherv));\n    theta = acos(costheta);\n    out = off_by/gsl_pow_2(sin(theta)); \n    return out;\n}\n\n/* The hard part is when your candidate point does not satisfy other\n   constraints, so you need to translate the point until it meets the new hypersurface.\n   How far is that? Project beta onto the new surface, and find the\n   distance between that projection and the original surface. Then\n   translate beta toward the original surface by that amount. 
The\n   projection of the translated beta onto the new surface now also touches the old\n   surface.\n   */\nstatic void get_candiate(gsl_vector *beta, apop_data *constraint, int current, gsl_vector *candidate, double margin){\n    double k, ck, off_by, s;\n    gsl_vector *pseudobeta        = NULL;\n    gsl_vector *pseudocandidate   = NULL;\n    gsl_vector *pseudocandidate2  = NULL;\n    gsl_vector *fix               = NULL;\n    gsl_vector *cc = Apop_rv(constraint, current);\n    ck = gsl_vector_get(constraint->vector, current);\n    find_nearest_point(beta, ck, cc, candidate);\n    for (size_t i=0; i< constraint->vector->size; i++){\n        if (i!=current){\n            gsl_vector *other = Apop_rv(constraint, i);\n            k   =apop_data_get(constraint, i, -1);\n            if (binds(candidate, k, other, margin)){\n                if (!pseudobeta){\n                    pseudobeta          = gsl_vector_alloc(beta->size);\n                    gsl_vector_memcpy(pseudobeta, beta);\n                    pseudocandidate     = gsl_vector_alloc(beta->size);\n                    pseudocandidate2    = gsl_vector_alloc(beta->size);\n                    fix                 = gsl_vector_alloc(beta->size);\n                }\n                find_nearest_point(pseudobeta, k, other, pseudocandidate);\n                find_nearest_point(pseudocandidate, ck, cc, pseudocandidate2);\n                off_by  = apop_vector_distance(pseudocandidate, pseudocandidate2);\n                s       = trig_bit(cc, other, off_by);\n                gsl_vector_memcpy(fix, cc);\n                gsl_vector_scale(fix, magnitude(cc));\n                gsl_vector_scale(fix, s);\n                gsl_vector_add(pseudobeta, fix);\n                find_nearest_point(pseudobeta, k, other, candidate);\n                gsl_vector_memcpy(pseudobeta, candidate);\n            } \n        }\n    }\n    if (fix){ \n        gsl_vector_free(fix); gsl_vector_free(pseudobeta);\n        
gsl_vector_free(pseudocandidate); gsl_vector_free(pseudocandidate2);\n    }\n}\n\n/** This is designed to be called from within the constraint method of your \\ref\napop_model. Just write the constraint vector+matrix and this will do the rest.\nSee \\ref constr for detailed discussion. \n \n\\param beta    The proposed vector about to be tested. No default, must not be \\c NULL.\n\n\\param constraint  \nA vector/matrix pair [v | m1 m2 ... mn] where each row is interpreted as a less-than inequality:\n\\f$v < m1x1+ m2x2 + ... + mnxn\\f$.  For example, say your constraints are \n\\f$3 < 2x + 4y - 7z\\f$ and \\f$y\\f$ is positive, i.e. \\f$0 < y\\f$.\nAllocate and fill the matrix representing these two constraints via:\n\\code\napop_data *constr = apop_data_falloc((2,2,3), 3,  2, 4, -7,\n                                              0,  0, 1, 0);\n\\endcode\n. Default: each element is greater than zero. For three parameters this would be equivalent to setting\n\\code\napop_data *constr = apop_data_falloc((3,3,3), 0,  1, 0, 0,\n                                              0,  0, 1, 0,\n                                              0,  0, 0, 1);\n\\endcode\n\n\\param margin If zero, then this is a >= constraint, otherwise I will return a point this amount within the borders. You could try \\c GSL_DBL_EPSILON, which is the machine epsilon for a \\c double (about 2.2e-16), or something like 1e-3. Default = 0.\n\n\\return The penalty: the distance between beta and the closest point that meets the constraints.\nIf the constraint is met, the penalty is zero.\nIf the constraint is not met, this \\c beta is shifted by \\c margin (Euclidean distance) to meet the constraints. \n\n  \\li If your \\ref apop_data has more structure than a vector, try \\ref apop_data_pack to pack it\ninto a vector. 
This is what \\ref apop_maximum_likelihood does.\n  \\li The function doesn't check for odd cases like coplanar constraints.\n  \\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD long double  apop_linear_constraint(gsl_vector *beta, apop_data * constraint, double margin){\n    static threadlocal apop_data *default_constraint;\n    gsl_vector * apop_varad_var(beta, NULL);\n    double apop_varad_var(margin, 0);\n    apop_data * apop_varad_var(constraint, NULL);\n    Apop_assert(beta, \"The vector to be checked is NULL.\");\n    if (!constraint){\n        if (default_constraint && beta->size != default_constraint->vector->size){\n            apop_data_free(default_constraint);\n            default_constraint = NULL;\n        }\n        if (!default_constraint){\n            default_constraint = apop_data_alloc(0,beta->size, beta->size);\n            default_constraint->vector = gsl_vector_calloc(beta->size);\n            gsl_matrix_set_identity(default_constraint->matrix);\n        }\n        constraint = default_constraint;\n    }\nAPOP_VAR_ENDHEAD\n    static threadlocal gsl_vector *closest_pt = NULL;\n    static threadlocal gsl_vector *candidate  = NULL;\n    static threadlocal gsl_vector *fix        = NULL;\n    int constraint_ct = constraint->matrix->size1;\n    int bindlist[constraint_ct];\n    int i, bound = 0;\n    /* For added efficiency, keep a scratch vector or two on hand. 
*/\n    if (closest_pt==NULL || closest_pt->size != constraint->matrix->size2){\n        closest_pt  = gsl_vector_calloc(beta->size);\n        candidate   = gsl_vector_alloc(beta->size);\n        fix         = gsl_vector_alloc(beta->size);\n        closest_pt->data[0] = GSL_NEGINF;\n    }\n    /* Do any constraints bind?*/\n    memset(bindlist, 0, sizeof(int)*constraint_ct);\n    for (i=0; i< constraint_ct; i++){\n        gsl_vector *c = Apop_rv(constraint, i);\n        bound       +=\n        bindlist[i] = binds(beta, apop_data_get(constraint, i, -1), c, margin);\n    }\n    if (!bound) return 0;   //All constraints met.\n    gsl_vector *base_beta = apop_vector_copy(beta);\n    /* With only one constraint, it's easy. */\n    if (constraint->vector->size==1){\n        gsl_vector *c = Apop_rv(constraint, 0);\n        find_nearest_point(base_beta, constraint->vector->data[0], c, beta);\n        goto add_margin;\n    }\n    /* Finally, multiple constraints, at least one binding.\n       For each surface, pick a candidate point.\n       Check whether the point meets the other constraints. 
\n            if not, translate to a new point that works.\n            [Do this by maintaining a pseudopoint that translates by the\n            necessary amount.]\n        Once you have a candidate point, compare its distance to the\n        current favorite; keep the best.\n     */\n    for (i=0; i< constraint_ct; i++)\n        if (bindlist[i]){\n            get_candiate(base_beta, constraint, i, candidate, margin);\n            if(apop_vector_distance(base_beta, candidate) < apop_vector_distance(base_beta, closest_pt))\n                gsl_vector_memcpy(closest_pt, candidate);\n        }\n    gsl_vector_memcpy(beta, closest_pt);\nadd_margin:\n    for (i=0; i< constraint_ct; i++){\n        if(bindlist[i]){\n            gsl_vector_memcpy(fix, Apop_rv(constraint, i));\n            gsl_vector_scale(fix, magnitude(fix));\n            gsl_vector_scale(fix, margin);\n            gsl_vector_add(beta, fix);\n        }\n    }\n    long double out = apop_vector_distance(base_beta, beta);\n    gsl_vector_free(base_beta);\n    return out;\n}\n"
  },
  {
    "path": "apop_mapply.m4.c",
    "content": "/** \\file apop_mapply.c vector/matrix map/apply.  */\n/* Copyright (c) 2007, 2009 by Ben Klemens.  Licensed under the GPLv2; see COPYING. \n \n   This file is a tour de force of the if statement. There are several possibilities:\n\n   --user wants the vector, matrix, rows, columns, all items in the data set\n   --user wants output in a new location, or written to the old.\n   --user has extra parameters\n   --user needs to know the index of the function\n   --user wants the sum of the result (e.g., to find how many elements are NAN, or a sum of log-likelihoods).\n\n   Further, Apophenia v0.22 introduced variadic, optional arguments, so\n   we have a somewhat more robust syntax post-22, and also the prior syntax,\n   which some may find useful.\n\n   We thus have a lot of functions that all feed in to mapply_core, which, after several if statements,\n   then dispatches segments to different threads, and either the vectorloop or forloop that does all the\n   actual math.\n\n */\n#include \"apop_internal.h\"\n#include <stdbool.h>\nstatic gsl_vector*mapply_core(apop_data *d, gsl_matrix *m, gsl_vector *vin, void *fn, gsl_vector *vout, bool use_index, bool use_param,void *param, char post_22, bool by_apop_rows);\n\ntypedef double apop_fn_v(gsl_vector*);\ntypedef void apop_fn_vtov(gsl_vector*);\ntypedef double apop_fn_d(double);\ntypedef void apop_fn_dtov(double*);\ntypedef double apop_fn_r(apop_data*);\ntypedef double apop_fn_vp(gsl_vector*, void *);\ntypedef double apop_fn_dp(double, void *);\ntypedef double apop_fn_rp(apop_data*, void *);\ntypedef double apop_fn_vpi(gsl_vector*, void *, int);\ntypedef double apop_fn_dpi(double, void *, int);\ntypedef double apop_fn_rpi(apop_data*, void *, int);\ntypedef double apop_fn_vi(gsl_vector*, int);\ntypedef double apop_fn_di(double, int);\ntypedef double apop_fn_ri(apop_data*, int);\n\n\n/** Apply a function to every element of a data set, matrix or vector; or, apply a\nvector-taking function to every row or 
column of a matrix.\n\nYour function could take any combination of a \\c gsl_vector, a \\c double, an \\ref apop_data, a parameter set, and the position of the element in the vector or matrix. As such, the function takes twelve function inputs, one for each combination of vector/matrix, params/no params, index/no index. Fortunately, because \nthis function uses the \\ref designated syntax for inputs, you will specify only one.\n\nFor example, here is a function that will cut off each element of the input data to\nbetween \\f$(-1, +1)\\f$.  It takes in a lone \\c double and a parameter in a \\c void*,\nso it gets sent to \\ref apop_map via <tt>.fn_dp=cutoff</tt>.\n\\code\ndouble cutoff(double in, void *limit_in){ \n    double *limit = limit_in;\n    return GSL_MAX(-*limit, GSL_MIN(*limit, in)); \n}\n\ndouble param = 1;\napop_map(your_data, .fn_dp=cutoff, .param=&param, .inplace='y');\n\\endcode\n\n\\param fn_v A function of the form <tt>double your_fn(gsl_vector *in)</tt>\n\\param fn_d A function of the form <tt>double your_fn(double in)</tt>\n\\param fn_r A function of the form <tt>double your_fn(apop_data *in)</tt>\n\\param fn_vp A function of the form <tt>double your_fn(gsl_vector *in, void *param)</tt>\n\\param fn_dp A function of the form <tt>double your_fn(double in, void *param)</tt>\n\\param fn_rp A function of the form <tt>double your_fn(apop_data *in, void *param)</tt>\n\\param fn_vpi A function of the form <tt>double your_fn(gsl_vector *in, void *param, int index)</tt>\n\\param fn_dpi A function of the form <tt>double your_fn(double in, void *param, int index)</tt>\n\\param fn_rpi A function of the form <tt>double your_fn(apop_data *in, void *param, int index)</tt>\n\\param fn_vi A function of the form <tt>double your_fn(gsl_vector *in, int index)</tt>\n\\param fn_di A function of the form <tt>double your_fn(double in, int index)</tt>\n\\param fn_ri A function of the form <tt>double your_fn(apop_data *in, int index)</tt>\n\n\\param in   The input data 
set. If \\c NULL, I'll return \\c NULL immediately.\n\\param param   A pointer to the parameters to be passed to those function forms taking a \\c *param.\n\n\\param part Which part of the \\c apop_data struct should I use?<br>\n'v'==Just the vector<br>\n'm'==Every element of the matrix, in turn<br>\n'a'==Both 'v' and 'm'<br>\n'r'==Apply a function \\c gsl_vector \\f$\\to\\f$ \\c double to each row of the  matrix<br>\n'c'==Apply a function \\c gsl_vector \\f$\\to\\f$ \\c double to each column of the  matrix<br>\nDefault is 'a', but notice that I'll ignore a \\c NULL vector or matrix, so if your data set has only a vector or only a matrix, that's what I'll use.\n\n\\param all_pages If \\c 'y', then follow the \\c more pointer to subsequent pages. If \\c 'n',\n    handle only the first page of data.  Default: \\c 'n'. \n\n\\param inplace  If 'n' (the default), generate a new \\ref apop_data set for output,\nwhich will contain the mapped values (and the names from the original set).<br>\nIf 'y',\nmodify in place. The \\c double \\f$\\to\\f$ \\c double versions, \\c 'v', \\c 'm', and \\c\n'a', write to exactly the same location as before. The \\c gsl_vector \\f$\\to\\f$ \\c\ndouble versions, \\c 'r', and \\c 'c', will write to the vector. Be careful: if you\nare writing in place and there is already a vector there, then the original vector is\nlost.<br>\nIf 'v' (as in void), return \\c NULL.  (Default = 'n')\n\n\\exception out->error='p' missing or mismatched parts error, such as \\c NULL matrix when you sent a function acting on the matrix element.\n\n\\li The function forms with <tt>r</tt> in them, like \\c fn_ri, are row-by-row. I'll use\n\\ref Apop_r to get each row in turn, and send it to the function. The first\nimplication is that your function should be expecting a \\ref apop_data set with\nexactly one row in it. The second is that \\c part is ignored: it only makes sense to go\nrow-by-row. 
\n  \\li For these \\c r functions, if you set \\c inplace='y', then you will be modifying\nyour input data set, row by row; if you set \\c inplace='n', then I will return an \\ref\napop_data set whose \\c vector element is as long as your data set (i.e., as long as\nthe longest of your text, vector, or matrix parts).\n  \\li If you call <tt>omp_set_num_threads(n)</tt> with \\f$n>1\\f$,\nI will split the data set into as many chunks as you specify and process them\nsimultaneously. You need to watch out for the usual hang-ups about multithreaded\nprogramming, but if your data is iid, and each row's processing is independent of the\nothers, you should have no problems. Bear in mind that generating threads takes some\nsmall overhead, so simple cases like adding a few hundred numbers will actually be\nslower when threading.\n  \\li See \\ref mapply for many more examples and notes.\n\\see apop_map_sum\n\\ingroup all_public\n*/\nAPOP_VAR_HEAD apop_data* apop_map(apop_data *in, apop_fn_d *fn_d, apop_fn_v *fn_v, apop_fn_r *fn_r, apop_fn_dp *fn_dp, apop_fn_vp *fn_vp, apop_fn_rp *fn_rp,  apop_fn_dpi *fn_dpi, apop_fn_vpi *fn_vpi, apop_fn_rpi *fn_rpi, apop_fn_di *fn_di,  apop_fn_vi *fn_vi, apop_fn_ri *fn_ri, void *param, int inplace, char part, int all_pages){ \n    apop_data * apop_varad_var(in, NULL)\n    if (!in) return NULL;\n    apop_fn_v * apop_varad_var(fn_v, NULL)\n    apop_fn_d * apop_varad_var(fn_d, NULL)\n    apop_fn_r * apop_varad_var(fn_r, NULL)\n    apop_fn_vp * apop_varad_var(fn_vp, NULL)\n    apop_fn_dp * apop_varad_var(fn_dp, NULL)\n    apop_fn_rp * apop_varad_var(fn_rp, NULL)\n    apop_fn_vpi * apop_varad_var(fn_vpi, NULL)\n    apop_fn_dpi * apop_varad_var(fn_dpi, NULL)\n    apop_fn_rpi * apop_varad_var(fn_rpi, NULL)\n    apop_fn_vi * apop_varad_var(fn_vi, NULL)\n    apop_fn_di * apop_varad_var(fn_di, NULL)\n    apop_fn_ri * apop_varad_var(fn_ri, NULL)\n    int apop_varad_var(inplace, 'n')\n    void * apop_varad_var(param, NULL)\n    int by_vectors = fn_v || fn_vp 
|| fn_vpi || fn_vi;\n    char apop_varad_var(part, by_vectors ? 'r' : 'a')\n    int apop_varad_var(all_pages, 'n')\nAPOP_VAR_ENDHEAD\n    int use_param = (fn_vp || fn_dp || fn_rp || fn_vpi || fn_rpi || fn_dpi);\n    int use_index  = (fn_vi || fn_di || fn_ri || fn_vpi || fn_rpi|| fn_dpi);\n    //Give me the first non-null input function.\n    void *fn = fn_v ? (void *)fn_v : fn_d ? (void *)fn_d : fn_r ? (void *)fn_r : fn_vp ? (void *)fn_vp : fn_dp ? (void *)fn_dp :fn_rp ? (void *)fn_rp : fn_vpi ? (void *)fn_vpi : fn_rpi ? (void *)fn_rpi: fn_dpi ? (void *)fn_dpi : fn_vi ? (void *)fn_vi : fn_di ? (void *)fn_di : fn_ri ? (void *)fn_ri : NULL;\n\n    int by_apop_rows = fn_r || fn_rp || fn_rpi || fn_ri;\n\n    Apop_stopif((part=='c' || part=='r') && (fn_d || fn_dp || fn_dpi || fn_di), \n                        apop_return_data_error(p),\n                        0, \"You asked for a vector-oriented operation (.part='r' or .part='c'), but \"\n                        \"gave me a scalar-oriented function. Did you mean part=='a'?\");\n\n    //Allocate output\n    Get_vmsizes(in); //vsize, msize1, msize2, maxsize\n    apop_data *out =   (inplace=='y') ? in\n                     : (inplace=='v') ? NULL\n                     : by_apop_rows ? apop_data_alloc(GSL_MAX(in->textsize[0], maxsize))\n                     : part == 'v' || (in->vector && ! in->matrix) ? apop_data_alloc(vsize)\n                     : part == 'm' ? apop_data_alloc(msize1, msize2)\n                     : part == 'a' ? apop_data_alloc(vsize, msize1, msize2)\n                     : part == 'r' ? apop_data_alloc(maxsize)\n                     : part == 'c' ?  
apop_data_alloc(msize2) : NULL;\n    Apop_stopif(inplace=='y' && (part=='r'||part=='c') && !in->vector, in->vector=gsl_vector_alloc(maxsize), 2, \n                            \"No vector in your input data set for me to write outputs to; \"\n                            \"allocating one for you of size %i\", maxsize);\n    if (in->names && out && !(inplace=='y')){\n        if (part == 'v'  || (in->vector && ! in->matrix)) {\n             apop_name_stack(out->names, in->names, 'v');\n             apop_name_stack(out->names, in->names, 'r');\n        }\n        else if (part == 'm'){\n             apop_name_stack(out->names, in->names, 'r');\n             apop_name_stack(out->names, in->names, 'c');\n        }\n        else if (!by_apop_rows && part == 'a'){\n             apop_name_free(out->names);\n             out->names = apop_name_copy(in->names);\n        } else if (by_apop_rows || part == 'r')\n             apop_name_stack(out->names, in->names, 'r');\n        else if (part == 'c')\n            apop_name_stack(in->names, out->names, 'r', 'c');\n    }\n\n    if (by_apop_rows) mapply_core(in, NULL, NULL, fn, out ? out->vector : NULL, use_index, use_param, param, 'r', by_apop_rows);\n    else {\n        if (in->vector && (part == 'v' || part=='a'))\n            mapply_core(NULL, NULL, in->vector, fn, out ? 
out->vector : NULL, use_index, use_param, param, 'r', by_apop_rows);\n        if (in->matrix && (part == 'm' || part=='a')){\n            int smaller_dim = GSL_MIN(in->matrix->size1, in->matrix->size2);\n            for (int i=0; i< smaller_dim; i++){\n                if (smaller_dim == in->matrix->size1){\n                    gsl_vector *onevector = Apop_rv(in, i);\n                    if (inplace=='v')\n                         mapply_core(NULL, NULL, onevector, fn, NULL, use_index, use_param, param, 'r', by_apop_rows);\n                    else mapply_core(NULL, NULL, onevector, fn, Apop_rv(out, i), use_index, use_param, param, 'r', by_apop_rows);\n                } else {\n                    gsl_vector *onevector = Apop_cv(in, i);\n                    if (inplace=='v')\n                        mapply_core(NULL, NULL, onevector, fn, NULL, use_index, use_param, param, 'c', by_apop_rows);\n                    else {\n                        gsl_vector *twovector = Apop_cv(out, i);\n                        mapply_core(NULL, NULL, onevector, fn, twovector, use_index, use_param, param, 'c', by_apop_rows);\n                    }\n                }\n            }\n        }\n        if (part == 'r' || part == 'c'){\n            Apop_stopif(!in->matrix, if (!out) out=apop_data_alloc(); out->error='p'; return out,\n                           0, \"You asked for me to operate on the %cs of the matrix, but the matrix is NULL.\", part);\n            mapply_core(NULL, in->matrix, NULL, fn, out ? 
out->vector : NULL, use_index, use_param, param, part, by_apop_rows);\n        }\n    }\n    if ((all_pages=='y' || all_pages=='Y') && in->more){\n        out->more = apop_map_base(in->more, fn_d, fn_v, fn_r, fn_dp, fn_vp, fn_rp, fn_dpi, fn_vpi, fn_rpi, fn_di, fn_vi, fn_ri, param, inplace, part, all_pages);\n        Apop_stopif(out->more->error, out->error=out->more->error, 1, \"Error in subpage; marked parent page with same error code.\");\n    }\n    return out;\n}\n\n/** \\cond doxy_ignore */\ntypedef struct {\n    void *fn;\n    gsl_matrix  *m;\n    gsl_vector  *v, *vin;\n    apop_data *d;\n    bool use_index, use_param;\n    char rc;\n    void *param;\n} threadpass;\n/** \\endcond */\n\n/* Mapply_core splits the database into an array of threadpass structs, then one of the following\n  ...loop functions gets called, which does the actual for loop to step through the rows/columns/elements.  */\n\nstatic void rowloop(threadpass *tc){\n    apop_fn_r   *rtod=tc->fn;\n    apop_fn_rp  *fn_rp=tc->fn;\n    apop_fn_rpi *fn_rpi=tc->fn;\n    apop_fn_ri  *fn_ri=tc->fn;\n    Get_vmsizes(tc->d); //maxsize\n    OMP_for (int i=0; i< maxsize; i++){\n        apop_data *onerow = Apop_r(tc->d, i);\n        double val = \n        tc->use_param ? (tc->use_index ? fn_rpi(onerow, tc->param, i) : fn_rp(onerow, tc->param) )\n                      : (tc->use_index ? fn_ri(onerow, i) : rtod(onerow) );\n        if (tc->v) gsl_vector_set(tc->v, i, val);\n    }\n}\n\nstatic void forloop(threadpass *tc){\n    apop_fn_v   *vtod=tc->fn;\n    apop_fn_vp  *fn_vp=tc->fn;\n    apop_fn_vpi *fn_vpi=tc->fn;\n    apop_fn_vi  *fn_vi=tc->fn;\n    int max = tc->rc == 'r' ? tc->m->size1 : tc->m->size2;\n    OMP_for (int i= 0; i< max; i++){\n        gsl_vector view = tc->rc == 'r' ? gsl_matrix_row(tc->m, i).vector : gsl_matrix_column(tc->m, i).vector;\n        double val  = \n            tc->use_param ? (tc->use_index ? 
fn_vpi(&view, tc->param, i) : fn_vp(&view, tc->param) )\n                      : (tc->use_index ? fn_vi(&view, i) : vtod(&view) );\n        if (tc->v) gsl_vector_set(tc->v, i, val);\n    }\n}\n\nstatic void oldforloop(threadpass *tc){\n    apop_fn_vtov *vtov=tc->fn;\n    if (tc->v){\n        tc->rc = 'r';\n        return forloop(tc);\n    }\n    OMP_for (int i=0; i< tc->m->size1; i++)\n        vtov(Apop_mrv(tc->m, i));\n}\n\n//if mapping to self, then set tc.v = in_v\nstatic void vectorloop(threadpass *tc){\n    apop_fn_d   *dtod=tc->fn;\n    apop_fn_dp  *fn_dp=tc->fn;\n    apop_fn_dpi *fn_dpi=tc->fn;\n    apop_fn_di  *fn_di=tc->fn;\n    OMP_for (int i= 0; i< tc->vin->size; i++){\n        double inval = gsl_vector_get(tc->vin, i);\n        double outval =\n        tc->use_param ? (tc->use_index ? fn_dpi(inval, tc->param, i) : \n                                     fn_dp(inval, tc->param))\n                     : (tc->use_index ? fn_di(inval, i) : \n                                     dtod(inval));\n        if (tc->v) gsl_vector_set(tc->v, i, outval);\n    }\n}\n\nstatic void oldvectorloop(threadpass *tc){\n    apop_fn_dtov *dtov=tc->fn;\n    if (tc->v) return vectorloop(tc);\n    OMP_for (int i= 0; i< tc->vin->size; i++){\n        double *inval = gsl_vector_ptr(tc->vin, i);\n        dtov(inval);\n    }\n}\n\nstatic gsl_vector*mapply_core(apop_data *d, gsl_matrix *m, gsl_vector *vin, void *fn, gsl_vector *vout, bool use_index, bool use_param, void *param, char post_22, bool by_apop_rows){\n    Get_vmsizes(d); //maxsize\n    threadpass tp =\n         (threadpass) {\n            .fn = fn, .m = m, .d = d,\n            .vin = vin, .v = vout,\n            .use_index = use_index, .use_param= use_param,\n            .param = param, .rc = post_22\n        };\n    if (by_apop_rows) rowloop(&tp);\n    else if (m) post_22 ? forloop(&tp) : oldforloop(&tp);\n    else        post_22 ? 
vectorloop(&tp) : oldvectorloop(&tp);\n    return vout;\n}\n\n/** Map a function onto every row of a matrix.  The function that you input takes in a\n\\c gsl_vector and returns a \\c double. This function will produce a sequence of vector\nviews of each row of the input matrix, and send each to your function. It will output\na \\c gsl_vector holding your function's output for each row.\n\n  \\param m  The matrix\n  \\param fn A function of the form <tt>double fn(gsl_vector* in)</tt>\n\n  \\return A \\c gsl_vector with the corresponding value for each row.\n\n  \\li If you input a \\c NULL matrix, I return \\c NULL.\n  \\li See \\ref mapply \"the map/apply page\" for details.\n\\see \\ref apop_map, \\ref apop_map_sum\n*/\ngsl_vector *apop_matrix_map(const gsl_matrix *m, double (*fn)(gsl_vector*)){\n    if (!m) return NULL;\n    gsl_vector *out = gsl_vector_alloc(m->size1);\n    return mapply_core(NULL, (gsl_matrix*) m, NULL, fn, out, 0, 0, NULL, 0, false);\n}\n\n/** Apply a function to every row of a matrix.  The function that you input takes in\na \\c gsl_vector and returns nothing. \\c apop_matrix_apply will produce a vector view of\neach row, and send each row to your function.\n\n  \\param m  The matrix\n  \\param fn A function of the form <tt>void fn(gsl_vector* in)</tt> which may modify\n  the data at the \\c in pointer in place.\n\n  \\li If the matrix is \\c NULL, this is a no-op and returns immediately.\n  \\li See \\ref mapply \"the map/apply page\" for details.\n\\see \\ref apop_map, \\ref apop_map_sum\n*/\nvoid apop_matrix_apply(gsl_matrix *m, void (*fn)(gsl_vector*)){\n    if (!m) return;\n    mapply_core(NULL, m, NULL, fn, NULL, 0, 0, NULL, 0, false);\n}\n\n/** Map a function onto every element of a vector. 
This function will send each\nelement to the function you provide, and will output a \\c gsl_vector holding your\nfunction's output for each element.\n\n  \\param v  The input vector\n  \\param fn A function of the form <tt>double fn(double in)</tt>\n\n  \\return A \\c gsl_vector (allocated by this function) with the corresponding value for each element.\n\n  \\li If you input a \\c NULL vector, I return \\c NULL.\n  \\li See \\ref mapply \"the map/apply page\" for details.\n\\see \\ref apop_map, \\ref apop_map_sum\n*/\ngsl_vector *apop_vector_map(const gsl_vector *v, double (*fn)(double)){\n    if (!v) return NULL;\n    gsl_vector *out = gsl_vector_alloc(v->size);\n    return mapply_core(NULL, NULL, (gsl_vector*) v, fn, out, 0, 0, NULL, 0, false);\n}\n\n/** Apply a function to every element of a vector.  The function that you input takes in\na \\c double* and may modify the input value in place. This function will send a pointer\nto each element of your vector to your function.\n\n  \\param v  The input vector\n  \\param fn A function of the form <tt>void fn(double *in)</tt>\n\n  \\li If the vector is \\c NULL, this is a no-op.\n  \\li See \\ref mapply \"the map/apply page\" for details.\n\\see \\ref apop_map\n*/\nvoid apop_vector_apply(gsl_vector *v, void (*fn)(double*)){\n    if (!v) return;\n    mapply_core(NULL, NULL, v, fn, NULL, 0, 0, NULL, 0, false); }\n\nstatic void apop_matrix_map_all_vector_subfn(const gsl_vector *in, gsl_vector *outv, double (*fn)(double)){\n    mapply_core(NULL, NULL, (gsl_vector *) in, fn, outv, 0, 0, NULL, 0, false); }\n\n/** Maps a function to every element in a matrix (as opposed to every row).\n\n  \\param in The matrix whose elements will be inputs to the function\n  \\param fn A function with a form like <tt>double f(double in)</tt>.\n  \\return a matrix of the same size as the original, with the function applied.\n\n  \\li If you input a \\c NULL matrix, I return \\c NULL.\n  \\li See \\ref mapply \"the map/apply page\" for details.\n\\see 
\\ref apop_map, \\ref apop_map_sum\n*/\n\ngsl_matrix * apop_matrix_map_all(const gsl_matrix *in, double (*fn)(double)){\n    if (!in) return NULL;\n    gsl_matrix *out = gsl_matrix_alloc(in->size1, in->size2);\n    OMP_for (size_t i=0; i< in->size1; i++){\n        gsl_vector_const_view inv = gsl_matrix_const_row(in, i);\n        apop_matrix_map_all_vector_subfn(&inv.vector, Apop_mrv(out, i), fn);\n    }\n    return out;\n}\n\n/** Applies a function to every element in a matrix (as opposed to every row)\n\n  \\param in The matrix whose elements will be inputs to the function\n  \\param fn A function with a form like <tt>void f(double *in)</tt> which may modify\n  the data at the \\c in pointer in place.\n\n  \\li If the matrix is \\c NULL, this is a no-op and returns immediately.\n  \\li See \\ref mapply \"the map/apply page\" for details.\n\\see \\ref apop_map, \\ref apop_map_sum\n*/\nvoid apop_matrix_apply_all(gsl_matrix *in, void (*fn)(double *)){\n    if (!in) return;\n    OMP_for (size_t i=0; i< in->size1; i++){\n        apop_vector_apply(Apop_mrv(in, i), fn);\n    }\n}\n\n/** Returns the sum of the output of \\c apop_vector_map. For example,\n<tt>apop_vector_map_sum(v, isnan)</tt> returns the count of elements of <tt>v</tt>\nthat are \\c NaN.\n\n  \\li If you input a \\c NULL vector, I return the sum of zero items: zero.\n  \\li See \\ref mapply \"the map/apply page\" for details.\n\\see \\ref apop_map, \\ref apop_map_sum\n*/\ndouble apop_vector_map_sum(const gsl_vector *in, double(*fn)(double)){\n    if (!in) return 0;\n    gsl_vector *m = apop_vector_map (in, fn);\n    double out = apop_vector_sum(m);\n    gsl_vector_free(m);\n    return out;\n}\n\n/** Like \\c apop_matrix_map_all, but returns the sum of the resulting mapped function. 
For example, <tt>apop_matrix_map_all_sum(m, isnan)</tt> returns the number of elements of <tt>m</tt> that are \\c NaN.\n\n  \\li If you input a \\c NULL matrix, I return the sum of zero items: zero.\n  \\li See \\ref mapply \"the map/apply page\" for details.\n\\see \\ref apop_map, \\ref apop_map_sum\n*/\ndouble apop_matrix_map_all_sum(const gsl_matrix *in, double (*fn)(double)){\n    if (!in) return 0;\n    gsl_matrix *m = apop_matrix_map_all (in, fn);\n    double out = apop_matrix_sum(m);\n    gsl_matrix_free(m);\n    return out;\n}\n\n/** Like \\c apop_matrix_map, but returns the sum of the resulting mapped vector. For example, let \\c log_like be a function that returns the log likelihood of an input vector; then <tt>apop_matrix_map_sum(m, log_like)</tt> returns the total log likelihood of the rows of \\c m.\n\n  \\li If you input a \\c NULL matrix, I return the sum of zero items: zero.\n  \\li See \\ref mapply \"the map/apply page\" for details.\n\\see \\ref apop_map, \\ref apop_map_sum\n*/\ndouble apop_matrix_map_sum(const gsl_matrix *in, double (*fn)(gsl_vector*)){\n    if (!in) return 0;\n    gsl_vector *v = apop_matrix_map (in, fn);\n    double out = apop_vector_sum(v);\n    gsl_vector_free(v);\n    return out;\n}\n\n/** A function that effectively calls \\ref apop_map and returns the sum of the resulting\nelements. Thus, this function returns a \\c double. See the \\ref apop_map page for\ndetails of the inputs, which are the same here, except that \\c inplace doesn't make\nsense---this function will always just add up the input function outputs.\n\n\\li I don't copy the input data to send to your input function. Therefore, if your\nfunction modifies its inputs as a side-effect, your data set will be modified as this\nfunction runs.\n  \\li The sum of zero elements is zero, so that is what is returned if the input \\ref\napop_data set is \\c NULL. 
If <tt>apop_opts.verbose >= 2</tt> print a warning.\n  \\li See \\ref mapply for many more examples and notes.\n  \\li This function uses the \\ref designated syntax for inputs.\n\\ingroup all_public\n*/\nAPOP_VAR_HEAD double apop_map_sum(apop_data *in, apop_fn_d *fn_d, apop_fn_v *fn_v, apop_fn_r *fn_r, apop_fn_dp *fn_dp, apop_fn_vp *fn_vp, apop_fn_rp *fn_rp, apop_fn_dpi *fn_dpi,  apop_fn_vpi *fn_vpi, apop_fn_rpi *fn_rpi, apop_fn_di *fn_di, apop_fn_vi *fn_vi, apop_fn_ri *fn_ri, void *param, char part, int all_pages){ \n    apop_data * apop_varad_var(in, NULL)\n    Apop_stopif(!in, return 0, 2, \"NULL input. Returning zero.\");\n    apop_fn_v * apop_varad_var(fn_v, NULL)\n    apop_fn_d * apop_varad_var(fn_d, NULL)\n    apop_fn_r * apop_varad_var(fn_r, NULL)\n    apop_fn_vp * apop_varad_var(fn_vp, NULL)\n    apop_fn_dp * apop_varad_var(fn_dp, NULL)\n    apop_fn_rp * apop_varad_var(fn_rp, NULL)\n    apop_fn_vpi * apop_varad_var(fn_vpi, NULL)\n    apop_fn_dpi * apop_varad_var(fn_dpi, NULL)\n    apop_fn_rpi * apop_varad_var(fn_rpi, NULL)\n    apop_fn_vi * apop_varad_var(fn_vi, NULL)\n    apop_fn_di * apop_varad_var(fn_di, NULL)\n    apop_fn_ri * apop_varad_var(fn_ri, NULL)\n    void * apop_varad_var(param, NULL)\n    char apop_varad_var(part, ((fn_v||fn_vp||fn_vpi||fn_vi) ? 'r' : 'a'));\n    int apop_varad_var(all_pages, 'n')\nAPOP_VAR_ENDHEAD \n    apop_data *mapped = apop_map(in, .fn_d=fn_d, .fn_v=fn_v, .fn_r=fn_r, \n                        .fn_dp=fn_dp, .fn_vp=fn_vp, .fn_rp=fn_rp, \n                        .fn_dpi=fn_dpi,  .fn_vpi=fn_vpi, .fn_rpi=fn_rpi, \n                        .fn_di=fn_di, .fn_vi=fn_vi, .fn_ri=fn_ri, \n                        .param=param, .part=part, .inplace='n', .all_pages='n');\n    double outsum =   (mapped->vector ? apop_sum(mapped->vector) : 0)\n                    + (mapped->matrix ? 
apop_matrix_sum(mapped->matrix) : 0);\n    apop_data_free(mapped);\n    return outsum + \n                    (((all_pages=='y' || all_pages=='Y') && in->more) ? \n                        apop_map_sum_base(in->more, fn_d, fn_v, fn_r, fn_dp, \n                        fn_vp, fn_rp, fn_dpi, fn_vpi, fn_rpi, fn_di, fn_vi, \n                        fn_ri, param, part, all_pages) : 0);\n}\n/** \\} */\n"
  },
  {
    "path": "apop_mcmc.m4.c",
    "content": "/** \\file \n  Markov Chain Monte Carlo. */ \n/* Copyright (c) 2014 by Ben Klemens. Licensed under the GNU GPL v2; see COPYING. */\n\n#include \"apop_internal.h\"\n#include <stdbool.h>\n\n\n///default step and adapt fns.\n\nstatic void step_to_vector(double const *d, apop_mcmc_proposal_s *ps, apop_mcmc_settings *ms){\n    apop_model *m = ps->proposal;\n    memcpy(m->parameters->vector->data, d, sizeof(double)*m->parameters->vector->size);\n\n    ps->adapt_fn(ps, ms);\n}\n\nint sigma_adapt(apop_mcmc_proposal_s *ps, apop_mcmc_settings *ms){\n    apop_model *m = ps->proposal;\n    //accept rate. Add 1% * target to numerator; 1% to denominator, to slow early jumps\n    double ar = (ps->accept_count + .01*ms->periods *ms->target_accept_rate)\n               /(ps->accept_count + ps->reject_count + .01*ms->periods);\n/*    double std_dev_scale= (ar > ms->target_accept_rate) \n                        ? (2 - (1.-ar)/(1.-ms->target_accept_rate))\n                        : (1/(2-((ar+0.0)/ms->target_accept_rate)));\n                        */\n    double scale = ar/ms->target_accept_rate;\n    scale = 1+ (scale-1)/100.;\n    //gsl_matrix_scale(m->parameters->matrix, scale > .1? ( scale < 10 ? 
scale : 10) : .1);\n    gsl_matrix_scale(m->parameters->matrix, scale);\n    return 0;\n}\n\n/////// apop_mcmc_settings\n\nApop_settings_init(apop_mcmc,\n   Apop_varad_set(periods, 6e3);\n   Apop_varad_set(burnin, 0.05);\n   Apop_varad_set(target_accept_rate, 0.35);\n   Apop_varad_set(gibbs_chunks, 'b');\n   Apop_varad_set(start_at, '1');\n   Apop_varad_set(base_step_fn, step_to_vector);\n   Apop_varad_set(base_adapt_fn, sigma_adapt);\n   //all else defaults to zero/NULL\n)\n\nApop_settings_copy(apop_mcmc, \n    if (in->block_count){\n        out->proposals = calloc(in->block_count, sizeof(apop_mcmc_proposal_s));\n        for (int i=0; i< in->block_count; i++){\n            out->proposals[i] = in->proposals[i];\n            out->proposals[i].proposal = apop_model_copy(in->proposals[i].proposal);\n        }\n        out->proposal_is_cp=1;\n    }\n)\n\nApop_settings_free(apop_mcmc, \n        if (in->proposal_is_cp) {\n            for (int i=0; i< in->block_count; i++)\n                apop_model_free(in->proposals[i].proposal);\n        free(in->proposals);\n        }\n)\n\nstatic void setup_normal_proposals(apop_mcmc_proposal_s *s, int tsize, apop_mcmc_settings *settings){\n    apop_model *mvn =  apop_model_copy(apop_multivariate_normal);\n    mvn->parameters = apop_data_alloc(tsize, tsize, tsize);\n    gsl_vector_set_all(mvn->parameters->vector, 1);\n    gsl_matrix_set_identity(mvn->parameters->matrix);\n    s->proposal = mvn;\n    s->step_fn = settings->base_step_fn;\n    s->adapt_fn = settings->base_adapt_fn;\n}\n\nstatic void set_block_count_and_block_starts(apop_data *in, \n                                  apop_mcmc_settings *s, size_t total_len){\n    if (s->gibbs_chunks =='a') {\n        s->block_count = 1;\n        s->block_starts = calloc(2, sizeof(size_t));\n        s->block_starts[1] = total_len;\n    } else if (s->gibbs_chunks =='b') {\n        s->block_count = 0;\n        for (apop_data *d = in; d; d=d->more)\n            s->block_count += !!d->vector 
+ !!d->matrix + !!d->weights;\n\n        s->block_starts = calloc(s->block_count+1, sizeof(size_t));\n        int this=1, ctr=0;\n        for (apop_data *d = in; d; d=d->more){\n            #define markit(test, value) if (test)  \\\n                s->block_starts[this++] = ctr += value; \n\n            markit(d->vector, d->vector->size);\n            markit(d->matrix, d->matrix->size1*d->matrix->size2);\n            markit(d->weights, d->weights->size);\n        }\n    } else { // item-by-item\n        s->block_count = total_len;\n        s->block_starts = calloc(total_len+1, sizeof(size_t));\n        for (int i=1; i<total_len+1; i++) s->block_starts[i] = i;\n    }\n}\n\nstatic void one_step(apop_data *d, gsl_vector *draw, apop_model *m, apop_mcmc_settings *s, gsl_rng *rng, int *constraint_fails, apop_data *out, size_t block, int out_row){\n    gsl_vector *clean_copy = apop_vector_copy(draw);\n    newdraw:\n    apop_draw(draw->data + s->block_starts[block], rng, s->proposals[block].proposal);\n    apop_data_unpack(draw, m->parameters);\n    if (m->constraint && m->constraint(d, m)){\n        (*constraint_fails)++;\n        goto newdraw;\n    }\n    double ll = apop_log_likelihood(d, m);\n\n    Apop_notify(3, \"ll=%g for parameters:\\t\", ll);\n    if (apop_opts.verbose >=3) apop_data_print(m->parameters, .output_pipe=apop_opts.log_file);\n\n    Apop_stopif(gsl_isnan(ll) || !isfinite(ll), goto newdraw, \n            1, \"Trouble evaluating the m function at vector beginning with %g. 
\"\n            \"Throwing it out and trying again.\\n\"\n            , m->parameters->vector->data[0]);\n\n    double ratio = ll - s->last_ll;\n    if (ratio >= 0 || log(gsl_rng_uniform(rng)) < ratio){//success\n        if (s->proposals[block].step_fn) \n            s->proposals[block].step_fn(draw->data + s->block_starts[block], s->proposals+block, s);\n        s->last_ll = ll;\n        s->proposals[block].accept_count++;\n        s->accept_count++;\n    } else {\n        s->proposals[block].reject_count++;\n        s->reject_count++;\n        Apop_notify(3, \"reject, with exp(ll_now-ll_proposal) = exp(%g-%g) = %g.\", ll, s->last_ll, exp(ratio));\n        gsl_vector_memcpy(draw, clean_copy);\n        apop_data_unpack(draw, m->parameters); //keep the last success in m->parameters.\n    }\n    if (out_row>=0) gsl_vector_memcpy(Apop_rv(out, out_row), draw);\n}\n\n\n/** The draw method for models estimated via \\ref apop_model_metropolis.\n\nThat method produces an \\ref apop_pmf, typically with a few thousand draws from the\nmodel in a batch. If you want to get a single next step from the Markov chain, use this.\n\nA Markov chain works by making a new draw and then accepting or rejecting the draw. If\nthe draw is rejected, the last value is reported as the next step in the chain. Users\nsometimes mitigate this repetition by making a batch of draws (say, ten at a time) and \nusing only the last.\n\nIf you run this without first running \\ref apop_model_metropolis, I will run it for\nyou, meaning that there will be an initial burn-in period before the first draw that\ncan be reported to you. 
That run is done using \\c model->data as input.\n\n\\param out An array of \\c doubles, which will hold the draw, in the style of \\ref apop_draw.\n\\param rng A \\c gsl_rng, already initialized, probably via \\ref apop_rng_alloc.\n\\param model A model which was probably already run through \\ref apop_model_metropolis.\n\\return On return, \\c out is filled with the next step in the Markov chain. The <tt>->data</tt> element of the PMF model is extended to include the additional steps in the chain.\nIf a proposal failed the model constraints, then return 1; else return 0. See the notes in the documentation for \\ref apop_model_metropolis.\n\n  \\li After pulling the attached settings group, the parent model is ignored. One expects\nthat \\c base_model in the mcmc settings group == the parent model.\n  \\li If your settings break the model parameters into several chunks, this function\nreturns after stepping through all chunks.\n\\ingroup all_public\n*/\nint apop_model_metropolis_draw(double *out, gsl_rng* rng, apop_model *model){\n    apop_mcmc_settings *s = apop_settings_get_group(model, apop_mcmc);\n    if (!s || !s->pmf) {\n        apop_model_metropolis(model->data, rng, model);\n        s = apop_settings_get_group(model, apop_mcmc);\n    }\n    int constraint_fails = 0;\nOMP_critical (metro_draw)\n{\n    apop_model *m = s->base_model;\n    gsl_vector_view vv = gsl_vector_view_array(out, s->block_starts[s->block_count]);\n    apop_data_pack(m->parameters, &(vv.vector));\n    apop_data *earlier_draws = s->pmf->data;\n\n    int block = 0, done = 0;\n    while (!done){\n        s->proposal_count++;\n        earlier_draws->matrix = apop_matrix_realloc(earlier_draws->matrix, earlier_draws->matrix->size1+1, earlier_draws->matrix->size2);\n        one_step(s->base_model->data, &(vv.vector), m, s, rng, &constraint_fails, \n                            earlier_draws, block, earlier_draws->matrix->size1-1);\n        block = (block+1) % s->block_count;\n        done = 
!block; //have looped back to the start.\n        s->proposals[block].adapt_fn(s->proposals+block, s);\n    }\n\n    Apop_stopif(constraint_fails, , 2, \"%i proposals failed to meet your model's parameter constraints\", constraint_fails);\n}\n    return !!constraint_fails;\n}\n\n\nvoid main_mcmc_loop(apop_data *d, apop_model *m, apop_data *out, gsl_vector *draw,\n                        apop_mcmc_settings *s, gsl_rng *rng, int *constraint_fails){\n\t\tdouble integerpart_periods_burnin = GSL_NAN; modf((double)(s->periods)*s->burnin,&integerpart_periods_burnin);\n    s->accept_count = 0;\n    int out_row = -lround(integerpart_periods_burnin);\n    int block = 0;\n    for (s->proposal_count=1; s->proposal_count< s->periods+1; s->proposal_count++, out_row++){\n        one_step(d, draw, m, s, rng, constraint_fails, out, block, out_row);\n        block = (block+1) % s->block_count;\n        s->proposals[block].adapt_fn(s->proposals+block, s);\n        //if (constraint_fails>10000) break;\n    }\n}\n\n/** Use <a href=\"https://en.wikipedia.org/wiki/Metropolis-Hastings\">Metropolis-Hastings\nMarkov chain Monte Carlo</a> to make draws from the given model.\n\nThe basic storyline is that draws are made from a proposal distribution, and the\nlikelihood of your model given your data and the drawn parameters is evaluated. At each\nstep, a new set of proposal parameters is drawn, and if it is more likely\nthan the previous set the new proposal is accepted as the next step, else with probability (prob of new params)/(prob of old params),\nit is accepted as the next step anyway. Otherwise the last accepted proposal is repeated.\n\nThe output is an \\ref apop_pmf model with a data set listing the draws that were\naccepted, including those repetitions. 
The output model is modified so that subsequent\ndraws are one more step from the Markov chain, via \\ref apop_model_metropolis_draw.\n\n\\param d The \\ref apop_data set used for evaluating the likelihood of a proposed parameter set.\n\n\\param rng A \\c gsl_rng, probably allocated via \\ref apop_rng_alloc. (Default: an RNG from \\ref apop_rng_get_thread)\n\n\\param m The \\ref apop_model from which parameters are being drawn. (No default; must not be \\c NULL)\n\n\\return A modified \\ref apop_pmf model representing the results of the search. It has\na specialized \\c draw method that returns another step from the Markov chain with each draw.\n\n\\exception out->error='c'  Proposal was outside of a constraint; see below.\n\n\\li If a proposal fails to meet the \\c constraint element of the model you input, then\nthe proposal is thrown out and a new one selected. By the default proposal\ndistribution, this is not mathematically correct (it breaks detailed balance),\nand values near the constraint will be oversampled. The output model will have\n<tt>outmodel->error=='c'</tt>. It is up to you to decide whether the resulting\ndistribution is good enough for your purposes or whether to take the time to write a\ncustom proposal and step function to accommodate the constraint.\n\nAttach an \\ref apop_mcmc_settings group to your model to specify the proposal\ndistribution, burnin, and other details of the search. See the \\ref apop_mcmc_settings\ndocumentation for details.\n\n  \\li The default proposal includes an adaptive step: you specify a target accept rate\n(default: .35), and if the accept rate is currently higher the variance of the proposals\nis widened to explore more of the space; if the accept rate is currently lower the\nvariance is narrowed to stay closer to the last accepted proposal. Technically, this\nbreaks ergodicity of the Markov chain, but the consensus seems to be that this is\nnot a serious problem. 
If it does concern you, you can set the \\c base_adapt_fn in the \\ref apop_mcmc_settings group to a do-nothing function, or one that damps its adaptation as \\f$n\\to\\infty\\f$.\n  \\li If you have a univariate model, \\ref apop_arms_draw may be a suitable simpler alternative.\n  \\li Note the \\c gibbs_chunks element of the \\ref apop_mcmc_settings group. If you set \\c\ngibbs_chunks='a', all parameters are drawn as a set, and accepted/rejected as a set. The\nvariances are adapted at an identical rate. If you set \\c gibbs_chunks='i',\nthen each scalar parameter is assigned its own proposal distribution, which is adapted\nat its own pace. With \\c gibbs_chunks='b' (the default), then each of the vector, matrix,\nand weights of your model's parameters are drawn/accepted/adapted as a block (and so\non to additional chunks if your model has <tt>->more</tt> pages). This works well for\ncomplex models which naturally break down into subsets of parameters.\n  \\li Each chunk counts as a step in the Markov chain. Therefore, if there are\nseveral chunks, you can expect chunks to repeat from step to step. If you want a\ndraw after cycling through all chunks, try using \\ref apop_model_metropolis_draw,\nwhich has that behavior.\n  \\li If the likelihood model has \\c NULL parameters, I will allocate them. That\nmeans you can use one of the stock models that ship with Apophenia. If I need\nto run the model's prep routine to get the size of the parameters, then I will\nmake a copy of the likelihood model, run prep, and then allocate parameters\nfor that copy of a model.\n  \\li On exit, the \\c parameters element of your likelihood model has the last accepted parameter proposal.\n  \\li If you set <tt>apop_opts.verbose=2</tt> or greater, I will report the accept\nrate of the M-H sampler. It is a common rule of thumb to select a proposal so that\nthis is between 20% and 50%. 
Set <tt>apop_opts.verbose=3</tt> to see the stream\nof proposal points, their likelihoods, and the acceptance odds. You may want to\nset <tt>apop_opts.log_file=fopen(\"yourlog\", \"w\")</tt> first.\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD apop_model *apop_model_metropolis(apop_data *d, gsl_rng *rng, apop_model *m){\n    apop_data *apop_varad_var(d, NULL);\n    apop_model *apop_varad_var(m, NULL);\n    Apop_stopif(!m, return NULL, 0, \"NULL model input.\");\n    gsl_rng *apop_varad_var(rng, apop_rng_get_thread(-1));\nAPOP_VAR_END_HEAD\n    apop_model *outp;\n    OMP_critical(metropolis)\n    {\n    apop_mcmc_settings *s = apop_settings_get_group(m, apop_mcmc);\n    if (!s)\n        s = Apop_model_add_group(m, apop_mcmc);\n    apop_prep(d, m); //typically a no-op\n    s->last_ll = GSL_NEGINF;\n    gsl_vector * drawv = apop_data_pack(m->parameters);\n    const double double_periods = (double)(s->periods);\n    Apop_stopif(s->burnin > 1, s->burnin/=double_periods,\n                1, \"Burn-in should be a fraction of the number of periods, \"\n                   \"not a whole number of periods. 
Rescaling to burnin=%g.\"\n                   , s->burnin/double_periods);\n\t\tdouble integerpart_periods_cburnin = GSL_NAN; modf(double_periods*(1.0-s->burnin),&integerpart_periods_cburnin);\n\t\tconst size_t data_size1 = llround(integerpart_periods_cburnin);\n    apop_data *out = apop_data_alloc(data_size1, drawv->size);\n\n    if (!s->proposals){\n        set_block_count_and_block_starts(m->parameters, s, drawv->size);\n        s->proposals = calloc(s->block_count, sizeof(apop_mcmc_proposal_s));\n        s->proposal_is_cp = 1;\n        for (int i=0; i< s->block_count; i++){\n            apop_mcmc_proposal_s *p = s->proposals+i;\n            setup_normal_proposals(p, s->block_starts[i+1]-s->block_starts[i], s);\n            if (!p->proposal->parameters) {\n                apop_prep(NULL, p->proposal+i);\n                if(p->proposal->parameters->matrix) gsl_matrix_set_all(p->proposal->parameters->matrix, 1);\n                if(p->proposal->parameters->vector) gsl_vector_set_all(p->proposal->parameters->vector, 1);\n            }\n        }\n    }\n\n    //if s->start_at =='p', we already have m->parameters in drawv.\n    if (s->start_at == '1') gsl_vector_set_all(drawv, 1);\n    int constraint_fails = 0;\n\n    main_mcmc_loop(d, m, out, drawv, s, rng, &constraint_fails);\n\n    Apop_notify(2, \"M-H sampling accept percent = %3.3f%%\", 100*(0.0+s->accept_count)/s->periods);\n    Apop_stopif(constraint_fails, out->error='c', 2, \"%i proposals failed to meet your model's parameter constraints\", constraint_fails);\n\n    out->weights = gsl_vector_alloc(s->periods*(1-s->burnin));\n    gsl_vector_set_all(out->weights, 1);\n    outp = apop_estimate(out, apop_pmf);\n    s->pmf = outp;\n    s->base_model = m;\n    outp->draw = apop_model_metropolis_draw;\n    apop_settings_copy_group(outp, m, \"apop_mcmc\");\n\n    gsl_vector_free(drawv);\n    }\n    return outp;\n}\n"
  },
  {
    "path": "apop_missing_data.m4.c",
    "content": "/** \\file apop_missing_data.c Some missing data handlers. */\n/* Copyright (c) 2007, 2009 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n#include \"apop_internal.h\"\n#include <regex.h>\n\n\n/** If there is an NaN anywhere in the row of data (including the matrix, the vector, the weights, and the text) then delete the row from the data set.\n\n\\li If every row has a NaN, then this returns \\c NULL.\n\\li If \\c apop_opts.nan_string is not \\c NULL, then I will make case-insensitive comparisons to the text elements to check for bad data as well.\n\\li If \\c inplace = 'y', then I'll free each element of the input data\n    set and refill it with the pruned elements. I'll still take up (up to)\n    twice the size of the data set in memory during the function. If\n    every row has a NaN, then your \\c apop_data set will end up with\n    \\c NULL vector, matrix, .... if \\c inplace = 'n', then the original data set is\n    left where it was, though internal elements may be moved.\n\\li I only look at the first page of data (i.e. the \\c more element is ignored).\n\\li Listwise deletion is often not a statistically valid means of dealing with missing data.\n    It is typically better to impute the data (preferably multiple times). See \\ref\n    apop_ml_impute for a less-invalid means, or <a href=\"https://github.com/rodri363/tea\">Tea\n    for survey imputation</a> for heavy-duty survey editing and imputation.\n\\li This function uses the \\ref designated syntax for inputs.\n\n\\param d       The data, with NaNs\n\\param inplace If \\c 'y', clear out the pointer-to-\\ref apop_data that\nyou sent in and refill with the pruned data. If \\c 'n', leave the\nset alone and return a new data set. Default=\\c 'n'.\n\\return        A (potentially shorter) copy of the data set, without\nNaNs. If <tt>inplace=='y'</tt>, a pointer to the input, which was shortened in place. 
If the entire data set is cleared out, then this will be \\c NULL.\n\\see apop_data_rm_rows\n*/\nAPOP_VAR_HEAD apop_data * apop_data_listwise_delete(apop_data *d, char inplace){\n    apop_data * apop_varad_var(d, NULL);\n    if (!d) return NULL;\n    char apop_varad_var(inplace, 'n');\nAPOP_VAR_ENDHEAD\n    Get_vmsizes(d) //defines firstcol, vsize, wsize, msize1, msize2.\n    Apop_stopif(!msize1 && !vsize && !*d->textsize, return NULL, 0, \n            \"You sent to apop_data_listwise_delete a data set with NULL matrix, NULL vector, and no text. \"\n            \"Confused, it is returning NULL.\");\n    //find out where the NaNs are\n    int len = GSL_MAX(vsize ? vsize : msize1, d->textsize[0]); //still some size assumptions here.\n    int not_empty = 0;\n    int *marked = calloc(len, sizeof(int));\n    for (int i=0; i< (vsize ? vsize: msize1); i++)\n        for (int j=firstcol; j <msize2; j++){\n            if (gsl_isnan(apop_data_get(d, i, j))){\n                    marked[i] = 1;\n                    break;\n            }\n        }\n    for (int i=0; i< wsize; i++)\n        if (gsl_isnan(gsl_vector_get(d->weights, i)))\n            marked[i] = 1;\n    if (d->textsize[0] && apop_opts.nan_string){\n        for(int i=0; i< d->textsize[0]; i++)\n            if (!marked[i])\n                for(int j=0; j< d->textsize[1]; j++)\n                    if (!strcasecmp(apop_opts.nan_string, d->text[i][j])){\n                        marked[i] ++;\n                        break;\n                    }\n    }\n\n    //check that at least something isn't NULL.\n    for (int i=0; i< len; i++)\n        if (!marked[i]){\n            not_empty ++;\n            break;\n        }\n    if (!not_empty){\n        free(marked);\n        return NULL;\n    }\n    apop_data *out = (inplace=='y'|| inplace=='Y') ? 
d : apop_data_copy(d);\n    apop_data_rm_rows(out, marked);\n    free(marked);\n    return out;\n}\n\n//ML imputation\n\n/** \\hideinitializer */\n#define Switch_back    \\\n    apop_data *real_data = ml_model->parameters;   \\\n    apop_model *actual_base = ml_model->more; \\\n    actual_base->parameters = d; \n\nstatic void i_est(apop_data *d, apop_model *ml_model){\n    Switch_back\n    actual_base = apop_estimate(real_data, actual_base);\n}\n\nstatic long double i_ll(apop_data *d, apop_model *ml_model){\n    Switch_back\n    return apop_log_likelihood(real_data, actual_base);\n}\n\nstatic long double i_p(apop_data *d, apop_model *ml_model){\n    Switch_back\n    return apop_p(real_data, actual_base);\n}\n\n//doesn't actually move the parameters\nstatic long double i_constraint(apop_data *d, apop_model *ml_model){\n    Switch_back\n    if (!actual_base->constraint) return 0;\n    apop_data *original_params = apop_data_copy(actual_base->parameters);\n    long double out = actual_base->constraint(real_data, actual_base);\n    if (out) apop_data_memcpy(actual_base->parameters, original_params);\n    apop_data_free(original_params);\n    return out;\n}\n\napop_model *apop_swap_model = &(apop_model){\"Model with data and params swapped\", .estimate=i_est, .p = i_p, .log_likelihood=i_ll, .constraint = i_constraint};\n\n/** Impute the most likely data points to replace NaNs in the data, and insert them into \nthe given data. That is, the data set is modified in place.\n\nHow it works: this uses the machinery for \\ref apop_model_fix_params. The only difference is \nthat this searches over the data space and takes the parameter space as fixed, while basic \nfix params model searches parameters and takes data as fixed. So this function just does the\nnecessary data-parameter switching to make that happen.\n\n\\param  d       The data set. 
It comes in with NaNs and leaves entirely filled in.\n\\param  mvn A parametrized \\ref apop_model from which you expect the data was derived.\nIf \\c NULL, then I'll use the Multivariate Normal that best fits the data after listwise deletion.\n\n\\return An estimated \\ref apop_model. Also, the data input will be filled in and ready to use.\n*/\napop_model * apop_ml_impute(apop_data *d,  apop_model* mvn){\n    if (!mvn){\n        apop_data *list_d = apop_data_listwise_delete(d);\n        Apop_stopif(!list_d, return NULL, 0, \"Listwise deletion returned no whole rows, \"\n                            \"so I couldn't fit a Multivariate Normal to your data. \"\n                            \"Please provide a pre-estimated initial model.\");\n        mvn = apop_estimate(list_d, apop_multivariate_normal);\n        apop_data_free(list_d);\n    }\n    apop_model *impute_me = apop_model_copy(apop_swap_model);\n    impute_me->parameters = d;\n    impute_me->more = mvn;\n    apop_model *fixed = apop_model_fix_params(impute_me);\n    Apop_model_add_group(fixed, apop_parts_wanted);\n    apop_model *m = apop_estimate(mvn->parameters, fixed);\n    apop_model_free(fixed);\n    return m;\n}\n"
  },
  {
    "path": "apop_mle.m4.c",
    "content": "/** \\file apop_mle.c */\n/*Copyright (c) 2006--2010 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n#include \"apop_internal.h\"\n#include <setjmp.h>\n#include <signal.h>\n#include <gsl/gsl_deriv.h>\n#include <gsl/gsl_siman.h>\n#include <gsl/gsl_randist.h>\n#include <gsl/gsl_multimin.h>\n#include <gsl/gsl_multiroots.h>\n\ntypedef long double (*apop_fn_with_params) (apop_data *, apop_model *);\ntypedef\tvoid (*apop_df_with_void)(const gsl_vector *beta, void *d, gsl_vector *gradient);\ntypedef\tvoid (*apop_fdf_with_void)(const gsl_vector *beta, void *d, double *f, gsl_vector *df);\n\n/** \\cond doxy_ignore */\ntypedef struct {\n\tgsl_vector\t*beta;\n\tint\t\t    dimension;\n} grad_params;\n\ntypedef struct {\n    apop_model *model;\n    apop_data *data;\n    apop_fn_with_params *f;\n    grad_params *gp; //Used only by apop_internal_numerical_gradient.\n    gsl_vector  *beta, *starting_pt;\n    int         use_constraint;\n    double      best_ll;\n    char        want_cov, want_predicted, want_tests, want_info;\n    jmp_buf     bad_eval_jump;\n    apop_data** path;\n}   infostruct;\n/** \\endcond */ //End of Doxygen ignore.\n\nstatic apop_model * find_roots (infostruct p); //see end of file.\n\nm4_define(<|default_delta|>,1e-3) //as a macro, we can put it in documentation\n\n/* Generate support fns (esp. initializers) for apop_mle_settings and apop_parts_wanted structs. 
*/\nApop_settings_copy(apop_parts_wanted, )\nApop_settings_free(apop_parts_wanted, )\nApop_settings_init(apop_parts_wanted,\n    Apop_varad_set(covariance, 'n');\n    Apop_varad_set(predicted, 'n');\n    Apop_varad_set(tests, 'n');\n    Apop_varad_set(info, 'n');\n)\n\nApop_settings_copy(apop_mle, )\nApop_settings_free(apop_mle, )\nApop_settings_init(apop_mle,\n    Apop_varad_set(starting_pt, NULL);\n    Apop_varad_set(tolerance, 1e-5);\n    Apop_varad_set(max_iterations, 5000);\n    Apop_varad_set(method, \"\");//default picked in apop_maximum_likelihood\n    Apop_varad_set(verbose, 0);\n    Apop_varad_set(step_size, 0.05);\n    Apop_varad_set(delta, default_delta);\n    Apop_varad_set(dim_cycle_tolerance, 0);\n//siman:\n    //siman also uses step_size  = 1.;  \n    Apop_varad_set(n_tries, 5);  //The number of points to try for each step. \n    Apop_varad_set(iters_fixed_T, 5);   //The number of iterations at each temperature. \n    Apop_varad_set(k, 1.0);  //The maximum step size in the random walk. \n    Apop_varad_set(t_initial, 50);   //cooling schedule data\n    Apop_varad_set(mu_t, 1.002); \n    Apop_varad_set(t_min, 5.0e-1);\n    Apop_varad_set(rng, NULL);\n)\n\n//      MLE support functions\n//Including numerical differentiation and a couple of functions to\n//negate the likelihood fns without bothering the user.\n\nstatic void apop_annealing(infostruct*); //below.\n\nstatic double one_d(double b, void *in){\n    infostruct *i  = in;\n    long double penalty = 0;\n    gsl_vector_set(i->gp->beta, i->gp->dimension, b);\n    apop_data_unpack(i->gp->beta, i->model->parameters);\n\tif (i->model->constraint)\n\t\tpenalty\t= i->model->constraint(i->data, i->model);\n\treturn (*(i->f))(i->data, i->model) + penalty;\n}\n\n//Numeric first and second derivatives.\n\n/* For each element of the parameter set, jiggle it to find its\n gradient. Return a vector as long as the parameter list. 
*/\nstatic void apop_internal_numerical_gradient(apop_fn_with_params ll, \n                            infostruct* info, gsl_vector *out, double delta){\n    double result, err;\n    gsl_vector *beta = apop_data_pack(info->model->parameters);\n    infostruct i = *info;\n    i.f = &ll;\n    i.gp = &(grad_params){ .beta = gsl_vector_alloc(beta->size)};\n    gsl_function F = { .function= one_d, \n                       .params\t= &i };\n\tfor (size_t j=0; j< beta->size; j++){\n\t\ti.gp->dimension = j;\n\t\tgsl_vector_memcpy(i.gp->beta, beta);\n\t\tgsl_deriv_central(&F, gsl_vector_get(beta,j), delta, &result, &err);\n\t\tgsl_vector_set(out, j, result);\n\t}\n    gsl_vector_free(beta);\n}\n\n/**\nA wrapper around the GSL's one-dimensional \\c gsl_deriv_central to find a numeric differential for each dimension of the input \\ref apop_model's log likelihood (or \\c p if \\c log_likelihood is \\c NULL).\n\n\\param data The \\ref apop_data set to use for all evaluations.\n\\param model The \\ref apop_model, expressing the function whose derivative is sought. The gradient is taken via small changes along the model parameters.\n\\param delta The size of the differential. (default: default_delta, but see below)\n \n \\code\n gsl_vector *gradient = apop_numerical_gradient(data, your_parametrized_model);\n \\endcode\n\n\\li If you do not set \\ref delta as an input, I first look for an \\ref apop_mle_settings\n    group attached to the input model, and check that for a \\c delta element. 
If that is\n    also missing, use the default of default_delta.\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD gsl_vector * apop_numerical_gradient(apop_data *data, apop_model *model, double delta){\n    apop_data * apop_varad_var(data, NULL);\n    apop_model * apop_varad_var(model, NULL);\n    Nullcheck(model, NULL) Nullcheck_p(model, NULL)\n    double apop_varad_var(delta, 0);\n    if (!delta){\n        apop_mle_settings *mp = apop_settings_get_group(model, apop_mle);\n        delta = mp ? mp->delta : default_delta;\n    }\nAPOP_VAR_ENDHEAD\n    Get_vmsizes(model->parameters); //tsize\n    apop_fn_with_params ll = model->log_likelihood ? model->log_likelihood : model->p;\n    Apop_stopif(!ll, return NULL, 0, \"Input model has neither p nor log_likelihood method. Returning NULL.\");\n    gsl_vector *out = gsl_vector_calloc(tsize);\n    infostruct i = (infostruct) {.model = model, .data = data};\n    apop_internal_numerical_gradient(ll, &i, out, delta);\n    return out;\n}\n\n/** \\cond doxy_ignore */\ntypedef struct {\n    apop_model *base_model;\n    int *current_index;\n} apop_model_for_infomatrix_struct;\n/** \\endcond */\n\nstatic long double apop_fn_for_infomatrix(apop_data *d, apop_model *m){\n    static threadlocal gsl_vector *v = NULL;\n    apop_model_for_infomatrix_struct *settings = m->more;\n    apop_model *mm = settings->base_model;\n    apop_score_type ms = apop_score_vtable_get(mm);\n    if (ms){\n        Get_vmsizes(mm->parameters); //tsize\n        if (!v || v->size != tsize){\n            if (v) gsl_vector_free(v);\n            v = gsl_vector_alloc(tsize);\n        }\n        ms(d, v, mm);\n        return gsl_vector_get(v, *settings->current_index);\n    } //else:\n        gsl_vector *vv = apop_numerical_gradient(d, mm);\n        double out = gsl_vector_get(vv, *settings->current_index);\n        gsl_vector_free(vv);\n        return out;\n}\n\napop_model *apop_model_for_infomatrix = &(apop_model){\"Ad hoc model 
for working out the information matrix.\", \n                                                .log_likelihood = apop_fn_for_infomatrix};\n\n/** Numerically estimate the matrix of second derivatives of the parameter values, via\na series of re-evaluations at small differential steps. [Therefore, it may be expensive\nto do this for a very computationally-intensive model.]\n\n\\param data The \\ref apop_data at which the model was estimated (default: \\c NULL)\n\\param model The \\ref apop_model, with parameters already estimated (no default, must not be \\c NULL)\n\\param delta the step size for the differentials. (default: default_delta, but see below)\n\\return The matrix of estimated second derivatives at the given data and parameter values.\n \n\\li If you do not set \\ref delta as an input, I first look for an \\ref apop_mle_settings group attached to the input model, and check that for a \\c delta element. If that is also missing, use the default of default_delta.\n\\li This function uses the \\ref designated syntax for inputs.\n */\nAPOP_VAR_HEAD apop_data * apop_model_hessian(apop_data * data, apop_model *model, double delta){\n    apop_data * apop_varad_var(data, NULL);\n    apop_model * apop_varad_var(model, NULL);\n    Nullcheck(model, NULL)\n    double apop_varad_var(delta, 0);\n    if (!delta){\n        apop_mle_settings *mp = apop_settings_get_group(model, apop_mle);\n        delta = mp ? 
mp->delta : default_delta;\n    }\nAPOP_VAR_ENDHEAD\n    int k;\n    Get_vmsizes(model->parameters) //tsize\n    size_t betasize  = tsize;\n    apop_data *out = apop_data_calloc(0, betasize, betasize);\n    gsl_vector *dscore = NULL; //each pass below gets a freshly allocated vector from apop_numerical_gradient\n    apop_model_for_infomatrix_struct ms = { .base_model = model, .current_index = &k, };\n    apop_model *m = apop_model_copy(apop_model_for_infomatrix);\n    m->parameters = model->parameters;\n    m->more = &ms;\n    if (apop_settings_get_group(model, apop_mle))\n        apop_settings_copy_group(m, model, \"apop_mle\");\n    for (k=0; k< betasize; k++){\n        dscore = apop_numerical_gradient(data, m, delta);\n        //We get two estimates of the (k,j)th element, which are often very close,\n        //and take the mean.\n        for (size_t j=0; j< betasize; j++){\n            *gsl_matrix_ptr(out->matrix, k, j) += gsl_vector_get(dscore, j)/2;\n            *gsl_matrix_ptr(out->matrix, j, k) += gsl_vector_get(dscore, j)/2;\n        }\n        gsl_vector_free(dscore);\n    }\n    if (model->parameters->names->row){\n        apop_name_stack(out->names, model->parameters->names, 'r');\n        apop_name_stack(out->names, model->parameters->names, 'c', 'r');\n    }\n    return out;\n}\n\n/** Produce the covariance matrix for the parameters of an estimated model via the derivative of the score function at the parameter. I.e., I find the second derivative via \\ref apop_model_hessian , and take the negation of the inverse.\n\nI follow Efron and Hinkley in using the estimated information matrix---the value of the information matrix at the estimated value of the score---not the expected information matrix that is the integral over all possible data. See Pawitan 2001 (who cribbed a little off of Efron and Hinkley) or Klemens 2008 (who directly cribbed off of both) for further details. 
\n\n\\param data The data by which your model was estimated.\n\\param model A model whose parameters have been estimated.\n\\param delta The differential by which to step for sampling changes. (default: default_delta, but see below)\n\\return A covariance matrix for the parameters. Also, if the model's parameters do not have a\n <tt>\"<Covariance>\"</tt> page, I'll add the result as that page [i.e., I won't overwrite an\n existing covariance page].  \n\n\\li If you do not set \\ref delta as an input, I first look for an \\ref apop_mle_settings group attached to the input model, and check that for a \\c delta element. If that is also missing, use the default of default_delta.\n\\li This function uses the \\ref designated syntax for inputs.\n */\nAPOP_VAR_HEAD apop_data * apop_model_numerical_covariance(apop_data * data, apop_model *model, double delta){\n    apop_data * apop_varad_var(data, NULL);\n    apop_model * apop_varad_var(model, NULL);\n    Nullcheck(model, NULL)\n    double apop_varad_var(delta, 0);\n    if (!delta){\n        apop_mle_settings *mp = apop_settings_get_group(model, apop_mle);\n        delta = mp ? 
mp->delta : default_delta;\n    }\nAPOP_VAR_ENDHEAD\n    apop_data *hessian = apop_model_hessian(data, model, delta);\n    if (apop_opts.verbose > 1){\n        printf(\"The estimated Hessian:\\n\");\n        apop_data_show(hessian);\n    }\n    apop_data *out = apop_data_alloc();\n    out->matrix = apop_matrix_inverse(hessian->matrix);\n    gsl_matrix_scale(out->matrix, -1);\n    if (hessian->names->row){\n        apop_name_stack(out->names, hessian->names, 'r');\n        apop_name_stack(out->names, hessian->names, 'c');\n    }\n    apop_data_free(hessian);\n    if (!apop_data_get_page(model->parameters, \"<Covariance>\"))\n        apop_data_add_page(model->parameters, out, \"<Covariance>\");\n    return out;\n}\n\n///On to the interfaces between the models and the methods\n\nstatic void tracepath(const gsl_vector *beta, double value, apop_data **path){\n    size_t msize1 = (*path && (*path)->matrix) ? (*path)->matrix->size1: 0;\n    if (!*path) {\n        *path = apop_data_alloc();\n        (*path)->names->title = strdup(\"Path of ML search\");\n        apop_name_add((*path)->names, \"f(x)\", 'v');\n        apop_name_add((*path)->names, \"x\", 'm');\n    }\n    (*path)->matrix = apop_matrix_realloc((*path)->matrix, msize1+1, beta->size);\n    gsl_vector_memcpy(Apop_rv(*path, msize1), beta);\n\n    (*path)->vector = apop_vector_realloc((*path)->vector, msize1+1);\n    gsl_vector_set((*path)->vector, msize1, value);\n}\n\n/* Every actual evaluation of the function goes through the negshell and dnegshell fns,\n   because there are several things that have to be done beyond just getting\n   model.log_likelihood:\n\n--Negate, because statisticians and social scientists like to maximize; physicists like to minimize.\n--Work out if the model provides log_likelihood or p.\n--Call tracepath if needed.\n--Go from a single vector to a full apop_data set and back (via apop_data_pack/unpack)\n--Check the derivative function if available.\n--Check 
constraints.\n*/\n\nstatic double negshell (const gsl_vector *beta, void * in){\n    infostruct *i = in;\n    double penalty = 0,\n           out     = 0; \n    long double (*f)(apop_data *, apop_model *);\n    f = i->model->log_likelihood? i->model->log_likelihood : i->model->p;\n    Apop_stopif(!f, longjmp(i->bad_eval_jump, -1),\n                0, \"The model you sent to the MLE function has neither log_likelihood element nor p element.\");\n    apop_data_unpack(beta, i->model->parameters);\n\tif (i->use_constraint && i->model->constraint)\n\t\tpenalty\t= i->model->constraint(i->data, i->model);\n    if (penalty) apop_data_pack(i->model->parameters, (gsl_vector*) beta);\n    double f_val = f(i->data, i->model);\n    out = penalty - f_val; //negative llikelihood\n    Apop_stopif(gsl_isnan(out), longjmp(i->bad_eval_jump, -1),\n                0, \"I got a NaN in evaluating the objective function.%s\", \n                    !i->model->constraint ? \" Maybe add a constraint to your model?\" : \"\");\n    if (i->path) tracepath(i->model->parameters->vector, -out, i->path);\n    if (i->want_info =='y'){\n        //I report the log likelihood under the assumption that the final param set \n        //matches the best ll evaluated.\n        long double this_ll = i->model->log_likelihood? -out : log(-out); //negative negative llikelihood.\n\n        if(gsl_isnan(this_ll)){\n            Apop_stopif(!i->model->log_likelihood && penalty > f_val, i->want_info='n',\n                            1, \"Model's p=%g, penalty=%g, for a negative adjusted p=%g. 
\"\n                               \"Continuing, but can not report covariance or \"\n                               \"log likelihood-based statistics.\", f_val, penalty, f_val-penalty);\n            Apop_stopif(1, apop_data_show(i->model->parameters); i->want_info='n',\n                        1, \"NaN resulted from the following value tried by the maximum likelihood system.\");\n        }\n        i->best_ll = GSL_MAX(i->best_ll, this_ll);\n    }\n    return out;\n}\n\nstatic int dnegshell (const gsl_vector *beta, void * in, gsl_vector * g){\n/* The derivative-calculating routine.\nIf the constraint binds\n    then: take the numerical derivative of negshell, which will be the\n    numerical derivative of the penalty.\n    else: just find dlog_likelihood. If the model doesn't have a\n    dlog likelihood or the user asked to ignore it, then the main\n    maximum likelihood fn replaced model.score with\n    apop_numerical_gradient anyway.\nFinally, reverse the sign, since the GSL is trying to minimize instead of maximize.\n*/\n    infostruct *i = in;\n    apop_mle_settings *mp =  apop_settings_get_group(i->model, apop_mle);\n    apop_data_unpack(beta, i->model->parameters);\n    /* In all cases, negshell gets called first, so the constraint is already\n       checked and beta nudged accordingly.\n    if(i->model->constraint && i->model->constraint(i->data, i->model))\n            apop_data_pack(i->model->parameters, (gsl_vector *) beta); */\n    apop_score_type ms = apop_score_vtable_get(i->model);\n    if (ms) ms(i->data, g, i->model);\n    else {\n        apop_fn_with_params ll = i->model->log_likelihood ? 
i->model->log_likelihood : i->model->p;\n        apop_internal_numerical_gradient(ll, i, g, mp->delta);\n    }\n    if (mp->path) negshell (beta,  in);\n    gsl_vector_scale(g, -1);\n    return GSL_SUCCESS;\n}\n\n//This is just to satisfy the GSL's format.\nstatic void fdf_shell(const gsl_vector *beta, void *i, double *f, gsl_vector *df){\n    *f\t= negshell(beta, i);\n    dnegshell(beta, i, df);\n}\n\nstatic int ctrl_c;\nstatic void mle_sigint(int){ ctrl_c ++; }\n\nstatic int setup_starting_point(apop_mle_settings *mp, gsl_vector *x){\n    Apop_stopif(!x, return -1, 0, \"The vector I'm trying to optimize over is NULL.\");\n\tif (!mp->starting_pt) gsl_vector_set_all (x, 1);\n\telse for (int i=0; i< x->size; i++)\n            x->data[i] = mp->starting_pt[i];\n    return 0;\n}\n\nvoid add_info_criteria(apop_data *d, apop_model *m, apop_model *est, double ll, int param_ct){\n    //Did the sending function save last value of f()?\n    if (!ll) ll = apop_log_likelihood(d, m);\n\n    if (!est->info) est->info = apop_data_alloc();\n    apop_data_add_named_elmt(est->info, \"log likelihood\", ll);\n    double AIC = 2*param_ct - 2 *ll;\n    apop_data_add_named_elmt(est->info, \"AIC\", AIC);\n    if (d){//some models have NULL data.\n        int n;\n        {Get_vmsizes(d); n = maxsize;}\n        apop_data_add_named_elmt(est->info, \"AIC_c\", AIC  + 2*param_ct *(param_ct + 1.0)/(n - param_ct - 1.0));\n        Get_vmsizes(d); //vsize, msize1, tsize\n        apop_data_add_named_elmt(est->info, \"BIC\", param_ct * log(n) - 2 *ll);\n    }\n}\n\nstatic void auxinfo(apop_data *params, infostruct *i, int status, double ll){\n    apop_model *est = i->model; //just an alias.\n    /* This catches too many near-misses\n       if(est->constraint)\n        apop_assert(!est->constraint(i->data, est), \"the maximum likelihood search ended \"\n                                            \"at a point that doesn't satisfy the model's constraints.\");*/\n    if (i->want_cov=='y' && 
est->parameters){\n        apop_model_numerical_covariance(i->data, est, Apop_settings_get(est,apop_mle,delta));\n        if (i->want_tests=='y')\n            apop_estimate_parameter_tests (est);\n    }\n    if (!est->info) est->info = apop_data_alloc();\n    apop_data_add_named_elmt(est->info, \"status\", status);\n    if (i->want_info=='y') add_info_criteria(i->data, i->model, est, ll, i->beta->size);\n}\n\nstatic void apop_maximum_likelihood_w_d(apop_data * data, infostruct *i){\n/* The maximum likelihood calculations, given a derivative of the log likelihood.\n\nIf no derivative exists, will calculate a numerical gradient.\n\nInside the infostruct, you'll find these elements:\n\n\\param data\tthe data matrix\n\\param\tdist\tthe \\ref apop_model object: probit, zipf, &c.\n\\param\tstarting_pt\tan array of doubles suggesting a starting point. If NULL, use a vector whose elements are all 1 (zero has too many pathological cases).\n\\param step_size\tthe initial step size.\n\\param tolerance\tthe precision the minimizer uses. Only vaguely related to the precision of the actual var.\n\\return\tan \\ref apop_model with the parameter estimates, &c. 
If returned_estimate->status == 0, then optimum parameters were found; if status != 0, then there were problems.\n*/\n    gsl_multimin_fdfminimizer *s;\n    apop_model\t\t*est = i->model; //just an alias.\n    apop_mle_settings *mp  = apop_settings_get_group(est, apop_mle);\n    int iter \t= 0, \n\t    status  = 0,\n\t    apopstatus  = 0,\n\t    betasize= i->beta->size;\n    if (!strcasecmp(mp->method, \"BFGS cg\"))\n\t    s = gsl_multimin_fdfminimizer_alloc(gsl_multimin_fdfminimizer_vector_bfgs2, betasize);\n    else if (!strcasecmp(mp->method, \"PR cg\"))\n\t    s = gsl_multimin_fdfminimizer_alloc(gsl_multimin_fdfminimizer_conjugate_pr, betasize);\n    else //Default:    \"FR CG\"      conjugate gradient (Fletcher-Reeves)\n\t    s = gsl_multimin_fdfminimizer_alloc(gsl_multimin_fdfminimizer_conjugate_fr, betasize);\n    gsl_multimin_function_fdf minme = {\n        .f\t\t= negshell,\n        .df\t    = (apop_df_with_void) dnegshell,\n        .fdf\t= (apop_fdf_with_void) fdf_shell,\n        .n\t\t= betasize,\n        .params\t= i};\n    ctrl_c = 0;\n    if (setjmp(i->bad_eval_jump))\n        Apop_stopif(1, return, 0, \"Failure evaluating likelihood at the starting point. 
Add a starting point?\");\n\tgsl_multimin_fdfminimizer_set (s, &minme, i->beta, mp->step_size, mp->tolerance);\n    signal(SIGINT, mle_sigint);\n    do { \t\n        iter++;\n        if (setjmp(i->bad_eval_jump)) {\n            apopstatus = -1;\n            break;\n        }\n        status \t= gsl_multimin_fdfminimizer_iterate(s);\n        if(status && status!=GSL_CONTINUE) break; //commented out error msg because too many GSL_ENOPROG false positives.\n        //Apop_stopif(status && status!=GSL_CONTINUE, break, 0, \"GSL error: %s\", gsl_strerror(status));\n        status = gsl_multimin_test_gradient(s->gradient,  mp->tolerance);\n        if(status && status!=GSL_CONTINUE) break; //commented out error msg because too many GSL_ENOPROG false positives.\n        //Apop_stopif(status && status!=GSL_CONTINUE, break, 0, \"GSL error: %s\", gsl_strerror(status));\n        if (mp->verbose)\n            printf (\"%5i %.5f  f()=%10.5f gradient=%.3f\\n\", iter, gsl_vector_get (s->x, 0),  s->f, gsl_vector_get(s->gradient,0));\n        Apop_stopif(status == GSL_SUCCESS, apopstatus=0, 2, \"Optimum found.\");\n    } while (status == GSL_CONTINUE && iter < mp->max_iterations && !ctrl_c);\n    signal(SIGINT, NULL);\n\tApop_stopif(iter==mp->max_iterations, apopstatus = -1, 1, \"Max iterations reached, implying that I did not find an optimum.\");\n\t//Clean up, copy results to output estimate.\n    apop_data_unpack(s->x, est->parameters);\n\tgsl_multimin_fdfminimizer_free(s);\n    gsl_vector_free(i->beta);\n    auxinfo(est->parameters, i, apopstatus, i->best_ll);\n}\n\n/* See apop_maximum_likelihood_w_d for notes. 
*/\nstatic void apop_maximum_likelihood_no_d(apop_data * data, infostruct * i){\n    apop_model *est = i->model;\n    apop_mle_settings *mp = apop_settings_get_group(est, apop_mle);\n    int status=0,\n        apopstatus = 0,\n        iter = 0,\n        betasize= i->beta->size;\n    gsl_multimin_fminimizer *s;\n    gsl_vector *ss;\n    double size;\n    s = gsl_multimin_fminimizer_alloc(gsl_multimin_fminimizer_nmsimplex, betasize);\n    ss = gsl_vector_alloc(betasize);\n    ctrl_c      =\n    apopstatus = 0; //assume failure until we score a success.\n    gsl_vector_set_all (ss,  mp->step_size);\n    gsl_multimin_function  minme = {.f = negshell, .n= betasize, .params = i};\n    if (setjmp(i->bad_eval_jump))\n        Apop_stopif(1, return, 0, \"Failure evaluating likelihood at the starting point. Add a starting point?\");\n    gsl_multimin_fminimizer_set (s, &minme, i->beta,  ss);\n    //i->beta = s->x;\n    signal(SIGINT, mle_sigint);\n    do {  \n        iter++;\n        if (setjmp(i->bad_eval_jump)) {\n            apopstatus = -1;\n            break;\n        }\n    status  = gsl_multimin_fminimizer_iterate(s);\n    if (status)  break; \n    size = gsl_multimin_fminimizer_size(s);\n    status  = gsl_multimin_test_size (size, mp->tolerance); \n    if(mp->verbose){\n        printf (\"%5d \", iter);\n        for (size_t j = 0; j < betasize; j++) \n            printf (\"%8.3e \", gsl_vector_get (s->x, j)); \n        printf (\"f()=%7.3f size=%.3f\\n\", s->fval, size);\n            if (status == GSL_SUCCESS) {\n                  printf (\"Optimum found at:\\n\");\n                  printf (\"%5d \", iter);\n                  for (size_t j = 0; j < betasize; j++)\n                      printf (\"%8.3e \", gsl_vector_get (s->x, j)); \n                  printf (\"f()=%7.3f size=%.3f\\n\", s->fval, size);\n            }\n        }\n    } while (status == GSL_CONTINUE && iter < mp->max_iterations && !ctrl_c);\n    signal(SIGINT, NULL);\n\tApop_stopif(iter == 
mp->max_iterations && mp->verbose, /*continue*/, \n                1, \"Optimization reached maximum number of iterations.\");\n    if (status == GSL_SUCCESS) apopstatus = 0;\n    apop_data_unpack(s->x, est->parameters);\n\tgsl_multimin_fminimizer_free(s);\n    auxinfo(est->parameters, i, apopstatus, i->best_ll);\n}\n\n/*There is a basically standard location for the log likelihood. Search there, and if you don't\nfind it, then recalculate it.*/\nstatic double get_ll(apop_data *d, apop_model *est){\n    int index = est->info ? apop_name_find(est->info->names, \"log likelihood\", 'r') : -2;\n    if (index>-2) return apop_data_get(est->info, index);\n    //last resort: recalculate\n    return apop_log_likelihood(d, est);\n}\n\nstatic void dim_cycle(apop_data *d, apop_model *est, infostruct info){\n    double last_ll, this_ll = GSL_NEGINF;\n    int iteration = 0;\n    apop_mle_settings *mp = Apop_settings_get_group(est, apop_mle);\n    double tol = mp->dim_cycle_tolerance;\n    int betasize = info.beta->size;\n    Apop_settings_set(est, apop_mle, dim_cycle_tolerance, 0);//so sub-estimations won't use this function.\n    gsl_vector *paramv = apop_data_pack(est->parameters);\n    apop_model *full_est = NULL; //an alias\n    do {\n        if (mp->verbose){\n            if (!(iteration++))\n                printf(\"Cycling toward an optimum. 
Listing (dim):log likelihood.\\n\");\n            printf(\"Iteration %i:\\n\", iteration);\n        }\n        last_ll = this_ll;\n        for (int i=0; i< betasize; i++){\n            gsl_vector_set(info.beta, i, GSL_NAN);\n            apop_data_unpack(info.beta, est->parameters);\n            apop_model *m_onedim = apop_model_fix_params(est);\n            apop_prep(d, m_onedim);\n            apop_maximum_likelihood(d, m_onedim);\n            gsl_vector_set(info.beta, i, m_onedim->parameters->vector->data[0]);\n            full_est = apop_model_fix_params_get_base(m_onedim);//points to est, but filled.\n            this_ll = get_ll(d, full_est);//only used on the last iteration.\n            if (mp->verbose) printf(\"(%i):%g\\t\", i, this_ll), fflush(NULL);\n            apop_model_free(m_onedim);\n        }\n        if (mp->verbose) printf(\"\\n\");\n        apop_data_pack(full_est->parameters, paramv);\n        Apop_settings_add(est, apop_mle, starting_pt, paramv->data);\n    } while (fabs(this_ll - last_ll) > tol);\n    Apop_settings_set(est, apop_mle, dim_cycle_tolerance, tol);\n    gsl_vector_free(paramv);\n}\n\nvoid get_desires(apop_model *m, infostruct *info){\n    apop_parts_wanted_settings *want = apop_settings_get_group(m, apop_parts_wanted);\n\n    info->want_tests = (want && want->tests =='y') ? 'y' : 'n';\n    info->want_cov = (info->want_tests=='y' || (want && want->covariance =='y'))\n                            ? 'y' : 'n';\n    info->want_info = want ? (want->info =='y' ? 'y' : 'n') : 'y';\n\n    //doesn't do anything at the moment.\n    info->want_predicted = (want && want->predicted =='y') ? 
'y' : 'n';\n}\n\nint check_method (char *m){\n#define Onecheck(str) if (!strcasecmp(m, #str)) return 0;\nif(!m || strlen(m)==0) return 0;\nOnecheck(NM simplex)\nOnecheck(FR cg)\nOnecheck(BFGS cg)\nOnecheck(PR cg)\nOnecheck(Annealing)\nOnecheck(Newton)\nOnecheck(Newton hybrid)\nOnecheck(Newton hybrid no scale)\nreturn 1;\n}\n\n/** Find the likelihood-maximizing parameters of a model given data.\n\n\\li I assume that \\ref apop_prep has been called on your model. The easiest way to guarantee this is to use \\ref apop_estimate, which calls this function if the input model has no \\c estimate method.\n\n\\li All of the settings are specified by adding a\n  \\ref apop_mle_settings struct to your model, so see the many notes there. Notably,\n  the default method is the Fletcher-Reeves conjugate gradient method, and if your model\n  does not have a dlog likelihood function, then a numeric gradient will be calculated\n  via \\ref apop_numerical_gradient. Add an \\ref apop_mle_settings group to your model\n  to set tuning parameters or select other methods, including the Nelder-Mead simplex,\n  simulated annealing, and root-finding.\n\n\\param data\t    An \\ref apop_data set.\n\n\\param\tdist\tThe \\ref apop_model object: \\ref apop_gamma, \\ref apop_probit, \\ref apop_zipf, &amp;c. You can add\n    an \\c apop_mle_settings struct to it (<tt>Apop_model_add_group(your_model, apop_mle,\n    .verbose=1, .method=\"PR cg\", and_so_on)</tt>).\n\n\\return\tNone, but the input model is modified to include the parameter estimates, &c. \n\n\\li There is auxiliary info in the <tt>->info</tt> element of the post-estimation struct. 
Get elements via, e.g.:\n\\code\napop_model *est = apop_estimate(your_data, apop_probit);\n\n\nint status = apop_data_get(est->info, .rowname=\"status\");\nif (status){\n    //trouble\n} else {\n    //optimum found\n    apop_data_print(est->parameters); //Here are the estimated parameters\n}\n\\endcode\n\n\\li During the search for an optimum, ctrl-C (SIGINT) will halt the search, and the function will return whatever parameters the search was on at the time.\n*/\nvoid apop_maximum_likelihood(apop_data * data, apop_model *dist){\n    apop_mle_settings *mp = apop_settings_get_group(dist, apop_mle);\n    if (!mp) mp = Apop_model_add_group(dist, apop_mle);\n    apop_score_type ms = apop_score_vtable_get(dist);\n    Apop_stopif(check_method(mp->method), return, 0, \"You set the method='%s', \"\n            \"which is not on my list of allowable methods. See the apop_mle_settings \"\n            \"documentation for the list of options\", mp->method);\n    if (!mp->method || !strlen(mp->method)) mp->method = ms ? \"FR cg\" : \"NM simplex\";\n\n    Apop_stopif(!dist->parameters, dist->error='p'; return, 0, \"Not enough information to allocate parameters over which to optimize. 
If this was not called from apop_estimate, did you call apop_prep first?\");\n    infostruct info = {.data           = data,\n                       .use_constraint = 1,\n                       .path           = mp->path,\n                       .model          = dist};\n    get_desires(dist, &info);\n    info.beta = apop_data_pack(dist->parameters);\n    if (setup_starting_point(mp, info.beta)) return;\n    info.model->data = data;\n    if (mp->dim_cycle_tolerance)            dim_cycle(data, dist, info);\n    else if (!strcasecmp(mp->method, \"annealing\"))   apop_annealing(&info);  //below.\n    else if (!strcasecmp(mp->method, \"NM simplex\"))  apop_maximum_likelihood_no_d(data, &info);\n    else if (!strcasecmp(mp->method, \"Newton\") ||\n            !strcasecmp(mp->method, \"Newton hybrid\")||\n            !strcasecmp(mp->method, \"Newton hybrid no scale\")) find_roots (info);\n    else   /* Conjugate Gradient*/   apop_maximum_likelihood_w_d(data, &info);\n}\n\n/** Maximum likelihood searches are not guaranteed to find a global optimum, and it can be\ndifficult to tune a search such that it covers a wide space, but also accurately homes in\non the optimum. In both cases, one could restart the search using a different starting\npoint or different parameters.\n\nThe simplest use of this function is to restart a model at the latest parameter estimates.\n\n  \\code\napop_model *m = apop_estimate(data, model_using_an_MLE_search);\nfor (int i=0; i< 10; i++)\n    m = apop_estimate_restart(m);\napop_model_show(m);\n  \\endcode\n\nBy adding a line to reduce the tolerance each round [e.g., <tt>Apop_settings_set(m, apop_mle, tolerance, pow(10,-i))</tt>], you can start broad and home in on a precise optimum.\n \nYou may have a new estimation method, such as first doing a coarse simulated annealing search, then a fine conjugate gradient search. 
When reading this example, recall that the form for adding a new settings group differs from the form for modifying existing settings:\n  \\code\nApop_model_add_group(your_base_model, apop_mle, .method=\"Annealing\");\napop_model *m = apop_estimate(data, your_base_model);\nApop_settings_set(m, apop_mle, method, \"PR cg\");\nm = apop_estimate_restart(m);\napop_model_show(m);\n  \\endcode\n\nOnly one estimate is returned, either the one you sent in or a new\none. The loser (which may be the one you sent in) is freed, to prevent memory leaks.\n\n\\param e   An \\ref apop_model that is the output from a prior MLE estimation. (No default, must not be \\c NULL.)\n\\param copy  Another not-yet-parametrized model that will be re-estimated with (1) the same data and (2) a <tt>starting_pt</tt> as per the next setting (probably\n to the parameters of <tt>e</tt>). If this is <tt>NULL</tt>, then copy <tt>e</tt>. (Default = \\c NULL)\n\\param starting_pt \"ep\"=last estimate of the first model (i.e., its current parameter estimates)<br>\n\"es\"= starting point originally used by the first model<br>\n\"np\"=current parameters of the new (second) model<br>\n\"ns\"=starting point specified by the new model's MLE settings. (default = \"ep\")\n\\param boundary I test whether the starting point you give me has magnitude greater\n than this bound, so I can warn you if there's divergence in your sequence of\n re-estimations. (default: 1e8)\n\n\\return  If the new estimated parameters include any NaNs/Infs, then\n    the old estimate is returned (even if the old estimate included\n    NaNs/Infs). 
Otherwise, the estimate with the largest log likelihood\n    is returned.\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/ \nAPOP_VAR_HEAD apop_model * apop_estimate_restart (apop_model *e, apop_model *copy, char * starting_pt, double boundary){\n    apop_model * apop_varad_var(e, NULL);\n    Nullcheck_m(e, NULL);\n    apop_model * apop_varad_var(copy, NULL);\n    char * apop_varad_var(starting_pt, \"ep\");\n    double apop_varad_var(boundary, 1e8);\nAPOP_VAR_ENDHEAD\n    gsl_vector *v = NULL;\n    if (!copy) copy = apop_model_copy(e);\n    apop_mle_settings* prm0 = apop_settings_get_group(e, apop_mle);\n    apop_mle_settings* prm = apop_settings_get_group(copy, apop_mle);\n            //copy off the old params; modify the starting pt, method, and scale\n    if (!strcmp(starting_pt, \"es\"))\n        v = apop_array_to_vector(prm0->starting_pt);\n    else if (!strcmp(starting_pt, \"ns\")){\n        int size =sizeof(prm->starting_pt)/sizeof(double);\n        v = apop_array_to_vector(prm->starting_pt, size);\n        prm0->starting_pt\t= malloc(sizeof(double)*size);\n        memcpy(prm0->starting_pt, prm->starting_pt, sizeof(double)*size);\n    }\n    else if (!strcmp(starting_pt, \"np\")){\n        v = apop_data_pack(copy->parameters); \n        prm->starting_pt = malloc(sizeof(double)*v->size);\n        memcpy(prm->starting_pt, v->data, sizeof(double)*v->size);\n    }\n    else if (e->parameters){//\"ep\" or default.\n        v = apop_data_pack(e->parameters); \n        prm->starting_pt = malloc(sizeof(double)*v->size);\n        memcpy(prm->starting_pt, v->data, sizeof(double)*v->size);\n    }\n    Apop_stopif(!apop_vector_bounded(v, boundary), return e, \n                0, \"Your model has diverged (element(s) > %g);\"\n                   \" returning your original model without restarting.\", boundary);\n    gsl_vector_free(v);\n        \n    apop_model *newcopy = apop_estimate(e->data, copy);\n    apop_model_free(copy);\n    //Now check 
whether the new output is better than the old\n    if (apop_vector_bounded(newcopy->parameters->vector, boundary) \n            && get_ll(e->data, newcopy) > get_ll(e->data, e)){\n        apop_model_free(e);\n        return newcopy;\n    } //else:\n    apop_model_free(newcopy);\n    return e;\n}\n\n// Simulated Annealing.\n\nstatic double annealing_energy(void *in) {\n    infostruct *i = in;\n    return negshell(i->beta, i);\n}\n\nstatic double annealing_distance(void *xin, void *yin) {\n/** We use the Manhattan metric to correspond to the annealing_step fn below.  */\n    gsl_vector *from = apop_vector_copy(((infostruct*)xin)->beta);\n    gsl_vector *to = apop_vector_copy(((infostruct*)yin)->beta);\n    gsl_vector_div(from, ((infostruct*)xin)->starting_pt);\n    gsl_vector_div(to, ((infostruct*)xin)->starting_pt);//starting pts are the same.\n    return apop_vector_distance(from, to, .metric='m');\n}\n\nstatic void annealing_check_constraint(infostruct *i){\n    apop_data_unpack(i->beta, i->model->parameters);\n    if (i->model->constraint && i->model->constraint(i->data, i->model))\n        apop_data_pack(i->model->parameters, i->beta);\n}\n\nstatic void annealing_step(const gsl_rng * r, void *in, double step_size){\n/** The algorithm: \n    --randomly pick dimension\n    --shift by some amount of remaining step size\n    --repeat for all dims\nThis will give a move \\f$\\leq\\f$ step_size on the Manhattan metric.\n*/\n    infostruct *i = in;\n    int sign;\n    double amt, scale;\n    double cutpoints[i->beta->size+1];\n    cutpoints[0]             = 0;\n    cutpoints[i->beta->size] = 1;\n    for (size_t j=1; j< i->beta->size; j++)\n        cutpoints[j] = gsl_rng_uniform(r);\n\n    for (size_t j=0; j< i->beta->size; j++){\n        sign  = (gsl_rng_uniform(r) > 0.5) ? 
1 : -1;\n        scale = gsl_vector_get(i->starting_pt, j);\n        amt   = cutpoints[j+1]- cutpoints[j];\n        *gsl_vector_ptr(i->beta, j) += amt * sign * scale * step_size;\n    }\n    annealing_check_constraint(i);\n}\n\nstatic void annealing_print(void *xp) {\n    apop_vector_show(((infostruct*)xp)->beta);\n}\n\nstatic void annealing_print2(void *xp) { return; }\n\nstatic void annealing_memcpy(void *xp, void *yp){\n    infostruct *yi = yp;\n    infostruct *xi = xp;\n    *yi = *xi;\n    yi->beta = apop_vector_copy(xi->beta);\n}\n\nstatic void *annealing_copy(void *xp){\n    infostruct *out = malloc(sizeof(infostruct));\n    annealing_memcpy(xp, out);\n    return out;\n}\n\nstatic void annealing_free(void *xp){\n    gsl_vector_free(((infostruct*)xp)->beta);\n    free(xp);\n}\n\n//I abuse the starting point element to hold the list of scaling factors. They can't be zero.\nstatic double set_start(double in){ return in ? in : 1; }\n\njmp_buf anneal_jump;\nstatic void anneal_sigint(int){ longjmp(anneal_jump,1); }\n\nstatic void apop_annealing(infostruct *i){\n    apop_model *ep = i->model;\n    apop_mle_settings *mp = apop_settings_get_group(ep, apop_mle);\n    Apop_stopif(!mp, ep->error='l'; return, 0, \"The model you sent to the MLE function has neither log_likelihood element nor p element.\");\n    gsl_siman_params_t simparams = (gsl_siman_params_t) {\n                         .n_tries       = mp->n_tries, \n                         .iters_fixed_T = mp->iters_fixed_T,\n                         .step_size     = mp->step_size,\n                         .k             = mp->k,\n                         .t_initial     = mp->t_initial,\n                         .mu_t          = mp->mu_t,\n                         .t_min         = mp->t_min};\n    gsl_rng *r = mp->rng ? 
mp->rng : apop_rng_get_thread();\n    //these two are done at apop_maximum_likelihood:\n    //i->beta = apop_data_pack(ep->parameters);\n    //setup_starting_point(mp, i->beta);\n    int betasize = i->beta->size;\n    int apopstatus = -1;\n    i->starting_pt    = apop_vector_map(i->beta, set_start);\n    i->use_constraint = 0; //negshell doesn't check it; annealing_step does.\n    gsl_siman_print_t printing_fn = NULL;\n    if (mp && mp->verbose>1)    printing_fn = annealing_print;\n    else if (mp && mp->verbose) printing_fn = annealing_print2;\n    annealing_check_constraint(i); //shift starting point if needed.\n    if (setjmp(i->bad_eval_jump)) {\n        apopstatus = -1;\n        goto done;\n    }\n    if (!setjmp(anneal_jump)){\n        signal(SIGINT, anneal_sigint);\n        gsl_siman_solve(r,    // const gsl_rng * r\n          i,                  // void * x0_p\n          annealing_energy,   // gsl_siman_Efunc_t Ef\n          annealing_step,     // gsl_siman_step_t take_step\n          annealing_distance, // gsl_siman_metric_t distance\n          printing_fn,        // gsl_siman_print_t print_position\n          annealing_memcpy,   // gsl_siman_copy_t copyfunc\n          annealing_copy,     // gsl_siman_copy_construct_t copy_constructor\n          annealing_free,     // gsl_siman_destroy_t destructor\n          betasize,           // size_t element_size\n          simparams);         // gsl_siman_params_t params\n    }\n    signal(SIGINT, NULL);\n    apop_data_unpack(i->beta, i->model->parameters); \n    apop_estimate_parameter_tests(i->model);\n    apopstatus = 0;\ndone:\n    if (mp->rng) r = NULL;\n    auxinfo(i->model->parameters, i, apopstatus, i->best_ll);\n}\n\n/* This function calls the various GSL root-finding algorithms to find the zero of the score.\n   Cut/pasted/modified from the GSL documentation.  
*/\nstatic apop_model * find_roots (infostruct p) {\n    const gsl_multiroot_fsolver_type *T;\n    gsl_multiroot_fsolver *s;\n    apop_model *dist = p.model;\n    apop_mle_settings *mlep = apop_settings_get_group(dist, apop_mle);\n    int status=0, betasize = p.beta->size,\n              apopstatus = -1;   //assume failure until we score a success.\n    size_t iter = 0;\n    gsl_multiroot_function f = {dnegshell, betasize, &p};\n    T =   !strcasecmp(mlep->method, \"Newton\")         ? gsl_multiroot_fsolver_dnewton\n        : !strcasecmp(mlep->method, \"Newton hybrid no scale\") ? gsl_multiroot_fsolver_hybrids\n                                                   : gsl_multiroot_fsolver_hybrid;\n    s = gsl_multiroot_fsolver_alloc (T, betasize);\n    gsl_multiroot_fsolver_set (s, &f, p.beta);\n    do {\n        iter++;\n        if (setjmp(p.bad_eval_jump)) break;\n        status = gsl_multiroot_fsolver_iterate (s);\n        if (!mlep || mlep->verbose)\n            printf (\"iter = %3zu x = % .3f f(x) = % .3e\\n\", iter, gsl_vector_get (s->x, 0), gsl_vector_get (s->f, 0));\n        if (status)   /* check if solver is stuck */\n            break;\n        status = gsl_multiroot_test_residual (s->f, mlep->tolerance);\n    } while (status == GSL_CONTINUE && iter < mlep->max_iterations);\n    if (status == GSL_SUCCESS) apopstatus = 0; //GSL_SUCCESS is zero, so test the status against it.\n    Apop_notify(2, \"status = %s\\n\", gsl_strerror(status));\n    apop_data_unpack(s->x, dist->parameters);\n    gsl_multiroot_fsolver_free (s);\n    gsl_vector_free (p.beta);\n    auxinfo(dist->parameters, &p, apopstatus, 0); //root-finders don't store best val.\n    return dist;\n}\n"
  },
  {
    "path": "apop_model.m4.c",
    "content": "/** \\file apop_model.c\t sets up the estimate structure which outputs from the various regressions and MLEs.*/\n/* Copyright (c) 2006--2011 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n#define Declare_type_checking_fns\n#include \"apop_internal.h\"\n\n/** Set up the \\c parameters and \\c info elements of the \\c apop_model: \n\nAt close, the input model has parameters of the correct size.\n\n\\li This is the default action for \\ref apop_prep, and many models with a custom prep routine\ncall \\ref apop_model_clear at the end. Also, \\ref apop_estimate calls this function internally, which means that you robably never have to call this function directly.\n\\li If the model has already been prepped, this function should be a no-op.\n\n\\param data If your params vary with the size of the data set, then the function needs a data set to calibrate against. Otherwise, it's OK to set this to \\c NULL.\n\\param model    The model whose output elements will be modified.\n\\return A pointer to the same model, should you need it.\n\\exception outmodel->error=='d' dimension error.\n*/\napop_model * apop_model_clear(apop_data * data, apop_model *model){\n    Get_vmsizes(data)\n    int width = msize2 ? 
msize2 : -firstcol;//use the vector only if there's no matrix.\n    Apop_stopif(model->dsize==-1 && !width, model->error='d', 0, \"The model's dsize==-1, meaning size=data width, but the input data has NULL vector and matrix.\");\n    Apop_stopif(model->vsize==-1 && !width, model->error='d', 0, \"The model's vsize==-1, meaning size=data width, but the input data has NULL vector and matrix.\");\n    Apop_stopif(model->msize1==-1 && !width, model->error='d', 0, \"The model's msize1==-1, meaning size=data width, but the input data has NULL vector and matrix.\");\n    Apop_stopif(model->msize2==-1 && !width, model->error='d', 0, \"The model's msize2==-1, meaning size=data width, but the input data has NULL vector and matrix.\");\n\n    model->dsize  = (model->dsize == -1 ? width : model->dsize);\n    vsize  = model->vsize  == -1 ? width : model->vsize;\n    msize1 = model->msize1 == -1 ? width : model->msize1 ;\n    msize2 = model->msize2 == -1 ? width : model->msize2 ;\n    if (!model->parameters && (vsize || msize1*msize2)) \n        model->parameters = apop_data_alloc(vsize, msize1, msize2);\n    if (!model->info) model->info = apop_data_alloc();\n    if (model->info->names->title && !strlen(model->info->names->title))\n        free(model->info->names->title);\n    Asprintf(&model->info->names->title, \"<Info>\");\n    if (!model->data) model->data = data;\n\treturn model;\n}\n\n/** Free an \\ref apop_model structure.\n\n  \\li The \\c parameters and \\c settings are freed.  These are the elements that are\ncopied by \\c apop_model_copy.\n  \\li The \\c data element is not freed, because the odds are you still need it.\n  \\li If <tt>free_me->more_size</tt> is positive, the function runs\n<tt>free(free_me->more)</tt>. 
But it has no idea what the \\c more element contains;\nif it points to other structures (like an \\ref apop_data set), you need to free them\nbefore calling this function.\n  \\li If \\c free_me is \\c NULL, this does nothing.\n\n\\param free_me A pointer to the model to be freed.\n*/\nvoid apop_model_free (apop_model * free_me){\n    if (!free_me) return;\n    apop_data_free(free_me->parameters);\n    if (free_me->settings){\n        int   i=0;\n        while (free_me->settings[i].name[0]){\n            if (free_me->settings[i].free)\n                ((void (*)(void*))(free_me->settings[i].free))(free_me->settings[i].setting_group);\n            i++;\n        }\n        free(free_me->settings);\n    }\n    if (free_me->more_size)\n        free(free_me->more);\n    if (free_me->info)\n        apop_data_free(free_me->info);\n\tfree(free_me);\n}\n\n/** Print the results of an estimation for a human to look over.\n\n\\param model The model whose information should be displayed (No default. If \\c NULL, print <tt>NULL</tt>)\n\\param output_pipe  The output stream. Default: \\c stdout. If you'd like something else, use \\c fopen. E.g.:\n\\code\nFILE *out =fopen(\"outfile.txt\", \"w\"); //or \"a\" to append.\napop_model_print(the_model, out);\nfclose(out);  //optional in many cases.\n\\endcode\n\n\\li The default prints the name, parameters, info, &c. but I check a vtable for\nalternate methods you define; see \\ref vtables for details. The typedef new functions\nmust conform to and the hash used for lookups are:\n\n\\code\ntypedef void (*apop_model_print_type)(apop_model *params, FILE *out);\n#define apop_model_print_hash(m1) ((m1)->log_likelihood ? (size_t)(m1)->log_likelihood : \\\n            (m1)->p ? (size_t)(m1)->p*33 : \\\n            (m1)->estimate ? (size_t)(m1)->estimate*33*33 : \\\n            (m1)->draw ? (size_t)(m1)->draw*33*27  : \\\n            (m1)->cdf ? 
(size_t)(m1)->cdf*27*27 : 27)\n\\endcode\n\nWhen building a special print method, all output should \\c fprintf to the input \\c FILE* handle. \n  Apophenia's output routines also accept a file handle; e.g., if the file handle is\n  named \\c ap, then if the \\c thismodel print method uses \\c apop_data_print to\n  print the parameters, it must do so via a form like <tt>apop_data_print(thismodel->parameters,\n  .output_pipe=ap)</tt>.\n\nYour \\c print method can use both by masking itself for a few lines:\n \\code\nvoid print_method(apop_model *in, FILE* ap){\n  void *temp = in->estimate;\n  in->estimate = NULL;\n  apop_model_print(in, ap);\n  in->estimate = temp;\n\n  fprintf(ap, \"Additional info:\\n\");\n  ...\n}\n \\endcode\n\n\\li Print methods are intended for human consumption and are subject to change.\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD void apop_model_print (apop_model * model, FILE *output_pipe){\n    FILE * apop_varad_var(output_pipe, stdout);\n    apop_model* apop_varad_var(model, NULL);\n    if (!model) {fprintf(output_pipe, \"NULL\\n\"); return;}\nAPOP_VAR_ENDHEAD\n    apop_model_print_type mpf = apop_model_print_vtable_get(model);\n    if (mpf){\n        mpf(model, output_pipe);\n        return;\n    }\n    if (strlen(model->name)) fprintf (output_pipe, \"%s\", model->name);\n    fprintf(output_pipe, \"\\n\\n\");\n\tif (model->parameters) apop_data_print(model->parameters, .output_pipe=output_pipe);\n    Get_vmsizes(model->info); //maxsize\n    if (model->info && maxsize) apop_data_print(model->info, .output_pipe=output_pipe);\n}\n\n/* Alias for \\ref apop_model_print. Use that one. */\nvoid apop_model_show (apop_model * print_me){\n    apop_model_print(print_me, NULL);\n}\n\n/** Outputs a copy of the \\ref apop_model input.\n\n\\param in The model to be copied\n\\return A copy of the original. 
Includes copies\nof all settings groups, and the \\c parameters (if not \\c NULL, copied via \\ref\napop_data_copy).\n\n\\li If <tt>in.more_size > 0</tt> I <tt>memcpy</tt> the \\c more pointer from the original data set.\n\\li The data set at \\c in->data is not copied, but is also pointed to.\n\n\\exception out->error=='a' Allocation error. In extreme cases, where there aren't even a few hundred bytes available, I will return \\c NULL.\n\\exception out->error=='s' Error copying settings groups.\n\\exception out->error=='p' Error copying parameters or info page; the given \\ref apop_data struct may be \\c NULL or may have its own <tt>->error</tt> element.\n*/\napop_model * apop_model_copy(apop_model *in){\n    Apop_stopif(!in, return NULL, 1, \"Copying a NULL input; returning NULL.\");\n    apop_model * out = malloc(sizeof(apop_model));\n    Apop_stopif(!out, return NULL, 0, \"Serious allocation error; returning NULL.\");\n    memcpy(out, in, sizeof(apop_model));\n    if (in->more_size){\n        out->more  = malloc(in->more_size);\n        Apop_stopif(!out->more, out->error='a'; return out, 0, \"Allocation error setting up the ->more pointer.\");\n        memcpy(out->more, in->more, in->more_size);\n    }\n    int i=0; \n    out->settings = NULL;\n    if (in->settings)\n        do \n            apop_settings_copy_group(out, in, in->settings[i].name);\n        while (strlen(in->settings[i++].name));\n    out->parameters = apop_data_copy(in->parameters);\n    Apop_stopif(in->parameters && (!out->parameters || out->parameters->error), \n                    out->error='p'; return out, 0, \"Error copying the model parameters.\");\n    out->info = apop_data_copy(in->info);\n    Apop_stopif(in->info && (!out->info || out->info->error), \n                    out->error='p'; return out, 0, \"Error copying the info segment.\");\n    return out;\n}\n\n/** \\def apop_model_set_parameters(in, ...)\nTake in an unparameterized \\c apop_model and return a new \\c apop_model with 
the given parameters.  \nFor example, if you need a N(0,1) quickly:\n\\code\napop_model *std_normal = apop_model_set_parameters(apop_normal, 0, 1);\n\\endcode\n\nThis doesn't take in data, so it won't work with models that take the number of\nparameters from the data, and it will only set the vector of the model's parameter \\ref\napop_data set. This covers most standard models. If you have a situation where these options\nare out, you could\n\\li manually set \\c .vsize and/or \\c .msize1 and \\c .msize2 first, then call this function, or\n\\li prep the model via something like <tt>apop_model *new = apop_model_copy(in);\napop_prep(your_data, new);</tt> (because \\ref apop_prep is required to correctly\nallocate \\c new->parameters to conform to your data).\n\n\\param in An unparameterized model, like \\ref apop_normal or \\ref apop_poisson.\n\\param ... The list of parameters.\n\\return A copy of the input model, with parameters set.\n\\exception out->error=='d' dimension error: you gave me a model with an indeterminate\nnumber of parameters.  See notes above.\nSet \\c .vsize or \\c .msize1 and \\c .msize2 first, then call this function, or use\n<tt>apop_model *new = apop_model_copy(in); apop_prep(your_data, new);</tt> and then\ncall this function.\n\\see apop_data_fill\n\\hideinitializer   \n*/\napop_model *apop_model_set_parameters_base(apop_model *in, double ap[]){\n    apop_model *out = apop_model_copy(in);\n    apop_prep(NULL, out);\n    Apop_stopif((in->vsize == -1) || (in->msize1 == -1) || (in->msize2 == -1), out->error='d', \n            0, \"This function only works with models whose number of params does not \"\n            \"depend on data size. 
You'll have to use apop_model *new = apop_model_copy(in); \"\n           \" apop_model_clear(your_data, in); and then set in->parameters using your data.\");\n    apop_data_fill_base(out->parameters, ap);\n    return out; \n}\n\n/** Estimate the parameters of a model given data.\n\nThis function copies the input model, preps it (see \\ref apop_prep), and calls \\c\nm.estimate(d, m) (which users are encouraged to never call directly). If your model\nhas no \\c estimate method, then call \\c apop_maximum_likelihood(d, m), with the default\nMLE settings.\n\n\\param d    The data\n\\param m    The model\n\\return     A pointer to an output model, which typically matches the input model but has its \\c parameters element filled in.\n*/\napop_model *apop_estimate(apop_data *d, apop_model *m){\n    apop_model *out = apop_model_copy(m);\n    apop_prep(d, out);\n    if (out->estimate) out->estimate(d, out); \n    else               apop_maximum_likelihood(d, out);\n    return out;\n}\n\n/** Find the probability of a data/parametrized model pair.\n\n\\param d The data\n\\param m The parametrized model, which must have either a \\c log_likelihood or a \\c p method.\n*/\ndouble apop_p(apop_data *d, apop_model *m){\n    Nullcheck_m(m, GSL_NAN);\n    if (m->p)\n        return m->p(d, m);\n    else if (m->log_likelihood)\n        return exp(m->log_likelihood(d, m));\n    Apop_stopif(0, , 0, \"You asked for the probability of a model that has neither p nor log_likelihood methods.\");\n    return GSL_NAN;\n}\n\n/** Find the log likelihood of a data/parametrized model pair.\n\n\\param d    The data\n\\param m    The parametrized model, which must have either a \\c log_likelihood or a \\c p method.\n*/\ndouble apop_log_likelihood(apop_data *d, apop_model *m){\n    Nullcheck_m(m, GSL_NAN); //Nullcheck_p(m); //Too many models don't use the params.\n    if (m->log_likelihood)\n        return m->log_likelihood(d, m);\n    else if (m->p)\n        return log(m->p(d, m));\n    
Apop_stopif(0, , 0, \"You asked for the log likelihood of a model that has neither p nor log_likelihood methods.\");\n    return GSL_NAN;\n}\n\n/** Find the vector of first derivatives (aka the gradient) of the log likelihood of a data/parametrized model pair.\n\nOn input, the model \\c m must already be sufficiently prepped\nthat the log likelihood can be evaluated; see \\ref psubsection for details.\n\nOn output, the \\c gsl_vector input to the function will be filled with the gradients\n(or <tt>NaN</tt>s on errors). If the model parameters have a more complex shape\nthan a simple vector, then the vector will be in \\c apop_data_pack order; use \\c\napop_data_unpack to reformat to the preferred shape.\n\n\\param d    The \\ref apop_data set at which the score is being evaluated.\n\\param out  The score to be returned. I expect you to have allocated this already.\n\\param m    The parametrized model, which must have either a \\c log_likelihood or a \\c p method.\n\n\\li The default is to use \\ref apop_numerical_gradient, but special-case calculations\nfor certain models are held in a vtable; see \\ref vtables for details. The typedef\nnew functions must conform to and the hash used for lookups are:\n\n\\code\ntypedef void (*apop_score_type)(apop_data *d, gsl_vector *gradient, apop_model *m);\n#define apop_score_hash(m1) ((size_t)((m1).log_likelihood ? (m1).log_likelihood : (m1).p))\n\\endcode\n*/\nvoid apop_score(apop_data *d, gsl_vector *out, apop_model *m){\n    Nullcheck_m(m, );\n    Apop_stopif(!out, return, 0, \"out vector is NULL. It must be pre-allocated to the correct size. 
E.g., gsl_vector *out = gsl_vector_alloc(m->vsize + m->msize1*m->msize2).\");\n    apop_score_type ms = apop_score_vtable_get(m);\n    if (ms){\n        ms(d, out, m);\n        return;\n    }\n    gsl_vector * numeric_default = apop_numerical_gradient(d, m);\n    gsl_vector_memcpy(out, numeric_default);\n    gsl_vector_free(numeric_default);\n}\n\nApop_settings_init(apop_pm,\n    //defaults include base=NULL, index=0, own_rng=0\n    Apop_varad_set(rng, NULL);\n    Apop_varad_set(draws, 1e4);\n)\n\nApop_settings_copy(apop_pm,)\n\nApop_settings_free(apop_pm, )\n\nvoid distract_doxygen(){/*Doxygen gets thrown by the settings macros. This decoy function is a workaround. */}\n\n/** Get a model describing the distribution of the given parameter estimates.\n\nFor many models, the distribution of the parameter estimates is well known, such as the\n\\f$t\\f$-distribution of the parameters for OLS.\n\nFor models where the distribution of \\f$\\hat{p}\\f$ is not known, if you give me data, I\nwill return an \\ref apop_normal or \\ref apop_multivariate_normal model, using the parameter estimates as mean and \\ref apop_bootstrap_cov for the variances.\n\nIf you don't give me data, then I will assume that this is a stochastic model where \nre-running the model will produce different parameter estimates each time. 
In this case, I will\nrun the model 1e4 times and return a \\ref apop_pmf model with the resulting parameter\ndistributions.\n\nBefore calling this, I expect that you have already run \\ref apop_estimate to produce \\f$\\hat{p}\\f$.\n\nThe \\ref apop_pm_settings structure dictates details of how the model is generated.\nFor example, if you want only the distribution of the parameter at index 3, and you know the\ndistribution will be a PMF generated via random draws, then set settings and call the\nmodel via:\n\\code\n  Apop_model_add_group(your_model, apop_pm, .index =3, .draws=3e5);\n  apop_model *dist = apop_parameter_model(your_data, your_model);\n\\endcode\n\nSome useful parts of \\ref apop_pm_settings:\n\\li \\c index gives the position of the parameter (in \\ref apop_data_pack order)\nin which you are interested. Thus, if this is zero or more, then you will get a\nunivariate output distribution describing a single parameter. If <tt>index == -1</tt>,\nthen I will give you the multivariate distribution across all parameters.  The default\nis zero (i.e. the univariate distribution of the zeroth parameter).\n\\li \\c draws If there is no closed-form solution and bootstrap is inappropriate, then\nthe last resort is a large number of random draws of the model, summarized into a PMF. Default: 10,000 (1e4) draws.\n\\li \\c rng If the method requires random draws, then use this. If you provide \\c NULL and one is needed, I provide one for you via \\ref apop_rng_get_thread.\n\nThe default is via resampling as above, but special-case calculations for certain models are held in a vtable; see \\ref vtables for details. The typedef new functions must conform to and the hash used for lookups are:\n\n\\code\ntypedef apop_model* (*apop_parameter_model_type)(apop_data *, apop_model *);\n#define apop_parameter_model_hash(m1) ((size_t)((m1).log_likelihood ? (m1).log_likelihood : (m1).p)*33 + (m1).estimate ? 
(size_t)(m1).estimate: 27)\n\\endcode\n*/ \napop_model *apop_parameter_model(apop_data *d, apop_model *m){\n    apop_pm_settings *settings = apop_settings_get_group(m, apop_pm);\n    if (!settings)\n        settings = Apop_settings_add_group(m, apop_pm, .base= m);\n    apop_parameter_model_type pm = apop_parameter_model_vtable_get(m);\n    if (pm) return pm(d, m);\n    else if (d){\n        Get_vmsizes(m->parameters);//vsize, msize1, msize2\n        apop_model *out = apop_model_copy(apop_multivariate_normal);\n        out->msize1 = out->vsize = out->msize2 = out->dsize = vsize+msize1+msize2;\n        out->parameters = apop_bootstrap_cov(d, m, settings->rng, settings->draws);\n        out->parameters->vector = apop_data_pack(m->parameters);\n        if (settings->index == -1)\n            return out;\n        else {\n            apop_model *out2 = apop_model_set_parameters(apop_normal, \n                    apop_data_get(out->parameters, settings->index, -1), //mean\n                    apop_data_get(out->parameters, settings->index, settings->index)//var\n                    );\n            apop_model_free(out);\n            return out2;\n        }\n    } //else\n    Get_vmsizes(m->parameters);//vsize, msize1, msize2\n    apop_data *param_draws = apop_data_alloc(0, settings->draws, vsize+msize1+msize2);\n    for (int i=0; i < settings->draws; i++){\n        apop_model *mm = apop_estimate (NULL, m);//If you're here, d==NULL.\n        apop_data_pack(mm->parameters, Apop_rv(param_draws, i));\n        apop_model_free(mm);\n    }\n    if (settings->index == -1)\n        return apop_estimate(param_draws, apop_pmf);\n    else {\n        apop_data *param_draws1 = apop_data_alloc(settings->draws, 0,0);\n        gsl_vector *the_draws = Apop_cv(param_draws, settings->index);\n        gsl_vector_memcpy(param_draws1->vector, the_draws);\n        apop_data_free(param_draws);\n        return apop_estimate(param_draws1, apop_pmf);\n    }\n}\n\nextern apop_model *apop_swap_model; 
//apop_missing_data.c\nint apop_model_metropolis_draw(double *out, gsl_rng* rng, apop_model *params);//apop_update.c\n\n/** Draw from a model. \n\n\\param out An already-allocated array of <tt>double</tt>s to be filled by the draw method. It must have size <tt>m->dsize</tt>.\n\\param r   A \\c gsl_rng, probably allocated via \\ref apop_rng_alloc. Optional; if \\c NULL, then I will call \\ref apop_rng_get_thread for an RNG.\n\\param m   The model from which to make draws.\n\n\\li If the model has its own \\c draw method, then this function will call it.\n\\li Else, if the model is univariate, use \\ref apop_arms_draw to generate random draws.\n\\li Else, if the model is multivariate, use \\ref apop_model_metropolis to generate random draws.\n\\li This makes a single draw of the given size. See \\ref apop_model_draws to fill a matrix with draws.\n\n\\return Zero on success; nonzero on failure. <tt>out[0]</tt> is probably \\c NAN on failure.\n*/\nint apop_draw(double *out, gsl_rng *r, apop_model *m){\n    if (!r) r = apop_rng_get_thread(-1);\n    if (m->draw)\n        return m->draw(out, r, m); \n    else if (m->dsize == 1)\n        return apop_arms_draw(out, r, m);\n    //Else, MCMC, possibly setting it up first.\n    //generate a model with data/params reversed\n    //estimate mcmc. 
Swapped model will be stored as settings->base_model.\n    OMP_critical (apop_draw)\n    if (!Apop_settings_get_group(m, apop_mcmc)){\n        apop_model *swapped = apop_model_copy(apop_swap_model);\n        swapped->more = m;\n        swapped->msize1 = 1;\n        swapped->msize2 = m->dsize;\n        swapped->data = m->parameters;\n        Apop_settings_add_group(swapped, apop_mcmc, .burnin=0.999, .periods=1000);\n        apop_model *est = apop_model_metropolis(m->parameters, r, swapped); //leak.\n        m->draw = apop_model_metropolis_draw;\n        apop_settings_copy_group(m, est, \"apop_mcmc\");\n    }\n    return apop_draw(out, r, m);\n}\n\n/** Allocate and initialize the \\c parameters, \\c info, and other requisite parts of a \\ref apop_model.\n\nSome models have associated prep routines that also attach settings groups to the model, and set up additional special-case functions in vtables.\n\n\\li The input model is modified in place.\n\\li If called repeatedly, subsequent calls to \\ref apop_prep are no-ops. Thus, a model\n    can not be re-prepped using a new data set or other conditions.\n\\li The default prep is to simply call \\ref apop_model_clear. If the\n    input \\ref apop_model has a prep method, then that gets called instead.\n*/\nvoid apop_prep(apop_data *d, apop_model *m){\n    if (m->prep) m->prep(d, m);\n    else         apop_model_clear(d, m);\n}\n\nstatic double disnan(double in) {return gsl_isnan(in);}\n\n/** A prediction supplies E(a missing value | original data, already-estimated parameters, and other supplied data elements ).\n\nFor a regression, one would first estimate the parameters of the model, then supply a row of predictors <b>X</b>. The value of the dependent variable \\f$y\\f$ is unknown, so the system would predict that value.\n\nFor a univariate model (i.e. 
a model in one-dimensional data space), there is only one variable to omit and fill in, so the prediction problem reduces to the expected value: E(a missing value | original data, already-estimated parameters). [In some models, this may not be the expected value, but is a best value for the missing item using some other meaning of `best'.]\n\nIn other cases, prediction is the missing data problem: for three-dimensional data,\nyou may supply the input (34, \\c NaN, 12), and the parameterized model provides the\nmost likely value of the middle parameter given the parameters and known data.\n\n\\li If you give me a \\c NULL data set, I will assume you want all values filled in, for most models with the expected value.\n\n\\li If you give me data with \\c NaNs, I will take those as the points to\nbe predicted given the provided data.\n\nIf the model has no \\c predict method, the default is to use the \\ref apop_ml_impute function to do the work. That function does a maximum-likelihood search for the best parameters.\n\n\\return If you gave me a non-\\c NULL data set, I will return that, with the \\c NaNs filled in.  If \\c NULL input, I will allocate an \\ref apop_data set and fill it with the expected values.\n\nThere may be a second page (i.e., a \\ref apop_data set attached to the <tt>->more</tt> pointer of the main) listing confidence and standard error information. See your specific model documentation for details.\n\n\\li Special-case calculations for certain models are held in a vtable; see \\ref vtables for details. The typedef new functions must conform to and the hash used for lookups are:\n\n\\code\ntypedef apop_data * (*apop_predict_type)(apop_data *d, apop_model *params);\n#define apop_predict_hash(m1) ((size_t)((m1).log_likelihood ? (m1).log_likelihood : (m1).p)*33 + (m1).estimate ? (size_t)(m1).estimate: 27)\n\\endcode\n*/\napop_data *apop_predict(apop_data *d, apop_model *m){\n    apop_data *prediction = NULL;\n    apop_data *out = d ? 
d : apop_data_alloc(0, 1, m->dsize);\n    if (!d) gsl_matrix_set_all(out->matrix, GSL_NAN);\n    apop_predict_type mp = apop_predict_vtable_get(m);\n    if (mp) prediction = mp(out, m);\n    if (prediction) return prediction;\n    if (!apop_map_sum(out, disnan)) return out;\n    //default:\n    apop_model *f = apop_ml_impute(out, m);\n    apop_model_free(f);\n    return out;\n}\n\n/* Are all the elements of v less than or equal to the corresponding elements of the reference vector? */\nstatic int lte(gsl_vector *v, gsl_vector *ref){\n    for (int i=0; i< v->size; i++) \n        if(v->data[i] > gsl_vector_get(ref, i))\n            return 0;\n    return 1;\n}\n\n/** Input a one-row data point/vector and a model; returns the area of the model's PDF beneath the given point.\n\nBy default, make random draws from the PDF and return the percentage of those\ndraws beneath or equal to the given point. Many models have closed-form solutions that\nmake no use of random draws. \n\nSee also \\ref apop_cdf_settings, which is the structure used to store draws already\nmade (which means the second, third, ... calls to this function will take much less\ntime than the first), the \\c gsl_rng, and the number of draws to be made. These are\nhandled without your involvement, but if you would like to change the number of draws\nfrom the default, add this group before calling \\ref apop_cdf :\n\n\\code\nApop_model_add_group(your_model, apop_cdf, .draws=1e5, .rng=my_rng);\ndouble cdf_value = apop_cdf(your_data_point, your_model);\n\\endcode\n\n\\li Only the first row of the input \\ref apop_data set is used. 
Note that if you need to view row 20 of a data set as a one-row data set, use \\ref Apop_r.\n\nHere are many examples using common, mostly symmetric distributions.\n\n\\include some_cdfs.c\n*/\ndouble apop_cdf(apop_data *d, apop_model *m){\n    if (m->cdf) return m->cdf(d, m);\n    apop_cdf_settings *cs = Apop_settings_get_group(m, apop_cdf);\n    if (!cs) cs = Apop_model_add_group(m, apop_cdf);\n    long int tally = 0; \n    \n    gsl_vector *ref = apop_data_pack(Apop_r(d, 0));\n    if (!cs->draws_made){\n        if (m->dsize == -1) apop_prep(d, m);\n        Apop_stopif(m->dsize==0, return GSL_NAN, 0, \"I need to make random draws from your model, but it has dsize==0. Returning NaN\");\n        cs->draws_made = gsl_matrix_alloc(cs->draws, m->dsize);\n        for (int i=0; i< cs->draws; i++)\n            apop_draw((Apop_mrv(cs->draws_made, i))->data, cs->rng, m);\n    }\n    for (int i=0; i< cs->draws_made->size1; i++)\n        tally += lte(Apop_mrv(cs->draws_made, i), ref);\n    gsl_vector_free(ref);\n    return tally/(double)cs->draws_made->size1;\n}\n\nApop_settings_init(apop_cdf,\n    Apop_varad_set(draws, 1e4);\n    Apop_varad_set(rng, NULL);\n    out->draws_refcount = malloc(sizeof(int));\n    *out->draws_refcount = 1;\n)\n\nApop_settings_free(apop_cdf,\n    if (in->draws_made && !--*in->draws_refcount)\n        gsl_matrix_free(in->draws_made);\n)\n\nApop_settings_copy(apop_cdf,\n    ++*out->draws_refcount;\n)\n"
  },
  {
    "path": "apop_name.m4.c",
    "content": "/** \\file apop_name.c */\n/* Copyright (c) 2006--2009 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n#include \"apop_internal.h\"\n#include <stdio.h>\n#include <regex.h>\n\n/** Allocates a name structure\n\\return\tAn allocated, empty name structure.  In the very unlikely event that \\c malloc fails, return \\c NULL.\n\nBecause \\ref apop_data_alloc uses this to set up its output, you will rarely if ever\nneed to call this function explicitly. You may want to use it if wrapping a \\c gsl_matrix into an \\ref apop_data set. For example, to put a title on a vector:\n\n\\code\napop_data *d = &(apop_data){.vector=your_vector, .names=apop_name_alloc()};\napop_name_add(d->names, \"A column of numbers\", 'v');\napop_data_print(d);\n\n...\napop_name_free(d->names); //but d itself is auto-allocated; no need to free it.\n\\endcode\n*/\napop_name * apop_name_alloc(void){\n    apop_name * init_me = malloc(sizeof(apop_name));\n    Apop_stopif(!init_me, return NULL, 0, \"malloc failed. Probably out of memory.\");\n    *init_me = (apop_name){ };\n\treturn init_me;\n}\n\n//Currently identical to apop_settings_hash (see notes there).\nstatic unsigned long apop_name_hash(char const *str){\n    unsigned long int hash = 5381;\n    char c;\n    while ((c = *str++)) hash = hash*33 + c;\n    return hash;\n}\n\n/** Adds a name to the \\ref apop_name structure. Puts it at the end of the given list.\n\n\\param n \tAn existing, allocated \\ref apop_name structure.\n\\param add_me \tA string. If \\c NULL, do nothing; return -1.\n\\param type \t'r': add a row name<br>\n'c': add a matrix column name<br>\n't': add a text column name<br>\n'h': add a title (i.e., a header).<br>\n'v': add (or overwrite) the vector name<br>\n\\return \tReturns the number of rows/cols/depvars after you have added the new one. 
But if \\c add_me is \\c NULL, return -1.\n*/\nint apop_name_add(apop_name * n, char const *add_me, char type){\n    if (!add_me)\n        return -1;\n\tif (type == 'h'){\n        free(n->title);\n        Asprintf(&n->title, \"%s\", add_me);\n        return 1;\n\t} \n\tif (type == 'v'){\n\t\tn->vector\t= realloc(n->vector,  strlen(add_me) + 1);\n\t\tstrcpy(n->vector, add_me);\n\t\treturn 1;\n\t} \n\tif (type == 'r'){\n\t\tn->rowct++;\n\t\tn->row\t= realloc(n->row, sizeof(char*) * n->rowct);\n\t\tn->row[n->rowct -1]\t= malloc(strlen(add_me) + 1);\n\t\tstrcpy(n->row[n->rowct -1], add_me);\n\t\tn->rowhash = realloc(n->rowhash, n->rowct * sizeof(unsigned long));\n        n->rowhash[n->rowct-1] = apop_name_hash(add_me);\n\t\treturn n->rowct;\n\t} \n\tif (type == 't'){\n\t\tn->textct++;\n\t\tn->text\t= realloc(n->text, sizeof(char*) * n->textct);\n\t\tn->text[n->textct -1]\t= malloc(strlen(add_me) + 1);\n\t\tstrcpy(n->text[n->textct -1], add_me);\n\t\tn->texthash = realloc(n->texthash, n->textct * sizeof(unsigned long));\n        n->texthash[n->textct-1] = apop_name_hash(add_me);\n\t\treturn n->textct;\n\t}\n\t//else assume (type == 'c')\n        Apop_stopif(type != 'c', /*keep going.*/, \n            2,\"You gave me >%c<, I'm assuming you meant c; \"\n                             \" copying column names.\", type);\n\t\tn->colct++;\n\t\tn->col = realloc(n->col, sizeof(char*) * n->colct);\n\t\tn->col[n->colct -1]\t= malloc(strlen(add_me) + 1);\n\t\tstrcpy(n->col[n->colct -1], add_me);\n\t\tn->colhash = realloc(n->colhash, n->colct * sizeof(unsigned long));\n        n->colhash[n->colct-1] = apop_name_hash(add_me);\n\t\treturn n->colct;\n}\n\n/** Prints the given list of names to stdout. 
Useful for debugging.\n\n\\param n  The \\ref apop_name structure\n*/\nvoid apop_name_print(apop_name * n){\n    if (!n) {\n        printf(\"NULL\");\n        return;\n    }\n\tif (n->title) printf(\"title: %s\\n\", n->title);\n\tif (n->vector){\n\t\tprintf(\"vector:\");\n        printf(\"\\t%s\\n\", n->vector);\n\t}\n\tif (n->colct > 0){\n\t\tprintf(\"column:\");\n\t\tfor (int i=0; i < n->colct; i++)\n\t\t\tprintf(\"\\t%s\", n->col[i]);\n\t\tprintf(\"\\n\");\n\t}\n\tif (n->textct > 0){\n\t\tprintf(\"text:\");\n\t\tfor (int i=0; i < n->textct; i++)\n\t\t\tprintf(\"\\t%s\", n->text[i]);\n\t\tprintf(\"\\n\");\n\t}\n\tif (n->rowct > 0){\n\t\tprintf(\"row:\");\n\t\tfor (int i=0; i < n->rowct; i++)\n\t\t\tprintf(\"\\t%s\", n->row[i]);\n\t\tprintf(\"\\n\");\n\t}\n}\n\t\n/** Free the memory used by an \\ref apop_name structure. */\nvoid  apop_name_free(apop_name * free_me){\n    if (!free_me) return; //only needed if users are doing tricky things like newdata = (apop_data){.matrix=...};\n\tfor (size_t i=0; i < free_me->colct; i++)  free(free_me->col[i]);\n\tfor (size_t i=0; i < free_me->textct; i++) free(free_me->text[i]);\n\tfor (size_t i=0; i < free_me->rowct; i++)  free(free_me->row[i]);\n    if (free_me->vector) free(free_me->vector);\n\tfree(free_me->col);  free(free_me->colhash);\n\tfree(free_me->text); free(free_me->texthash);\n\tfree(free_me->row);  free(free_me->rowhash);\n\tfree(free_me);\n}\n\n/** Append one list of names to another.\n\nIf the first list is empty, then this is a copy function.\n\n\\param  n1      The first set of names (no default, must not be \\c NULL)\n\\param  nadd      The second set of names, which will be appended after the first. (no default. If \\c NULL, a no-op.)\n\\param type1     Either 'c', 'r', 't', or 'v' stating whether you are merging the\ncolumns, rows, text, or vector. If 'v', then ignore \\c typeadd and just overwrite the\ntarget vector name with the source name. 
(default: 'r')\n\\param typeadd     Either 'c', 'r', 't', or 'v' stating whether you are merging the columns, rows, or text. If 'v', then overwrite the target with the source vector name. (default: type1)\n*/\nAPOP_VAR_HEAD void  apop_name_stack(apop_name * n1, apop_name *nadd, char type1, char typeadd){\n    apop_name * apop_varad_var(nadd, NULL); \n    if (!nadd) return;\n    apop_name * apop_varad_var(n1, NULL);\n    Apop_stopif(!n1, return, 0, \"Can't stack onto a NULL set of names (which n1 is).\");\n    char apop_varad_var(type1, 'r');\n    char apop_varad_var(typeadd, type1);\nAPOP_VAR_ENDHEAD\n    int i;\n    apop_name counts = (apop_name){.rowct=nadd->rowct, .textct = nadd->textct, .colct = nadd->colct};//Necessary when stacking onto self.;\n    if (typeadd == 'v')\n        apop_name_add(n1, nadd->vector, 'v');\n    else if (typeadd == 'r')\n        for (i=0; i< counts.rowct; i++)\n            apop_name_add(n1, nadd->row[i], type1);\n    else if (typeadd == 't')\n        for (i=0; i< counts.textct; i++)\n            apop_name_add(n1, nadd->text[i], type1);\n    else if (typeadd == 'c')\n        for (i=0; i< counts.colct; i++)\n            apop_name_add(n1, nadd->col[i], type1);\n    else Apop_notify(1, \"'%c' sent to apop_name_stack, but the only \"\n                        \"valid options are r t c v. Doing nothing.\", typeadd);\n}\n\n/** Copy one \\ref apop_name structure to another. That is, all data is duplicated.\n\nUsed internally by \\ref apop_data_copy, but sometimes useful by itself. 
For example,\nsay that we have an \\ref apop_data struct named \\c d and a \\ref gsl_matrix of the same\ndimensions named \\c m; we could give \\c m the labels from \\c d for printing:\n\\code\napop_data *wrapped = &(apop_data){.matrix=m, .names=apop_name_copy(d->names)};\napop_data_print(wrapped);\napop_name_free(wrapped->names); //wrapped itself is auto-allocated; do not free.\n\\endcode\n \n\\param in The input names\n\\return   A \\ref apop_name struct with copies of all input names.\n*/\napop_name * apop_name_copy(apop_name *in){\n    apop_name *out = apop_name_alloc();\n    apop_name_stack(out, in, 'v');\n    apop_name_stack(out, in, 'c');\n    apop_name_stack(out, in, 'r');\n    apop_name_stack(out, in, 't');\n    Asprintf(&out->title, \"%s\", in->title);\n    return out;\n}\n\n/** Finds the position of an element in a list of names.\n\nThe function uses POSIX's \\c strcasecmp, and so does case-insensitive search the way that function does.\n\n\\param n        the \\ref apop_name object to search.\n\\param name     the name you seek; see above.\n\\param type     \\c 'c' (=column), \\c 'r' (=row), or \\c 't' (=text). Default is \\c 'c'.\n\\return         The position of \\c name. If \\c type is \\c 'c', then this may be -1, meaning the vector name. If not found, returns -2.  On error, e.g. 
<tt>name==NULL</tt>, returns -2.\n*/\nint apop_name_find(const apop_name *n, const char *name, const char type){\n    Apop_stopif(!name, return -2, 0, \"You asked me to search for NULL.\");\n    char **list;\n    unsigned long *listh;\n    int listct;\n    if (type == 'r' || type == 'R'){\n        list = n->row;\n        listh = n->rowhash;\n        listct = n->rowct;\n    }\n    else if (type == 't' || type == 'T'){\n        list = n->text;\n        listh = n->texthash;\n        listct = n->textct;\n    }\n    else { // default type == 'c'\n        list = n->col;\n        listh = n->colhash;\n        listct = n->colct;\n    }\n\n    if (listh) {\n        unsigned long hash = apop_name_hash(name);\n        for (int i = 0; i < listct; i++)\n            if (hash==listh[i] && !strcasecmp(name, list[i]))\n                return i;\n    }\n\n    //Hashes may be broken, so try again with plain string comparisons.\n    for (int i = 0; i < listct; i++)\n        if (!strcasecmp(name, list[i])) return i;\n\n    if ((type=='c' || type == 'C') && n->vector && !strcasecmp(name, n->vector)) return -1;\n    return -2;\n}\n"
  },
  {
    "path": "apop_output.m4.c",
    "content": "/** \\file \n  Some printing and output interface functions. */\n/* Copyright (c) 2006--2007, 2009 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n//The reader will find a few function headers for this file in asst.h\n#include \"apop_internal.h\"\n \n#define Output_vars output_name, output_pipe, output_type, output_append\n\n#define Output_declares char const * output_name, FILE * output_pipe, char output_type, char output_append\n\n/** If you're reading this, it is probably because you were referred by another function\n  that uses this internally. You should never call this function directly, but do read\n  this documentation.\n\n  There are four settings that affect how output happens, which can be set when you call the\n  function that sent you to this documentation, e.g:\n\n  \\code\n  apop_data_print(your_data, .output_type ='f', .output_append = 'w');\n  \\endcode\n\n  \\param output_name The name of the output file, if any. For a database, the table to write.\n  \\param output_pipe If you have already opened a file and have a \\c FILE* on hand, use\n  this instead of giving the file name.\n  \\param output_type \\c 'p' = pipe, \\c 'f'= file, \\c 'd' = database\n  \\param output_append \\c 'a' = append (default), \\c 'w' = write over.\n\nAt the end, \\c output_name, \\c output_pipe, and \\c output_type are all set.\nNotably, the local \\c output_pipe will have the correct location for the calling function to \\c fprintf to.\n\n\\li See \\ref legi for more discussion.\n\n\\li The default is output to stdout. 
For example,\n\\code\napop_data_print(your_data);\n//is equivalent to\napop_data_print(your_data, .output_type='p', .output_pipe=stdout);\n\\endcode\n\n\\li Tip: if writing to the database, you can get a major speed boost by wrapping the call in a begin/commit wrapper:\n\n\\code\napop_query(\"begin;\");\napop_data_print(your_data, .output_name=\"dbtab\", .output_type='d');\napop_query(\"commit;\");\n\\endcode\n\\ingroup all_public\n*/\nint apop_prep_output(char const *output_name, FILE ** output_pipe, char *output_type, char *output_append){\n    *output_append = *output_append ? *output_append : 'w';\n\n    if (!output_name && !*output_pipe && !*output_type)     *output_type = 's';              \n    else if (output_name && !*output_pipe && !*output_type) *output_type = 'f'; \n    else if (!output_name && *output_pipe && !*output_type) *output_type = 'p';     \n\n    if (*output_type =='p')      *output_pipe = *output_pipe ? *output_pipe: stdout;      \n    else if (*output_type =='d') *output_pipe = stdout;  //won't be used.\n    else *output_pipe = output_name\n                        ? fopen(output_name, *output_append == 'a' ? \"a\" : \"w\")\n                        : stdout;\n    Apop_stopif(!*output_pipe && output_name, return -1, 0, \"Trouble opening file %s.\", output_name);\n    return 0;\n}\n\n#define Dispatch_output                        \\\n    char const *apop_varad_var(output_name, NULL);  \\\n    FILE * apop_varad_var(output_pipe, NULL);  \\\n    char apop_varad_var(output_type, 0);       \\\n    char apop_varad_var(output_append, 0);     \\\n    Apop_stopif(apop_prep_output(output_name, &output_pipe, &output_type, &output_append), \\\n            return, 0, \"Trouble preparing to write output.\");\n\n\n/////The printing functions.\n\nstatic void white_pad(int ct){\n    for(size_t i=0; i < ct; i ++)\n        printf(\" \");\n}\n\n/* This function prettyprints the \\c apop_data set to a screen.\n\nIt is currently not in the documentation. 
It'd be nice to merge this w/apop_data_print.\n\nThis takes a lot of machinery. I write every last element to a text array, then measure column widths, then print to screen with padding to guarantee that everything lines up.  There's no way to have the first element of a column line up with the last unless you interrogate the width of every element in the column, so printing columns really can't be a one-pass process.\n\nSo, I produce an \\ref apop_data set with no numeric elements and a text element to be\nfilled with the input data set, and then print that. That means that I'll be using\n(more than) twice the memory to print this. If this is a problem, you can use \\ref\napop_data_print to dump your data to a text file, and view the text file, or print\nsubsets.\n\nFor more machine-readable printing, see \\ref apop_data_print.\n*/\nvoid apop_data_show(const apop_data *in){\n    if (!in) {printf(\"NULL\\n\"); return;}\n    Get_vmsizes(in) //vsize, msize1, msize2, tsize\n//Take inventory and get sizes\n    size_t hasrownames = (in->names && in->names->rowct) ? 1 : 0;\n    size_t hascolnames = in->names && \n                    (in->names->vector || in->names->colct || in->names->textct);\n    size_t hasweights = (in->weights != NULL);\n\n    size_t outsize_r = GSL_MAX(in->matrix ? in->matrix->size1 : 0, in->vector ? 
in->vector->size: 0);\n    outsize_r = GSL_MAX(outsize_r, in->textsize[0]);\n    outsize_r = GSL_MAX(outsize_r, wsize);\n    if (in->names) outsize_r = GSL_MAX(outsize_r, in->names->rowct);\n    outsize_r += hascolnames;\n\n    size_t outsize_c = msize2;\n    outsize_c += in->textsize[1];\n    outsize_c += (vsize>0);\n    outsize_c += (wsize>0);\n    outsize_c += hasrownames + hasweights;\n\n//Write to the printout data set.\n    apop_data *printout = apop_text_alloc(NULL , outsize_r, outsize_c);\n    if (hasrownames)\n        for (size_t i=0; i < in->names->rowct; i ++)\n            apop_text_set(printout, i + hascolnames, 0, \"%s\", in->names->row[i]);\n    for (size_t i=0; i < vsize; i ++) //vsize may be zero.\n        apop_text_set(printout, i + hascolnames, hasrownames, \"%g\", gsl_vector_get(in->vector, i));\n    for (size_t i=0; i < msize1; i ++) //msize1 may be zero.\n        for (size_t j=0; j < msize2; j ++)\n            apop_text_set(printout, i + hascolnames, hasrownames + (vsize >0)+ j, \"%g\", gsl_matrix_get(in->matrix, i, j));\n    if (in->textsize[0])\n        for (size_t i=0; i < in->textsize[0]; i ++)\n            for (size_t j=0; j < in->textsize[1]; j ++)\n                apop_text_set(printout, i + hascolnames, hasrownames + (vsize>0)+ msize2 + j, \"%s\", in->text[i][j]);\n    if (hasweights)\n        for (size_t i=0; i < in->weights->size; i ++)\n            apop_text_set(printout, i + hascolnames, outsize_c-1, \"%g\", gsl_vector_get(in->weights, i));\n\n//column names\n    if (hascolnames){\n        if (vsize && in->names->vector)\n            apop_text_set(printout, 0 , hasrownames, \"%s\", in->names->vector);\n        if (msize2 && in->names)\n            for (size_t i=0; i < in->names->colct; i ++)\n                apop_text_set(printout, 0 , hasrownames + (vsize>0) + i, \"%s\", in->names->col[i]);\n        if (in->textsize[1] && in->names)\n            for (size_t i=0; i < in->names->textct; i ++)\n                apop_text_set(printout, 
0 , hasrownames + (vsize>0) + msize2 + i, \"%s\", in->names->text[i]);\n        if (hasweights)\n            apop_text_set(printout, 0 , outsize_c-1, \"Weights\");\n    }\n\n//get column sizes\n    int colsizes[outsize_c];\n    for (size_t i=0; i < outsize_c; i ++){\n        colsizes[i] = strlen(printout->text[0][i]);\n        for (size_t j=1; j < outsize_r; j ++)\n            colsizes[i] = GSL_MAX(colsizes[i], strlen(printout->text[j][i]));\n    }\n\n//Finally, print\n    if (in->names && in->names->title && strlen(in->names->title))\n        printf(\"\\t%s\\n\\n\", in->names->title);\n    for (size_t j=0; j < outsize_r; j ++){\n        for (size_t i=0; i < outsize_c; i ++){\n            white_pad(colsizes[i] - strlen(printout->text[j][i]) + 1);//one spare space.\n            printf(\"%s\", printout->text[j][i]);\n            if (i > 0 && i< outsize_c-1) \n                printf(\" %s \", apop_opts.output_delimiter);\n        }\n        printf(\"\\n\");\n    }\n\n    if (in->more) {\n        printf(\"\\n\");\n        apop_data_show(in->more);\n    }\n    apop_data_free(printout);\n}\n\nvoid p_fn(FILE * f, double data){\n    if (data == (int) data) fprintf(f, \"% 5i\", (int) data); \n    else                    fprintf(f, \"% 5f\", data);\n}\n\nstatic void print_core_v(const gsl_vector *data, char *separator, Output_declares){\n    FILE *f = output_pipe;\n    if (!data) fprintf(f, \"NULL\\n\");\n    else {\n\t    for (size_t i=0; i<data->size; i++){\n\t\t    p_fn(f, gsl_vector_get(data, i));\n\t\t    if (i< data->size -1) fprintf(f, \"%s\", separator);\n\t    }\n\t    fprintf(f,\"\\n\");\n    }\n\tif (output_name) fclose(f);\n}\n\n/** Print a vector to the screen, a file, a pipe, or the database.\n\n  \\li See \\ref apop_prep_output for more on how printing settings are set.\n  \\li For example, the default for \\ref apop_opts_type \"apop_opts.output_delimiter\"\nis a tab, which puts the vector on one line, but <tt>apop_opts.output_delimiter=\"\\n\"</tt>\nwould print 
the vector vertically.\n  \\li See also \\ref Legi for more details and examples.\n  \\li This function uses the \\ref designated syntax for inputs.\n\\ingroup all_public\n*/\nAPOP_VAR_HEAD void apop_vector_print(gsl_vector *data, Output_declares){\n    gsl_vector *apop_varad_var(data, NULL);\n    Dispatch_output\nAPOP_VAR_ENDHEAD\n\tprint_core_v(data, apop_opts.output_delimiter, Output_vars);\n }\n\n/* currently removed from the documentation.\n Dump a <tt>gsl_vector</tt> to the screen. \n    You may want to set \\ref apop_opts_type \"apop_opts.output_delimiter\".\n\n\\li See \\ref apop_prep_output for more on how printing settings are set.\n\\li See also \\ref Legi for more details and examples.\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nvoid apop_vector_show(const gsl_vector *data){\n\tprint_core_v(data, apop_opts.output_delimiter, NULL, stdout, 's', 0); \n}\n\nstatic int get_max_strlen(char **names, size_t len){\n    int max  = 0;\n    for (int i=0; i< len; i++)\n        max = GSL_MAX(max, strlen(names[i]));\n    return max;\n}\n\n//On screen, display a pipe, else use the usual output delimiter.\nstatic void a_pipe(FILE *f, char displaytype){\n    if (displaytype == 's') fprintf(f, \" | \");\n    else                    fprintf(f, \"%s\", apop_opts.output_delimiter);\n}\n\nstatic void apop_data_print_core(const apop_data *data, FILE *f, char displaytype){\n    if (!data){\n        fprintf(f, \"NULL\\n\");\n        return;\n    }\n    int i, j, L = 0, \n        start   = (data->vector)? -1 : 0,\n        end     = (data->matrix)? data->matrix->size2 : 0,\n        rowend  = (data->matrix)? data->matrix->size1 : (data->vector) ? data->vector->size : data->text ? 
data->textsize[0] : -1;\n    if (data->names && data->names->title && strlen(data->names->title))\n        fprintf(f, \"\\t%s\\n\\n\", data->names->title);\n    if (data->names && data->names->rowct)\n        L   = get_max_strlen(data->names->row, data->names->rowct);\n    if (data->names && data->names->rowct && (data->names->vector || data->names->colct || data->names->textct)){\n        //Leave the corner blank if the name column is unset, empty, or the default \"row_names\".\n        if ((!apop_opts.db_name_column || *apop_opts.db_name_column=='\\0') || \n                !strcmp(apop_opts.db_name_column, \"row_names\"))\n            fprintf(f, \"%*s  \", L+2, \" \");\n        else { fprintf(f, \"%s\", apop_opts.db_name_column); a_pipe(f, displaytype); }\n    }\n    if (data->vector && data->names && data->names->vector){\n        fprintf(f, \"%s\", data->names->vector);\n    }\n    if (data->matrix){\n        if (data->vector && data->names && data->names->colct){\n            fprintf(f, \"%c \", data->names->vector ? ' ' : '\\t' );\n            a_pipe(f, displaytype);\n        }\n        if (data->names) \n          for(i=0; i< data->names->colct; i++){\n            if (i < data->names->colct -1)\n                fprintf(f, \"%s%s\", data->names->col[i], apop_opts.output_delimiter);\n            else\n                fprintf(f, \"%s\", data->names->col[i]);\n        }\n    }\n    if (data->textsize[1] && data->names && data->names->textct){\n        if ((data->vector && data->names && data->names->vector) || (data->matrix && data->names->colct))\n            a_pipe(f, displaytype);\n        if (data->names)\n          for(i=0; i< data->names->textct; i++){\n            if (i < data->names->textct -1)\n                fprintf(f, \"%s%s\", data->names->text[i], apop_opts.output_delimiter);\n            else\n                fprintf(f, \"%s\", data->names->text[i]);\n        }\n    }\n    if(data->names && (data->names->vector || data->names->colct || data->names->textct))\n        fprintf(f, \"\\n\");\n    for(j=0; j< rowend; j++){\n        if (data->names && 
data->names->rowct > j)\n            fprintf(f, \"%*s%s\", L+2, data->names->row[j], apop_opts.output_delimiter);\n        for(i=start; i< end; i++){\n            if ((i < 0 && j < data->vector->size) || (i>= 0 && j < data->matrix->size1 && i < data->matrix->size2))\n                p_fn(f,  apop_data_get(data, j, i));\n            else\n                fprintf(f, \" \");\n            if (i==-1 && data->matrix) \n                a_pipe(f, displaytype);\n            if (i < end-1)\n                fprintf(f, \"%s\", apop_opts.output_delimiter);\n        }\n        if (data->text){\n            if (data->vector || data->matrix)\n                a_pipe(f, displaytype);\n            if (j < data->textsize[0])\n                for(i=0; i< data->textsize[1]; i++){\n                    fprintf(f, \"%s\", data->text[j][i]);\n                    if (i < data->textsize[1]-1) fprintf(f, \"%s\", apop_opts.output_delimiter);\n                }\n        }\n        if (data->weights && j < data->weights->size){\n            a_pipe(f, displaytype);\n            p_fn(f, data->weights->data[j]);\n        }\n        fprintf(f, \"\\n\");\n    }\n}\n\n/** Print an \\ref apop_data set to a file, the database, or the screen,\n  as determined by the \\c .output_type.\n\n\\li See \\ref apop_prep_output for more on how printing settings are set.\n\\li See \\ref Legi for more details and examples.\n\\li See \\ref sqlsec for notes on writing an \\ref apop_data set to the database.\n\\li This function uses the \\ref designated syntax for inputs.\n\\ingroup all_public\n*/\nAPOP_VAR_HEAD void apop_data_print(const apop_data *data, Output_declares){\n    const apop_data * apop_varad_var(data, NULL);\n    Dispatch_output\nAPOP_VAR_ENDHEAD \n    if (output_type  == 'd'){\n        if (output_append == 'w') apop_table_exists(output_name, 'd');\n        apop_data_to_db(data, output_name, output_append);\n        return;\n    }\n    apop_data_print_core(data, output_pipe, output_type);\n    if (data && 
data->more) {\n        output_append='a';\n        apop_data_print(data->more, Output_vars);\n    }\n    if (output_name)\n        fclose(output_pipe);\n}\n\n/** Print a \\c gsl_matrix to the screen, a file, a pipe, or a database table.\n\n\\li See \\ref apop_prep_output for more on how printing settings are set.\n\\li See also \\ref Legi for more details and examples.\n\\li This function uses the \\ref designated syntax for inputs.\n\\ingroup all_public\n*/\nAPOP_VAR_HEAD void apop_matrix_print(const gsl_matrix *data, Output_declares){\n    const gsl_matrix *apop_varad_var(data, NULL);\n    Dispatch_output\nAPOP_VAR_ENDHEAD\n    if (output_type == 'd'){\n        Apop_assert_c(data, , 1, \"You sent me a NULL matrix. No database table will be created.\");\n    } else if (!data){\n        fprintf(output_pipe, \"NULL\\n\");\n        return;\n    }\n    apop_data_print(&(apop_data){.matrix=(gsl_matrix*)data}, Output_vars); //cheating on the const qualifier\n}\n\n//leaving this undocumented for now.\nvoid apop_matrix_show(const gsl_matrix *data){\n    apop_data_print_core(&(apop_data){.matrix=(gsl_matrix*)data},  stdout, 's');\n}\n"
  },
  {
    "path": "apop_rake.m4.c",
    "content": "//#define __USE_POSIX //for strtok_r\n#include \"apop_internal.h\"\n#include <stdbool.h>\n#include <gsl/gsl_sort_vector.h>\nvoid xprintf(char **q, char *format, ...); //in apop_conversions.c\n\n/* This is the internal documentation for apop_rake(). I assume you've read the usage\n   documentation already (if you haven't, it's below, just above apop_rake).\n  \n  I started with:\nAlgorithm AS 51 Appl. Statist. (1972), vol. 21, p. 218\noriginal (C) Royal Statistical Society 1972\n\nBut at this point, I'm not sure if any of the original code remains at all, because the\nsparse method and the full-matrix method differ so much. \n\nThere are two phases to the process: the SQL part and the in-memory table part. The file\nhere begins with the indexing of the in-memory table. The index on the in-memory\ntable is somewhat its own structure, with get/set functions and so forth, so it has\nits own section, listed first in this file. After that, we get to the raking function\n(c_loglin) and its supporting functions, and then the apop_rake function itself. The\nSQL work itself is inside apop_rake, and c_loglin is called at the end to do the\nraking. */\n\n\n/* This section indexes a PMF-type apop_data struct. The index is held in a 2-d grid\nof nodes. The index's first index is the dimension, and will hold one column for each\ncolumn in the original data set. 
The index's second index goes over all of the values\nin the given column.\n\nPS: this was intended to one day become a general-purpose index for PMFs; making that\nhappen remains on the to-do list.\n\nmnode[i] = a dimension row\nmnode[i][j] = a value in a given dimension\nmnode[i][j].margin_ptrs = a list of all of the rows in the data set with the given value.\nmnode[i][j].margin_ptrs[k] = the kth item for the value.\n  */\n\n/** \\cond doxy_ignore */\ntypedef struct {\n    double val;\n    bool *margin_ptrs, *fit_ptrs;\n} mnode_t;\n\ntypedef void(*index_apply_f)(mnode_t * const * const, int, void*);\n/** \\endcond */\n\nstatic int find_val(double findme, mnode_t *nodecol){\n    for (int i=0; nodecol[i].val <= findme || gsl_isnan(findme); i++)\n        if (nodecol[i].val == findme || (gsl_isnan(findme) && gsl_isnan(nodecol[i].val)))\n           return i;\n    return -1;\n}\n\n//returns -1 if the given row/col doesn't exist.\nint index_add_node(mnode_t **mnodes, size_t dim, size_t row, double val, bool is_margin){\n    int index = find_val(val, mnodes[dim]);\n    if (index == -1) return -1;\n    if (is_margin) mnodes[dim][index].margin_ptrs[row] = true;\n    else           mnodes[dim][index].fit_ptrs[row] = true;\n    return 0;\n}\n\nmnode_t **index_generate(apop_data const *in, apop_data const *in2){\n    size_t margin_ct = in->matrix->size2;\n    size_t margin_rows = in->matrix->size1;\n    size_t fit_rows = in2->matrix->size1;\n    mnode_t **mnodes = malloc(sizeof(mnode_t*)*(margin_ct+1));\n    //allocate every node\n    for(size_t i=0; i < margin_ct; i ++){\n        gsl_vector *vals = apop_vector_unique_elements(Apop_cv(in, i));\n        mnodes[i] = malloc(sizeof(mnode_t)*(vals->size+1));\n        for(size_t j=0; j < vals->size; j ++)\n            mnodes[i][j] = (mnode_t) {.val = gsl_vector_get(vals, j),\n                        .margin_ptrs = calloc(margin_rows, sizeof(bool)),\n                        .fit_ptrs = calloc(fit_rows, sizeof(bool))\n            
};\n        mnodes[i][vals->size] = (mnode_t) {.val = GSL_POSINF}; //end-of-array sentinel\n        gsl_vector_free(vals);\n    }\n    mnodes[margin_ct] = NULL; //end-of-array sentinel\n    //put data from the matrix into the right pigeonhole\n    for(size_t i=0; i < margin_ct; i++){\n        for(size_t j=0; j < in->matrix->size1; j++)\n            Apop_stopif(index_add_node(mnodes, i, j, apop_data_get(in, j, i), true) == -1, return mnodes,\n            0, \"I can't find a value, %g, that should've already been inserted.\", apop_data_get(in, j, i));\n        for(size_t j=0; j < in2->matrix->size1; j++) //these values may not be present, in which case ignore them.\n            index_add_node(mnodes, i, j, apop_data_get(in2, j, i), false);\n    }\n    return mnodes;\n}\n\nvoid index_free(mnode_t **in){\n\tfor (int i=0; in[i]; i++){\n\t\tfor (int j=0; !isinf(in[i][j].val); j++){\n\t\t\tfree(in[i][j].margin_ptrs);\n\t\t\tfree(in[i][j].fit_ptrs);\n        }\n        free(in[i]);\n\t}\n    free(in);\n}\n\n/* The next two functions are a recursive iteration over all combinations of values\nfor a given index (which may be the whole thing or a margin). index_foreach does the\ninitialization of some state variables; value_loop does the odometer-like recursion. At\neach step, value_loop will either increment the current dimension's index, or if the\ncurrent index is at its limit, will loop back to zero on this index and then set the\nnext dimension as active. 
*/\nstatic void value_loop(mnode_t *icon[], int *indices, mnode_t **values, \n            int this_dim, index_apply_f f, int *ctr, void *args){\n    while (!isinf(icon[this_dim][indices[this_dim]].val)){\n        values[this_dim] = &icon[this_dim][indices[this_dim]];\n\t\tif (icon[this_dim+1]){\n\t\t\tindices[this_dim+1]=0;\n\t\t\tvalue_loop(icon, indices, values, this_dim+1, f, ctr, args);\n\t\t} else{\n\t\t\tf(values, *ctr, args);\n\t\t\t(*ctr)++;\n\t\t}\n        indices[this_dim]++;\n\t}\n}\n\n/* Inputs: the index to be iterated over, the function to apply to each combination,\nand a void* with other sundry arguments to the function. I'll apply the function \nto an mnode_t* with a single mnode_t for each dimension.  */\nvoid index_foreach(mnode_t *index[], index_apply_f f, void *args){\n    int j, ctr=0;\n    for (j=0; index[j]; ) j++;\n    mnode_t *values[j+1];\n    for (j=0; index[j]; j++) \n        values[j] = &index[j][0];\n    values[j] = malloc(sizeof(mnode_t));\n    values[j]->val = GSL_POSINF;\n    int indices[j];\n    memset(indices, 0, sizeof(int)*j);\n    value_loop(index, indices, values, 0, f, &ctr, args);\n    free(values[j]);\n}\n\n/* Check whether an observation falls into all of the given margins.\n\nThis is for a contrast. We've already calculated the in/out vector for each value on\neach margin, and now need to && each dimension into one vector.\n\n\\param index Is actually a partial index: for each dimension, there should be only one value. Useful for the center of an index_foreach loop.\n\\param d Should already be allocated to the right size, may be filled with garbage.\n\\return d will be zero or one to indicate which rows of the indexed data set meet all criteria.  */\nvoid index_get_element_list(mnode_t *const * index, bool *d, size_t len, bool is_margin){\n    memcpy(d, is_margin ? index[0]->margin_ptrs: index[0]->fit_ptrs, len *  sizeof(bool));\n    for(size_t i=1; !isinf(index[i]->val); i++){\n        bool *ptr_list = is_margin ? 
index[i]->margin_ptrs: index[i]->fit_ptrs;\n        for(size_t j=0; j < len; j++)\n             d[j] &= ptr_list[j];\n    }\n}\n\n////End index.c\n\n/** \\cond doxy_ignore */\ntypedef struct {\n    const apop_data *indata; \n    apop_data *fit; \n    size_t **elmtlist;\n    size_t *elmtlist_sizes;\n\tgsl_vector *indata_values;\n    mnode_t **index;\n    size_t ct, al;\n    double *maxdev;\n} rake_t;\n/** \\endcond */\n\nstatic void rakeinfo_grow(rake_t *r){\n    r->al = (r->al+1)*2;\n    r->elmtlist = realloc(r->elmtlist , sizeof(size_t*) * r->al);\n    r->elmtlist_sizes = realloc(r->elmtlist_sizes, sizeof(size_t) * r->al);\n    r->indata_values = apop_vector_realloc(r->indata_values, r->al);\n}\n\nstatic void rakeinfo_free(rake_t r){\n    #define free_and_clear(in) free(in), (in) = NULL\n    for (int i=0; i < r.ct; i++)\n        free_and_clear(r.elmtlist[i]);\n    free_and_clear(r.index); //these are just pointers to the main index.\n    free_and_clear(r.elmtlist_sizes);\n    free_and_clear(r.elmtlist);\n    gsl_vector_free(r.indata_values);\n    r.indata_values= NULL;\n}\n\nstatic void scaling(size_t const *elmts, size_t const n,  gsl_vector *weights, double const in_sum, double *maxdev){\n    double fit_sum = 0, out_sum=0;\n    for(size_t i=0; i < n; i ++)\n        fit_sum += weights->data[elmts[i]];\n    if (!fit_sum) return; //can happen if init table is very different from margins.\n    for(size_t i=0; i < n; i ++){\n        out_sum +=\n        weights->data[elmts[i]] *= in_sum/fit_sum;\n    }\n    *maxdev = GSL_MAX(fabs(fit_sum - out_sum), *maxdev);\n}\n\n/* Given one set of values from one margin, do the actual scaling.\n On the first pass, this function takes notes on each margin's element list and total \n in the original data. 
Later passes just read the notes and call the scaling() function above.\n*/\nstatic void one_set_of_values(mnode_t *const * const margincons, int ctr, void *in){\n    rake_t *r = in;\n    size_t marginsize = r->indata->matrix->size1;\n    size_t fitsize = r->fit->matrix->size1;\n    bool melmts[marginsize];\n    bool fitelmts[fitsize];\n\tbool first_pass = false;\n    double in_sum;\n\tif (ctr < r->ct)\n\t\tin_sum = gsl_vector_get(r->indata_values, ctr);\n    else {\n        r->ct++;\n        if (ctr >= r->al || r->al==0) rakeinfo_grow(r);\n   \t\tindex_get_element_list(margincons, melmts, marginsize, true);\n   \t\tindex_get_element_list(margincons, fitelmts, fitsize, false);\n        in_sum = 0;\n        int n=0, al=0;\n        r->elmtlist[ctr] = NULL;\n        //use margin index to get total for this margin.\n        for(int m=0; m < marginsize; m++)\n            if (melmts[m]) in_sum += r->indata->weights->data[m];\n        //use fit index to get elements involved in this margin\n        for(int m=0; m < fitsize; m++)\n            if (fitelmts[m]){\n                if (n >= al) {\n                    al = (al+1)*2;\n                    r->elmtlist[ctr] = realloc(r->elmtlist[ctr], al*sizeof(size_t));\n                }\n                r->elmtlist[ctr][n++] = m;\n            }\n        r->elmtlist_sizes[ctr] = n;\n        r->indata_values->data[ctr] = in_sum;\n\t\tfirst_pass = true;\n\t}\n    if (!r->elmtlist_sizes[ctr]) return;\n    if (!first_pass && !in_sum)  return;\n    scaling(r->elmtlist[ctr], r->elmtlist_sizes[ctr], r->fit->weights, in_sum, r->maxdev);\n}\n\n/* For each configuration margin, for each combination for that margin, \n   call the above one_set_of_values() function. 
*/\nstatic void main_loop(int config_ct, rake_t *rakeinfo, int k){\n    for (size_t i=0; i < config_ct; i ++)\n\t\tif (k==1) index_foreach(rakeinfo[i].index, one_set_of_values, rakeinfo+i);\n\t\telse\n\t\t\tfor(int m=0; m < rakeinfo[i].ct; m++)\n\t\t\t\tone_set_of_values(NULL, m, rakeinfo+i);\n}\n\n/* Following the FORTRAN, 1 contrast ==> icon. Here, icon will be a\nsubset of the main index including only the columns pertaining to a given margin. */\nvoid generate_margin_index(mnode_t **icon, const apop_data *margin, mnode_t **mainindex, size_t col){\n    gsl_vector *iconv = Apop_cv(margin, col);\n    int ct = 0;\n    for (int j=0; mainindex[j]; j++)\n        if (gsl_vector_get(iconv, j))\n            icon[ct++] = mainindex[j];\n    icon[ct] = NULL;\n}\n\nvoid cleanup(mnode_t **index, rake_t rakeinfos[], int contrast_ct){\n\tfor(size_t i=0; i < contrast_ct; i++)\n\t\trakeinfo_free(rakeinfos[i]);\n\tindex_free(index);\n}\n\n/*\n\\param config \tAn nvar x ncon matrix; see below. [as in the original, but not squashed into 1-D.]\n\\param indata \tthe actual table. I use a PMF format.\n\\param fit \t\tthe starting table. Same size as table.\n\\param maxdev \tmaximum deviation; stop when this is met.\n\\param maxit \tmaximum iterations; stop when this is met.\n   \nRe: the contrast array: Each _column_ is a contrast. Put a one in each col\n involved in the contrast. E.g., for (1,2), (2,3):\n\n 1 0<br>\n 1 1<br>\n 0 1\n */\nstatic void c_loglin(const apop_data *config, const apop_data *indata, \n                        apop_data *fit, double tolerance, int maxit) {\n    mnode_t ** index = index_generate(indata, fit);\n\n    /* Make a preliminary adjustment to obtain the fit to an empty configuration list */\n    //fit->weights is either all 1 (for synthetic data) or the initial counts from the db.\n    double x = apop_sum(indata->weights);\n    double y = apop_sum(fit->weights);\n    gsl_vector_scale(fit->weights, x/y);\n\n\tint contrast_ct =config && config->matrix ? 
config->matrix->size2 : 0;\n    rake_t rakeinfos[contrast_ct];\n    double maxdev=0;\n    for(size_t i=0; i < contrast_ct; i ++){\n        gsl_vector *iconv = Apop_cv(config, i);\n        rakeinfos[i] = (rake_t) {\n            .indata = indata, \n            .fit = fit, \n            .maxdev = &maxdev, //one value shared across dimensions\n            .index = malloc(sizeof(mnode_t) *(apop_sum(iconv)+1)),\n            //others are NULL, to be filled in as we go.\n        };\n        generate_margin_index(rakeinfos[i].index, config, index, i);\n    }\n    int k;\n    gsl_vector *previous = apop_vector_copy(fit->weights);\n    for (k = 1; k <= maxit; ++k) {\n        maxdev = 0;\n        main_loop(contrast_ct, rakeinfos, k);\n        Apop_notify(3, \"Data set after round %i of raking.\\n\", k);\n        if (apop_opts.verbose >=3) apop_data_print(fit, .output_pipe=apop_opts.log_file);\n        if (maxdev < tolerance) break;// Normal termination \n        gsl_vector_memcpy(previous, fit->weights);\n    }\n    cleanup(index, rakeinfos,contrast_ct);\n    gsl_vector_free(previous);\n    //If the loop ran to completion without a break, k is now maxit+1.\n    Apop_stopif(k > maxit, fit->error='c', 0, \"Maximum number of iterations reached.\");\n}\n\nchar *pipe_parse = \"[ \\n\\t]*([^| \\n\\t]+)[ \\n\\t]*([|]|$)\";\n\napop_data **generate_list_of_contrasts(char *const *contras_in, int contrast_ct){\n  apop_data** out = malloc(sizeof(apop_data*)* contrast_ct);\n\tfor (int i=0; i< contrast_ct; i++) {\n        apop_regex(contras_in[i], pipe_parse, out+i);\n        apop_text_alloc(out[i], *out[i]->textsize, 1);\n    }\n\treturn out;\n}\n\napop_data *get_var_list(char const *margin_table, char const *count_col, char const *init_count_col,\n                        char * const *varlist, int *var_ct){\n    apop_data *all_vars_d=NULL;\n    if (!varlist){\n        Apop_stopif(apop_opts.db_engine=='m', apop_return_data_error(y),\n                    0, \"I need a list of the full set of variable \"\n                       \"names sent as .varlist= (char 
*[]){\\\"var1\\\", \\\"var2\\\",...}\");\n        //use SQLite's table_info, then shift the second col to the first.\n        all_vars_d = apop_query_to_text(\"PRAGMA table_info(%s)\", margin_table);\n        int ctr=0;\n        for (int i=0; i< all_vars_d->textsize[0]; i++)\n            if (all_vars_d->text[i][1] && (count_col ? strcmp(all_vars_d->text[i][1], count_col) : 1)\n                 && (init_count_col ? strcmp(all_vars_d->text[i][1], init_count_col): 1))\n                    apop_text_set(all_vars_d, 0, ctr++, all_vars_d->text[i][1]);\n        apop_text_alloc(all_vars_d, 1, ctr);\n        *var_ct = ctr;\n    } else {\n        all_vars_d = apop_text_alloc(NULL, 1, *var_ct); \n        for (int i=0; i<*var_ct; i++) apop_text_set(all_vars_d, 0, i, varlist[i]);\n    }\n    Apop_stopif(!all_vars_d, apop_return_data_error(y), 0, \"Trouble getting/parsing the list of variables.\");\n    return all_vars_d;\n}\n\nstatic int get_var_index(char *const *all_vars, int len, char *findme){\n\tfor (int i=0; i< len; i++)\n\t\tif (all_vars[i] && !strcmp(all_vars[i], findme))\n\t\t\treturn i;\n\tApop_notify(0, \"I couldn't find %s in the full list of variables. Returning -1.\", findme);\n    return -1;\n}\n\nvoid nan_to_zero(double *in){ if (gsl_isnan(*in)) *in=0;}\n\ndouble nudge_zeros(apop_data *in, void *nudge){\n    if (!in->weights->data[0])\n        in->weights->data[0] = *(double*)nudge;\n    return 0;\n}\n\nint find_in_allvars(char const *in, apop_data const *allvars){\n    for (int i=0; i< allvars->textsize[1]; i++)\n        if (!strcmp(allvars->text[0][i], in)) return i;\n    Apop_stopif(1, return -1, 0, \"Variable in your contrast list [%s] not in \"\n                                 \"your list of all variables.\", in);\n}\n\n/* If you are fully synthesizing or nudging zero cells, then calculate the set of \n   cells that could be nonzero (given the margin information). 
\n   * If using only an init_table, then that's your list of nonzero cells right there.\n   * If you have an init_table but want to nudge the zero cells up a bit, then you need\n     this, and have to merge with the init_table\n   * If you have no init_table, then this list is all the cells.\n \n */\nstatic int setup_nonzero_contrast(char const *margin_table, \n              apop_data const * allvars, int run_number,\n              char const *list_of_fields, apop_data *const* contras, int contrast_ct,\n              double nudge, char const * structural_zeros, bool have_init_table){\n    char *q;\n    bool used[allvars->textsize[1]];\n    memset(used, 0, sizeof(bool)*allvars->textsize[1]);\n\tAsprintf(&q, \"create table apop_zerocontrasts_%i as select %s, %g from\\n\", \n            run_number, apop_text_paste(allvars, .between=\",\"), nudge);\n    for (int i=0; i < contrast_ct; i++){\n        xprintf(&q, \"%s%s (select distinct %s  from %s) \\n\", \n                  q, i>0 ? \"natural join\" : \"\", \n                  apop_text_paste(contras[i], .between=\",\"),  margin_table);\n        for (int j=0; j<*contras[i]->textsize; j++){\n            int val=find_in_allvars(*contras[i]->text[j], allvars);\n            Apop_stopif(val==-1, return -1, 0, \"Error setting up contrasts\");\n            used[val]=true; \n        }\n    } \n    //make sure all variables are joined in. \n    for (int i=0; i < allvars->textsize[1]; i++)\n        if (!used[i]) xprintf(&q, \"%s%s (select distinct %s  from %s)\\n\", \n                  q, (!contrast_ct && i==0) ? 
\"\" : \"natural join\",\n                  allvars->text[0][i],  margin_table);\n    if (structural_zeros) xprintf(&q, \"%s where not (%s)\\n\", q, structural_zeros);\n    if (have_init_table){\n        //Keep out margins with values for now; join them in below.\n        xprintf(&q, \"%s except\\nselect %s, %g from %s\", q, list_of_fields, nudge, margin_table);\n    }\n\tApop_stopif(apop_query(\"%s\", q), return 1, 0, \"query failed.\");\n    free(q);\n    return 0;\n}\n\n/** Fit a log-linear model via iterative proportional fitting, aka raking.\n\nRaking has many uses. The <a href=\"http://modelingwithdata.org/arch/00000138.htm\">Modeling with Data blog</a> presents a series of discussions \nof uses of raking, including some worked examples.\n\nOr see Wikipedia for an overview of log-linear models, aka\n<a href=\"http://en.wikipedia.org/wiki/Poisson_regression\">Poisson regressions</a>. \nOne approach toward log-linear modeling is a regression form; let there be four\ncategories, A, B, C, and D, from which we can produce a model positing, for example,\nthat cell count is a function of a form like \\f$g_1(A) + g_2(BC) + g_3(CD)\\f$. In this case, we would\nassign a separate coefficient to every possible value of A, every possible value of\n(B, C), and every value of (C, D). Raking is the technique that searches for that large\nset of parameters.\n\nThe combinations of categories that are considered to be relevant are called \\em\ncontrasts, after ANOVA terminology of the 1940s.\n\nThe other constraint on the search is the set of structural zeros, which are values that you know\ncan never be non-zero, due to field-specific facts about the variables. 
For example, U.S.\nSocial Security payments are available only to those age 65 or older, so \"age <65 and\ngets_soc_security=1\" is a structural zero.\n\nBecause there is one parameter for every combination, there may be millions of parameters\nto estimate, so the search to find the most likely value requires some attention to\ntechnique. For over half a century, the consensus method for searching has been raking, which\niteratively draws each category closer to the mean in a somewhat simple manner (this was\nfirst developed circa 1940 and had to be feasible by hand), but which is guaranteed to\neventually arrive at the maximum likelihood estimate for all cells.\n\nAnother complication is that the table is invariably sparse. One can easily construct\ntables with millions of cells, but the corresponding data set may have only a few\nthousand observations.\n\nThis function uses the database to resolve the sparseness problem. It constructs a query\nrequesting all combinations of categories that could possibly be non-zero after raking,\ngiven all of the above constraints. Then, raking is done using only that subset. \nThis means that the work is done on a number of cells proportional to the number of data\npoints, not to the full cross of all categories. Set <tt>apop_opts.verbose</tt> to 2 or greater to show the query on \\c stderr.\n\n\\li One could use raking to generate `fully synthetic' data: start with observation-level data in a margin table. Begin the raking with a starting data set of all-ones. Then rake until the all-ones set transforms into something that conforms to the margins and (if any) structural zeros. You now have a data set which matches the marginal totals but does not use any other information from the observation-level data. If you do not specify an <tt>.init_table</tt>, then an all-ones default table will be used.\n\n\n\\param margin_table The name of the table in the database to use for calculating\nthe margins.  
The table should have one observation per row.  (No default)\n\n\\param var_list The full list of variables to search. A list of strings, e.g., <tt>(char *[]){\"var1\", \"var2\", ..., \"var15\"}</tt>\n\n\\param var_ct The count of the full list of variables to search.\n\n\\param contrasts The contrasts describing your model. Like the \\c var_list input, a list of strings like <tt>(char *[]){\"var1\", \"var7\", \"var13\"}</tt>.\nEach contrast is a pipe-delimited list of variable names. (No default)\n\n\\param contrast_ct The number of contrasts in the list of contrasts. (No default)\n\n\\param structural_zeros a SQL clause indicating combinations that can never take a nonzero\nvalue. This will go into a \\c where clause, so anything you could put there is OK, e.g.\n\"age <65 and gets_soc_security=1 or age <15 and married=1\". \nYour margin data is not checked for structural zeros.  (default: no structural zeros)\n\n\\param max_iterations Number of rounds of raking at which the algorithm halts. (default: 1000)\n\n\\param tolerance I calculate the change for each cell from round to round;\nif the largest cell change is smaller than this, I stop. (default: 1e-5)\n\n\\param count_col This column gives the count of how many observations are represented\nby each row. If \\c NULL, each row represents one person. (default: \\c NULL)\n\n\\param init_table The default is to initially set all table elements to one and then\nrake from there. This is effectively the `fully synthetic' approach, which uses only\nthe information in the margins and derives the data set closest to the all-ones data\nset that is consistent with the margins. Care is taken to maintain sparsity in this\ncase.  If you specify an \\c init_table, then I will get the initial cell counts from\nit. 
(default: the fully-synthetic approach, using a starting point of an all-ones grid.)\n\n\\param init_count_col The column in \\c init_table with the cell counts.\n\n\\param nudge There is a common hack of adding a small value to every zero entry, because\na zero entry will always scale to zero, while a small value could eventually scale\nto anything.  Recall that this function works on sparse sets, so I first filter out\nthose cells that could possibly have a nonzero value given the observations, then I\nadd <tt>nudge</tt> to any zero cells within that subset.\n\n\\return An \\ref apop_data set where every row is a single combination of variable values\nand the \\c weights vector gives the most likely value for each cell.\n\n\\exception out->error='i' Input was somehow wrong.\n\\exception out->error='c' Raking did not converge, reached max. iteration count.\n\n\\li Set <tt>apop_opts.verbose=3</tt> to see the intermediate tables at the end of each round of raking.\n\\li If you want all cells to have nonzero value, then you can do that via pre-processing:\n\\code\napop_query(\"update data_table set count_col = 1e-3 where count_col = 0\");\n\\endcode\n\\li This function is thread-safe. 
To make this happen, temp database tables are \n    named using a number built with \\c omp_get_thread_num.\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD apop_data * apop_rake(char const *margin_table, char * const*var_list, int var_ct, char * const *contrasts, int contrast_ct, char const *structural_zeros, int max_iterations, double tolerance, char const *count_col, char const *init_table, char const *init_count_col, double nudge){\n    char const * apop_varad_var(margin_table, NULL);\n    Apop_stopif(!margin_table, apop_return_data_error(i), 0,  \n                        \"I need the name of a table in the database that will be the data source.\");\n    Apop_stopif(!apop_table_exists(margin_table), apop_return_data_error(i), \n                        0, \"your margin_table, %s, doesn't exist in the database.\", margin_table);\n    char *const* apop_varad_var(var_list, NULL);\n    int apop_varad_var(var_ct, 0);\n    char *const * apop_varad_var(contrasts, NULL); //default to all vars?\n    int apop_varad_var(contrast_ct, 0);\n    Apop_stopif(contrasts&&!contrast_ct, apop_return_data_error(i),\n            0, \"you gave me a list of contrasts but not the count. \"\n            \"This is C--I can't count them myself. 
Please provide the count and re-run.\");\n    char const * apop_varad_var(structural_zeros, NULL);\n    char const * apop_varad_var(count_col, NULL);\n    int apop_varad_var(max_iterations, 1e3);\n    double apop_varad_var(tolerance, 1e-5);\n    char const * apop_varad_var(init_count_col, NULL);\n    char const * apop_varad_var(init_table, NULL);\n    Apop_stopif(init_table && !apop_table_exists(init_table), apop_return_data_error(i),\n               0, \"your init_table, %s, doesn't exist in the database.\", init_table);\n    if (init_count_col && !init_table) init_table = margin_table;\n    double apop_varad_var(nudge, 0);\nAPOP_VAR_ENDHEAD\n    #ifdef OpenMP\n        int run_number = omp_get_thread_num();\n    #else\n        int run_number = 0;\n    #endif\n\n\tapop_data **contras = generate_list_of_contrasts(contrasts, contrast_ct);\n    apop_data *all_vars_d = get_var_list(margin_table, count_col, init_count_col, var_list, &var_ct);\n    Apop_stopif(all_vars_d->error, return all_vars_d, 0, \"Trouble setting up the list of variables.\");\n    int tt = all_vars_d->textsize[0]; all_vars_d->textsize[0] = 1; //mask all but the first row\n    char *list_of_fields = apop_text_paste(all_vars_d, .between=\", \");\n\n    if (nudge || !init_table){\n        char *tab;\n        Asprintf(&tab, \"apop_zerocontrasts_%i\", run_number);\n        apop_table_exists(tab, 'd');\n        free(tab);\n        Apop_stopif(setup_nonzero_contrast(margin_table, all_vars_d,  \n                        run_number, list_of_fields, contras, contrast_ct, (nudge ? 
nudge : 1), structural_zeros, init_table),\n                 apop_return_data_error(q),\n                 0, \"Couldn't calculate the set of nonzero cells.\");\n    }\n\n    char *initt=NULL; //handle structural zeros via subquery\n    //note that margin data may have invalid rows.\n    if (init_table){\n        if (structural_zeros) \n             Asprintf(&initt, \"(select * from %s where not (%s))\", init_table, structural_zeros);\n        else Asprintf(&initt, \"%s\", init_table);\n    }\n    char *init_q, *pre_init_q = NULL;\n    if (init_table){\n        char *countstr;\n        if (init_count_col) Asprintf(&countstr, \"sum(%s) as %s\", init_count_col, init_count_col);\n        else                Asprintf(&countstr, \"count(%s)\", **all_vars_d->text);\n        Asprintf(&init_q, \"select %s, %s from %s group by %s\", \n                           list_of_fields, countstr, initt, list_of_fields);\n        free(countstr);\n    }\n\n    char *marginq, *cc; \n    if (count_col) Asprintf(&cc, \"sum(%s)\", count_col);\n    else           cc = strdup(\"count(*)\");\n    Asprintf(&marginq, \"select %s, %s  from %s\\ngroup by %s\", \n                       list_of_fields, cc, margin_table, list_of_fields);\n    free(cc); free(initt);\n\n    char *format=strdup(\"w\");\n    for (int i =0 ; i< var_ct; i++)\n        xprintf(&format, \"m%s\", format);\n    apop_data *d, *contrast_grid;\n    d = apop_query_to_mixed_data(format, \"%s\", marginq);\n    Apop_stopif(!d || d->error, apop_return_data_error(q),\n            0, \"This query:\\n%s\\ngenerated a blank or broken table.\", marginq);\n    free(marginq);\n\n    if (pre_init_q) Apop_stopif(apop_query(\"%s\", pre_init_q), apop_return_data_error(q),\n            0, \"This query:\\n%s\\ngenerated a blank or broken table.\", pre_init_q);\n\n    apop_data *fit;\n    if (init_table) {\n        fit = (nudge) \n               ? 
apop_query_to_mixed_data(format, \"%s\\nunion\\nselect * from apop_zerocontrasts_%i \", init_q, run_number)\n               : apop_query_to_mixed_data(format, \"%s\", init_q);\n        Apop_stopif(!fit, apop_return_data_error(q), 0, \"Query returned a blank table.\");\n        Apop_stopif(fit->error, apop_return_data_error(q), 0, \"Query error.\");\n    } else {\n        fit = apop_query_to_mixed_data(format, \"select * from apop_zerocontrasts_%i \", run_number);\n        gsl_vector_set_all(fit->weights, nudge ? nudge : 1);\n    }\n    free(format);\n    apop_vector_apply(fit->weights, nan_to_zero);\n    if (nudge) apop_map(fit, .fn_rp=nudge_zeros, .param=&nudge);\n\n    contrast_grid = apop_data_calloc(var_ct, contrast_ct);\n\tfor (int i=0; i< contrast_ct; i++)\n\t\tfor (int j=0; j< contras[i]->textsize[0]; j++)\n\t\t\tapop_data_set(contrast_grid, get_var_index(*all_vars_d->text, all_vars_d->textsize[1], contras[i]->text[j][0]), i, 1);\n\t\n    if (!init_table || nudge)\n        for (int i=0; i< contrast_ct; i++) apop_data_free(contras[i]);\n\n    c_loglin(contrast_grid, d, fit, tolerance, max_iterations);\n    apop_data_free(d);\n    \n    all_vars_d->textsize[0] = tt;\n    apop_data_free(all_vars_d);\n    if (!init_table || nudge) apop_query(\"drop table apop_zerocontrasts_%i\", run_number);\n\tapop_data_free(contrast_grid);\n\treturn fit;\n}\n"
  },
  {
    "path": "apop_regression.m4.c",
    "content": "/** \\file apop_regression.c\tGenerally, if it assumes something is  Normally distributed, it's here.*/\n/* Copyright (c) 2006--2007 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n#include \"apop_internal.h\"\n#include <search.h> //lsearch; bsearch is in stdlib.\n\n/* For use by MLE, OLS, et al. Available for public use, but undocumented. */\nvoid apop_estimate_parameter_tests (apop_model *est){\n    Nullcheck_p(est, )\n    if (!est->data) return;\n    apop_data *ep = apop_data_add_page(est->info, apop_data_alloc(est->parameters->vector->size, 2), \"<test info>\");\n    apop_name_add(ep->names, \"p value\", 'c');\n    apop_name_add(ep->names, \"confidence\", 'c');\n    apop_name_stack(ep->names, est->parameters->names, 'r', 'r');\n    Get_vmsizes(est->data); //msize1, vsize\n    int df = msize1 ? msize1 : vsize;\n    df -= est->parameters->vector->size;\n    df  = df < 1 ? 1 : df; //some models aren't data-oriented.\n    apop_data_add_named_elmt(est->info, \"df\", df);\n\n    apop_data *one_elmt = apop_data_calloc(1, 1);\n    gsl_vector *param_v = apop_data_pack(est->parameters);\n    for (size_t i=0; i< est->parameters->vector->size; i++){\n        Apop_settings_add_group(est, apop_pm, .index=i);\n        apop_model *m = apop_parameter_model(est->data, est);\n\n        double zero = apop_cdf(one_elmt, m);\n        apop_model_free(m);\n        double conf = 2*fabs(0.5-zero); //parameter is always at 0.5 along a symmetric CDF\n        apop_data_set(ep, i, .colname=\"confidence\", .val=conf);\n        apop_data_set(ep, i, .colname=\"p value\",    .val=1-conf);\n    }\n    gsl_vector_free(param_v);\n    apop_data_free(one_elmt);\n}\n\n//Cut and pasted from the GNU std library documentation, modified to consider NaNs:\nstatic int compare_doubles (const void *a, const void *b) {\n    const double *da = (const double *) a;\n    const double *db = (const double *) b;\n    if (gsl_isnan(*da)) return gsl_isnan(*db) ? 
0 : 1;\n    if (gsl_isnan(*db)) return -1;\n    return (*da > *db) - (*da < *db);\n}\n\ntypedef const char * ccp;\nstatic int strcmpwrap(const void *a, const void *b){\n    const ccp *aa = a;\n    const ccp *bb = b;\n    return strcmp(*aa, *bb);\n}\n\n/** Give me a vector of numbers, and I'll give you a sorted list of the unique elements.\n  This is basically running <tt>select distinct datacol from data order by datacol</tt>,\n  but without the aid of the database.\n\n  \\param v a vector of items\n  \\return a sorted vector of the distinct elements that appear in the input.\n  \\li NaNs (if any) appear at the end of the sort order.\n  \\see apop_text_unique_elements \n*/\ngsl_vector * apop_vector_unique_elements(const gsl_vector *v){\n    size_t prior_elmt_ctr = 107;\n    size_t elmt_ctr = 0;\n    double *elmts = NULL;\n    for (size_t i=0; i< v->size; i++){\n        if (prior_elmt_ctr != elmt_ctr)\n            elmts = realloc(elmts, sizeof(double)*(elmt_ctr+1));\n        prior_elmt_ctr = elmt_ctr;\n        double val = gsl_vector_get(v, i);\n        lsearch(&val, elmts, &elmt_ctr, sizeof(double), compare_doubles);\n        if (prior_elmt_ctr < elmt_ctr)\n            qsort(elmts, elmt_ctr, sizeof(double), compare_doubles);\n    }\n    gsl_vector *out = apop_array_to_vector(elmts, elmt_ctr);\n    free(elmts);\n    return out;\n}\n\n/** Give me a column of text, and I'll give you a sorted list of the unique elements. \n  This is basically running <tt>select distinct * from datacolumn</tt>, but without \n  the aid of the database.  
\n\n  \\param d An \\ref apop_data set with a text component\n  \\param col The text column you want me to use.\n  \\return An \\ref apop_data set with a single sorted column of text, where each unique text input appears once.\n  \\see apop_vector_unique_elements\n*/\napop_data * apop_text_unique_elements(const apop_data *d, size_t col){\n  char   **tval;\n\n  //first element for free\n  size_t prior_elmt_ctr, elmt_ctr = 1;\n  char **telmts = malloc(sizeof(char**)*2);\n  telmts[0] = d->text[0][col];\n\n    for (int i=1; i< d->textsize[0]; i++){\n        prior_elmt_ctr  = elmt_ctr;\n        tval    =  &(d->text[i][col]);\n        lsearch (tval, telmts, &elmt_ctr, sizeof(char*), strcmpwrap);\n        if (prior_elmt_ctr  < elmt_ctr){\n            qsort(telmts, elmt_ctr, sizeof(char*), strcmpwrap);\n            telmts = realloc(telmts, sizeof(char**)*(elmt_ctr+1));\n        }\n    }\n\n    //pack and ship\n    apop_data *out = apop_text_alloc(NULL, elmt_ctr, 1);\n    for (int j=0; j< elmt_ctr; j++)\n        apop_text_set(out, j, 0, telmts[j]);\n    free(telmts);\n    return out;\n}\n\nstatic char *apop_get_factor_basename(apop_data *d, int col, char type){\n    char *name;\n    char *catname =   d->names == NULL ? NULL\n                    : type == 't' && d->names && d->names->textct > col ? d->names->text[col]\n                    : col == -1 && d->names && d->names->vector         ? d->names->vector\n                    : col >=0 && d->names && d->names->colct > col      ? 
d->names->col[col]\n                    : NULL;\n    if (catname){\n        Asprintf(&name, \"%s\", catname);\n        return name;\n    }\n    if (type == 't'){\n        Asprintf(&name, \"text column %i\", col);\n        return name;\n    }\n    if (col == -1)  Asprintf(&name, \"vector\");\n    else            Asprintf(&name, \"column %i\", col);\n    return name;\n}\n\nstatic char *make_catname (apop_data *d, int col, char type){\n    char *name, *subname = apop_get_factor_basename(d, col, type);\n    Asprintf(&name, \"<categories for %s>\", subname);\n    free(subname);\n    return name;\n}\n\n/** Factor names are stored in an auxiliary table with a name like \n<tt>\"<categories for your_var>\"</tt>. Producing this name is annoying (and prevents us from eventually making it human-language independent), so use this function to get the list of factor names.\n\n\\param data The data set. (No default, must not be \\c NULL)\n\\param col The column in the main data set whose name I'll use to check for the factor name list. Vector==-1. (default=0)\n\\param type If you are referring to a text column, use 't'. (default='d')\n\n\\return A pointer to the page in the data set with the given factor names.\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD apop_data *apop_data_get_factor_names(apop_data *data, int col, char type){\n    apop_data *apop_varad_var(data, NULL)\n    Apop_stopif(!data, return NULL, 1, \"You sent me a NULL data set. 
Returning NULL.\");\n    int apop_varad_var(col, 0)\n    char apop_varad_var(type, 'd')\nAPOP_VAR_ENDHEAD\n    char *name = make_catname (data, col, type);\n    apop_data *out = apop_data_get_page(data, name, .match='e');\n    free(name);\n    return out;\n}\n\napop_data * create_factor_list(apop_data *d, int col, char type){\n    //first, create an ordered list of unique elements.\n    //Record that list for use in this function, and in a ->more page of the data set.\n    char *catname =  make_catname(d, col, type);\n    apop_data *factor_list;\n    if (type == 't'){\n        factor_list = apop_data_add_page(d, apop_text_unique_elements(d, col), catname);\n        size_t elmt_ctr = factor_list->textsize[0];\n        //awkward format conversion:\n        factor_list->vector = gsl_vector_alloc(elmt_ctr);\n        for (size_t i=0; i< factor_list->vector->size; i++)\n            apop_data_set(factor_list, i, -1, i);\n    } else {\n        gsl_vector *delmts = apop_vector_unique_elements(Apop_cv(d, col));\n        factor_list = apop_data_add_page(d, apop_data_alloc(), catname);\n        factor_list->vector = delmts;\n        apop_text_alloc(factor_list, delmts->size, 1);\n        for (size_t i=0; i< factor_list->vector->size; i++){\n            //shift to the text, for conformity with the more common text version.\n            apop_text_set(factor_list, i, 0, \"%g\", gsl_vector_get(delmts, i));\n        }\n    }\n    free(catname);\n    return factor_list;\n}\n\n/* Producing dummies consists of finding the index of element i, for all i, then\n setting (i, index) to one.\n Producing factors consists of finding the index and then setting (i, datacol) to index.\n Otherwise the work is basically identical.\n Also, add a ->more page to the input data giving the translation.\n */\nstatic apop_data * dummies_and_factors_core(apop_data *d, int col, char type,\n                            int keep_first, int datacol, char dummyfactor,\n                            apop_data 
**factor_list){\n    if (!(*factor_list=apop_data_get_factor_names(d, col, type)))\n        *factor_list = create_factor_list(d, col, type);\n    Get_vmsizes((*factor_list)); //maxsize\n    size_t elmt_ctr = maxsize;\n\n    //copy the strings to a single list-of-strings\n    apop_data *telmts = *(*factor_list)->textsize ? apop_data_transpose(*factor_list, .inplace='n'):NULL;\n    gsl_vector *delmts = (*factor_list)->vector;\n\n    //Now go through the input vector, and for row i find the posn of the vector's\n    //name in the element list created above (j), then change (i,j) in\n    //the dummy matrix to one.\n    int s = type == 't' \n            ? d->textsize[0]\n            : (col >=0 ? d->matrix->size1 : d->vector->size);\n    apop_data *out = (dummyfactor == 'd')\n                ? apop_data_calloc(0, s, (keep_first!='n' ? elmt_ctr : elmt_ctr-1))\n                : d;\n    size_t index;\n    for (size_t i=0; i< s; i++){\n        if (type == 'd'){\n            double val = apop_data_get(d, i, col);\n            size_t posn = (size_t)bsearch(&val, delmts->data, elmt_ctr, sizeof(double), compare_doubles);\n            if (posn )\n                index = (posn - (size_t)delmts->data)/sizeof(double);\n            else {\n                index = elmt_ctr++;\n                (*factor_list)->vector = apop_vector_realloc((*factor_list)->vector, elmt_ctr);\n                gsl_vector_set((*factor_list)->vector, index, val);\n                out->matrix = apop_matrix_realloc(out->matrix, out->matrix->size1, elmt_ctr);\n                gsl_vector_set_zero(Apop_cv(out, index));\n            }\n        } else {\n            size_t posn = (size_t)bsearch(&(d->text[i][col]), *telmts->text, elmt_ctr, sizeof(char**), strcmpwrap);\n            if (posn)\n                index = (posn - (size_t)*telmts->text)/sizeof(char**);\n            else {\n                index = elmt_ctr++;\n                *factor_list = apop_text_alloc(*factor_list, elmt_ctr, 1);\n                
apop_text_set(*factor_list, index, 0, d->text[i][col]);\n                (*factor_list)->vector = apop_vector_realloc((*factor_list)->vector, elmt_ctr);\n                apop_data_set(*factor_list, index, -1, index);\n\n                telmts = apop_text_alloc(telmts, 1, elmt_ctr);\n                apop_text_set(telmts, 0, index, d->text[i][col]);\n                if (dummyfactor == 'd'){\n                    out->matrix = apop_matrix_realloc(out->matrix, out->matrix->size1, out->matrix->size2+1);\n                    gsl_vector_set_zero(Apop_cv(out, out->matrix->size2-1));\n                }\n            }\n        }\n        if (dummyfactor == 'd'){\n            if (keep_first!='n')\n                gsl_matrix_set(out->matrix, i, index,1); \n            else if (index > 0)   //else don't keep first and index==0; throw it out. \n                gsl_matrix_set(out->matrix, i, index-1, 1); \n        } else\n            apop_data_set(out, i, datacol, index); \n    }\n    //Add names:\n    if (dummyfactor == 'd'){\n        char *basename = apop_get_factor_basename(d, col, type);\n        for (size_t i = (keep_first!='n') ? 0 : 1; i< elmt_ctr; i++){\n            char n[1000];\n            if (type =='d'){\n                sprintf(n, \"%s dummy %g\", basename, gsl_vector_get(delmts,i));\n            } else\n                sprintf(n, \"%s\", telmts->text[0][i]);\n            apop_name_add(out->names, n, 'c');\n        }\n    }\n    apop_data_free(telmts);\n    return out;\n}\n\n/** A utility to make a matrix of dummy variables. You give me a single\nvector that lists the category number for each item, and I'll produce\na matrix with a single one in each row in the column specified.\n\nAfter that, you have to decide what to do with the new matrix and the original data column. 
\n\n\\li You can manually join the dummy data set with your main data, e.g.:\n\\code\napop_data *dummies  = apop_data_to_dummies(main_regression_vars, .col=8, .type='t');\napop_data_stack(main_regression_vars, dummies, 'c', .inplace='y');\n\\endcode\n\n\\li The <tt>.remove='y'</tt> option specifies that I should use \\ref apop_data_rm_columns \nto remove the column used to generate the dummies. Implemented only for <tt>type=='d'</tt>.\n\n\\li By specifying <tt>.append='y'</tt> or <tt>.append='e'</tt> I will run the above two lines for you. Your \\ref apop_data pointer will not change, but its \\c matrix element will be reallocated (via \\ref apop_data_stack).\n\n\\li By specifying <tt>.append='i'</tt>, I will place the matrix of dummies in place,\nimmediately after the data column you had specified. You will probably use this with\n<tt>.remove='y'</tt> to replace the single column with the new set of dummy columns.\nBear in mind that if there are two or more dummy columns, adding columns will change subsequent column numbers; use \\ref apop_name_find to find columns instead of giving an explicit column number.\n\n\\li If <tt>.append='i'</tt> and you asked for a text column, I will append to the end of\nthe table, which is equivalent to <tt>append='e'</tt>.\n\n\\param  d The data set with the column to be dummified (No default.)\n\\param col The column number to be transformed; -1==vector (default = 0)\n\\param type 'd'==data column, 't'==text column. (default = 't')\n\\param  keep_first  If \\c 'n', return a matrix where each row has a one in the (column specified <em>minus\n    one</em>). That is, the zeroth category is dropped, the first category\n    has an entry in column zero, et cetera. If you don't know why this\n    is useful, then this is what you need. If you know what you're doing\n    and need something special, set this to \\c 'y' and the first category won't be dropped. 
(default = \\c 'n')\n\\param append If \\c 'e' or \\c 'y', append the dummy grid to the end of the original data\nmatrix. If \\c 'i', insert in place, immediately after the original data column. (default = \\c 'n')\n\\param remove If \\c 'y', remove the original data or text column. (default = \\c 'n')\n\n\\return An \\ref apop_data set whose \\c matrix element is the one-zero\nmatrix of dummies. If you used <tt>.append</tt>, then this is the main matrix.\nAlso, I add a page named <tt>\"\\<categories for your_var\\>\"</tt> giving a reference table of names and column numbers (where <tt>your_var</tt> is the appropriate column heading).\n\\exception out->error=='a' allocation error\n\\exception out->error=='d' dimension error\n\n\\li Use \\ref apop_data_get_factor_names to get the list of category names.\n\\li NaNs (if any) appear at the end of the sort order.\n\\li See \\ref fact for further discussion.\n\\li See the documentation for \\ref apop_logit for a sample linear model using this function.\n\\li This function uses the \\ref designated syntax for inputs.\n\n\\see \\ref apop_data_to_factors\n*/\nAPOP_VAR_HEAD apop_data * apop_data_to_dummies(apop_data *d, int col, char type, int keep_first, char append, char remove){\n    apop_data *apop_varad_var(d, NULL)\n    Apop_stopif(!d, return NULL, 1, \"You sent me a NULL data set for apop_data_to_dummies. 
Returning NULL.\");\n    int apop_varad_var(col, 0)\n    char apop_varad_var(type, 't')\n    int apop_varad_var(keep_first, 'n')\n    char apop_varad_var(append, 'n')\n    char apop_varad_var(remove, 'n')\n    if (remove =='y' && type == 't') Apop_notify(1, \"Remove isn't implemented for text source columns yet.\");\nAPOP_VAR_ENDHEAD\n    if (type == 'd'){\n        Apop_stopif((col == -1) && !d->vector, apop_return_data_error(d),\n                                0, \"You asked for the vector element \"\n                                \"(col==-1) but the data's vector element is NULL.\");\n        Apop_stopif((col != -1) && (col >= d->matrix->size2), apop_return_data_error(d),\n                                0, \"You asked for the matrix element %i \"\n                                \"but the data's matrix element has only %zu columns.\", col, d->matrix->size2);\n    } else Apop_stopif(col >= d->textsize[1], apop_return_data_error(d),\n                                0, \"You asked for the text element %i but \"\n                                    \"the data's text element has only %zu elements.\", col, d->textsize[1]);\n    apop_data *fdummy;\n    apop_data *dummies= dummies_and_factors_core(d, col, type, keep_first, 0, 'd', &fdummy);\n    //Now process the append and remove options.\n    //rm_list is indexed by column, so size it by the column count.\n    size_t orig_size = d->matrix ? d->matrix->size2 : 0;\n    int rm_list[orig_size+1];\n    memset (rm_list, 0, (orig_size+1)*sizeof(int)); \n    if (append =='i' && type != 't'){ //text columns fall through to the append-at-end case below.\n        apop_data **split = apop_data_split(d, col+1, 'c');\n        //stack names, then matrices\n        for (int i=0; i < d->names->colct; i++)\n            free(d->names->col[i]);\n        apop_name_stack(d->names, split[0]->names, 'c');\n        for (int k = d->names->colct; k < (split[0]->matrix ? 
split[0]->matrix->size2 : 0); k++)\n            apop_name_add(d->names, \"\", 'c'); //pad so the name stacking is aligned (if needed)\n        apop_name_stack(d->names, dummies->names, 'c');\n        apop_name_stack(d->names, split[1]->names, 'c');\n        gsl_matrix_free(d->matrix);\n        d->matrix = apop_matrix_stack(split[0]->matrix, dummies->matrix, 'c');\n        apop_data_free(dummies);\n        apop_data_free(split[0]);\n        apop_matrix_stack(d->matrix, split[1]->matrix, 'c', .inplace='y');\n        apop_data_free(split[1]);\n        return d;\n    }\n    if (remove!='n' && type!='t'){\n        rm_list[col]=1;\n        apop_data_rm_columns(d, rm_list);\n    }\n    if (append =='y' || append == 'e' || append ==1 || (append=='i' && type=='t')){\n        d = apop_data_stack(d, dummies, 'c', .inplace='y');\n        apop_data_free(dummies);\n        return d;\n    }\n    return dummies;\n}\n\n/** Convert a column of text or numbers\n  into a column of numeric factors, which you can use for a multinomial probit/logit, for example.\n\n  If you don't run this on your data first, \\ref apop_probit and \\ref apop_logit default to running \n  it on the vector or (if no vector) zeroth column of the matrix of the input \\ref apop_data set, because those models need a list of the unique values of the dependent variable.\n\n\\param data The data set to be modified in place. (No default. If \\c NULL, returns \\c NULL and a warning)\n\\param intype If \\c 't', then \\c incol refers to text, if \\c 'd', refers to the vector or matrix. (default = \\c 't')\n\\param incol The column in the text that will be converted. -1 is the vector. (default = 0)\n\\param outcol The column in the data set where the numeric factors will be written (-1 means the vector). 
(default = 0)\n\nFor example:\n\\code\napop_data *d  = apop_query_to_mixed_data(\"mmt\", \"select 0, year, color from data\");\napop_data_to_factors(d);\n\\endcode\nNotice that the query pulled a column of zeros for the sake of saving room for the factors. It reads column zero of the text, and writes it to column zero of the matrix.\n\nAnother example:\n\\code\napop_data *d  = apop_query_to_data(\"select type, year from data\");\napop_data_to_factors(d, .intype='d', .incol=0, .outcol=0);\n\\endcode\nHere, the \\c type column is converted to sequential integer factors and\nthose factors overwrite the original data. Since a reference table is\nadded as a second page of the \\ref apop_data set, you can recover the\noriginal values as needed.\n\n\\return A table of the factors used in the code. This is an \\c apop_data set with only one column of text.\nAlso, I add a page named <tt>\"<categories for your_var>\"</tt> giving a reference table of names and column numbers (where <tt>your_var</tt> is the appropriate column heading). Use \\ref apop_data_get_factor_names to retrieve that table.\n\n\\exception out->error=='a' allocation error.\n\\exception out->error=='d' dimension error.\n\n\\li If the vector or matrix you wanted to write to is \\c NULL, I will allocate it for you.\n\\li See \\ref fact for further discussion.\n\\li See the documentation for \\ref apop_logit for a sample linear model using this function.\n\\li This function uses the \\ref designated syntax for inputs.\n\n\\see \\ref apop_data_to_dummies\n*/\nAPOP_VAR_HEAD apop_data *apop_data_to_factors(apop_data *data, char intype, int incol, int outcol){\n    apop_data *apop_varad_var(data, NULL)\n    Apop_stopif(!data, return NULL, 1, \"You sent me a NULL data set. 
Returning NULL.\");\n    int apop_varad_var(incol, 0)\n    int apop_varad_var(outcol, 0)\n    char apop_varad_var(intype, 't')\nAPOP_VAR_ENDHEAD\n    if (intype=='t'){\n        Apop_stopif(incol >= data->textsize[1], apop_return_data_error(d),\n                        0, \"You asked for the text column %i but the \"\n                           \"data's text has only %zu elements.\", incol, data->textsize[1]);\n    } else {\n        Apop_stopif((incol == -1) && !data->vector, apop_return_data_error(d),\n                0, \"You asked for the vector of the data set but there is none.\");\n        Apop_stopif((incol != -1) && !data->matrix, apop_return_data_error(d),\n                0, \"You asked for the matrix column %i but the matrix is NULL.\", incol);\n        Apop_stopif((incol != -1) && (incol >= data->matrix->size2), apop_return_data_error(d),\n                0, \"You asked for the matrix column %i but \"\n                            \"the matrix has only %zu elements.\", incol, data->matrix->size2);\n    }\n    //If the vector is NULL and the input is numeric, the input must be a matrix column,\n    //so the row count comes from the matrix.\n    if (!data->vector && outcol == -1) //allocate a vector for the user.\n        data->vector = gsl_vector_alloc(intype=='t' ? data->textsize[0] : data->matrix->size1);\n    //If the matrix is NULL and the input is numeric, the input must be the vector,\n    //so the row count comes from the vector.\n    if (!data->matrix && outcol >= 0) //allocate a matrix for the user.\n        data->matrix = gsl_matrix_calloc(intype=='t' ? data->textsize[0] : data->vector->size, outcol+1);\n    apop_data *out;\n    dummies_and_factors_core(data, incol, intype, 1, outcol, 'f', &out);\n    return out;\n}\n\n\n/** Deprecated. 
Use \\ref apop_data_to_factors.\n  \n  Convert a column of text in the text portion of an \\c apop_data set\n  into a column of numeric elements, which you can use for a multinomial probit, for example.\n\n\\param d The data set to be modified in place.\n\\param datacol The column in the data set where the numeric factors will be written (-1 means the vector, which I will allocate for you if it is \\c NULL)\n\\param textcol The column in the text that will be converted.\n\nFor example:\n\\code\napop_data *d  = apop_query_to_mixed_data(\"mmt\", \"select 1, year, color from data\");\napop_text_to_factors(d, 0, 0);\n\\endcode\nNotice that the query pulled a column of ones for the sake of saving room for the factors.\n\n\\return A table of the factors used in the code. This is an \\c apop_data set with only one column of text.\nAlso, the <tt>more</tt> element is a reference table of names and column numbers.\n\n\\exception out->error=='d'  dimension error.\n*/\napop_data *apop_text_to_factors(apop_data *d, size_t textcol, int datacol){\n    Apop_stopif(textcol >= d->textsize[1],  apop_return_data_error(d),\n                0, \"You asked for the text element %zu but the data's \"\n                   \"text has only %zu elements.\", textcol, d->textsize[1]);\n    if (!d->vector && datacol == -1) //allocate a vector for the user.\n        d->vector = gsl_vector_alloc(d->textsize[0]);\n    apop_data *out;\n    dummies_and_factors_core(d, textcol, 't', 1, datacol, 'f', &out);\n    return out;\n}\n\n/** Also known as \\f$R^2\\f$. Let \\f$Y\\f$ be the dependent variable,\n\\f$\\epsilon\\f$ the residual, \\f$n\\f$ the number of data points, and \\f$k\\f$ the number\nof independent vars (including the constant). 
Returns an \\ref apop_data set with the\nfollowing entries (in the vector element):\n\n\\li  \\f$ SST \\equiv \\sum (Y_i - \\bar Y) ^2 \\f$\n\\li  \\f$ SSE \\equiv \\sum \\epsilon ^2       \\f$\n\\li  \\f$ R^2 \\equiv 1 - {SSE\\over SST}     \\f$\n\\li  \\f$ R^2_{adj} \\equiv R^2 - {(k-1)\\over (n-k-1)}(1-R^2)     \\f$\n\nInternally allocates (and frees) a vector the size of your data set.\n\n\\return A \\f$5 \\times 1\\f$ apop_data table with the following fields:\n\\li \"R squared\"\n\\li \"R squared adj\"\n\\li \"SSE\"\n\\li \"SST\"\n\\li \"SSR\"\n\nIf the output is in \\c sss, use <code>apop_data_get(sss, .rowname=\"SSE\")</code> to get the SSE, and so on for the other items.\n\n\\param m    A model. I use the pointer to the data set used for estimation and the info page named \\c \"<Predicted>\". \nThe Predicted page should include observed, expected, and residual columns, which I use to\ngenerate the sums of squared errors and residuals, et cetera. All generalized linear\nmodels produce a page with this name and of this form, as do a host of other models. Nothing \nkeeps you from finding the \\f$R^2\\f$ of, say, a kernel smooth; it is up to you to determine \nwhether such a thing is appropriate to your given models and situation.\n\n\\li <tt>apop_estimate(yourdata, apop_ols)</tt> does this automatically\n\\li If I don't find a <tt>\"<Predicted>\"</tt> page, print an error (iff <tt>apop_opts.verbose >=0</tt>) and return \\c NULL.\n\\li The number of observations equals the number of rows in the Predicted page\n\\li The number of independent variables, needed only for the adjusted \\f$R^2\\f$, is from the\nnumber of columns in the main data set's matrix (i.e. the first page; i.e. the set of\nparameters if this is the \\c parameters output from a model estimation). 
\n\\li If your data (first page again) has a \\c weights vector, I will find weighted SSE,\nSST, and SSR (and calculate the \\f$R^2\\f$s using those values).\n  */\napop_data *apop_estimate_coefficient_of_determination (apop_model *m){\n  double          sse, sst, rsq, adjustment;\n  size_t          indep_ct= m->data->matrix->size2 - 1;\n  apop_data       *out    = apop_data_alloc();\n    gsl_vector *weights = m->data->weights; //typically NULL.\n    apop_data *expected = apop_data_get_page(m->info, \"<Predicted>\");\n    Apop_stopif(!expected, return NULL, 0, \"I couldn't find a \\\"<Predicted>\\\" page in your data set. Returning NULL.\\n\");\n    size_t obs = expected->matrix->size1;\n    Apop_col_tv(expected, \"residual\", v)\n    if (!weights)\n        gsl_blas_ddot(v, v, &sse);\n    else {\n        gsl_vector *v_times_w = apop_vector_copy(weights);\n        gsl_vector_mul(v_times_w, v);\n        gsl_blas_ddot(v_times_w, v, &sse);\n        gsl_vector_free(v_times_w);\n    }\n    gsl_vector *vv = Apop_cv(expected, 0);\n    sst = apop_vector_var(vv, m->data->weights) * (vv->size-1);\n    rsq = 1. - (sse/sst);\n    adjustment  = ((obs -1.) /(obs - indep_ct)) * (1.-rsq) ;\n    apop_data_add_named_elmt(out, \"R squared\", rsq);\n    apop_data_add_named_elmt(out, \"R squared adj\", 1 - adjustment);\n    apop_data_add_named_elmt(out, \"SSE\", sse);\n    apop_data_add_named_elmt(out, \"SST\", sst);\n    apop_data_add_named_elmt(out, \"SSR\", sst - sse);\n    return out;\n}\n\n/**  \\def apop_estimate_r_squared(in) \n A synonym for \\ref apop_estimate_coefficient_of_determination, q.v. \n \\hideinitializer\n */\n"
  },
  {
    "path": "apop_settings.c",
    "content": "/** \\file \n         Specifying model characteristics and details of estimation methods. */\n/* Copyright (c) 2008--2009, 2011, 2013 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n#include \"apop_internal.h\"\n\nstatic size_t get_settings_ct(apop_model *model){\n    int ct =0;\n    if (!model->settings) return 0;\n    while (model->settings[ct].name[0] !='\\0') ct++;\n    return ct;\n}\n\n//The Dan J Bernstein string hashing algorithm.\n//Could conceivably save a lot of time under certain settings-heavy circumstances.\nstatic unsigned long apop_settings_hash(char *str){\n    unsigned long int hash = 5381;\n    char c;\n    while ((c = *str++)) hash = hash*33 + c;\n    return hash;\n}\n\n/* Remove a settings group from a model.\n\nUse \\ref Apop_settings_rm_group. That macro uses this function internally.\n*/\nvoid apop_settings_remove_group(apop_model *m, char *delme){\n    if (!m->settings) return;\n    int i = 0;\n    int ct = get_settings_ct(m);\n    unsigned long delme_hash = apop_settings_hash(delme);\n \n    while (m->settings[i].name[0] !='\\0'){\n        if (m->settings[i].name_hash == delme_hash){\n            ((void (*)(void*))m->settings[i].free)(m->settings[i].setting_group);\n            for (int j=i+1; j< ct+1; j++) //don't forget the null sentinel.\n                m->settings[j-1] = m->settings[j];\n            i--;\n        }\n        i++;\n    }\n   // apop_assert_void(0, 1, 'c', \"I couldn't find %s in the input model, so nothing was removed.\", delme);\n}\n\n/* Don't use this function. It's what the \\c Apop_model_add_group macro uses internally. Use that.  
*/\nvoid *apop_settings_group_alloc(apop_model *model, char *type, void *free_fn, void *copy_fn, void *the_group){\n    if(apop_settings_get_grp(model, type, 'c'))  \n        apop_settings_remove_group(model, type); \n    int ct = get_settings_ct(model);\n    model->settings = realloc(model->settings, sizeof(apop_settings_type)*(ct+2));   \n    model->settings[ct] = (apop_settings_type) {\n                            .setting_group = the_group,\n                            .name_hash = apop_settings_hash(type),\n                            .free= free_fn, .copy = copy_fn };\n    strncpy(model->settings[ct].name, type, 100);\n    model->settings[ct+1] = (apop_settings_type) { };\n    return model->settings[ct].setting_group;\n}\n\n//need this for the apop_settings_model_group_alloc macro.\napop_model *apop_settings_group_alloc_wm(apop_model *model, char *type, void *free_fn, void *copy_fn, void *the_group){\n    apop_settings_group_alloc(model, type, free_fn, copy_fn, the_group);\n    return model;\n}\n\n/* This function is used internally by the macro \\ref Apop_settings_get_group. Use that.  */\nvoid * apop_settings_get_grp(apop_model *m, char *type, char fail){\n    //Used only for finding the non-blank groups.\n    Apop_stopif(!m, return NULL, 0, \"you gave me a NULL model as input.\");\n    if (!m->settings) return NULL;\n    int i;\n    unsigned long type_hash = apop_settings_hash(type);\n    for (i=0; m->settings[i].name[0] !='\\0'; i++)\n       if (type_hash == m->settings[i].name_hash)\n           return m->settings[i].setting_group;\n    Apop_assert(fail != 'f', \"I couldn't find the settings group %s in the given model.\", type);\n    return NULL; //else, just return NULL and let the caller sort it out.\n}\n\n/** Copy a settings group with the given name from the second model to\nthe first (i.e., the arguments are in memcpy order). 
\n\nYou probably won't need this often---just use \\ref apop_model_copy.\n\n\\param outm The model that will receive a copy of the settings group.\n\\param inm The model that will provide the original.\n\\param copyme The string naming the group. For example, for an \\ref apop_mcmc_settings group, this would be \\c \"apop_mcmc\".\n\n\\exception outm->error=='s'  Error copying settings group.\n*/\nvoid apop_settings_copy_group(apop_model *outm, apop_model *inm, char *copyme){\n    if (!copyme || !strlen(copyme)) return; //apop_settings_group_alloc takes care of the blank sentinel.\n    Apop_stopif(!inm, if (outm) outm->error = 's'; return, 0, \"you asked me to copy the settings of a NULL model.\");\n    Apop_stopif(!inm->settings, return, 0, \"The input model (i.e., the second argument to this function) has no settings.\");\n    void *g =  apop_settings_get_grp(inm, copyme, 'c');\n    Apop_stopif(!g, outm->error='s'; return, 0, \"Couldn't find the group you wanted me to copy. Not copying anything; setting outmodel->error='s'.\");\n    int i;\n    unsigned long type_hash = apop_settings_hash(copyme);\n    for (i=0; inm->settings[i].name[0] !='\\0'; i++)//retrieve the index.\n       if (type_hash == inm->settings[i].name_hash)\n           break;\n    void *gnew = (inm->settings[i].copy) \n                    ? ((void *(*)(void*))inm->settings[i].copy)(g)\n                    : g;\n    apop_settings_group_alloc(outm, copyme, inm->settings[i].free, inm->settings[i].copy, gnew);\n}\n"
  },
  {
    "path": "apop_sort.m4.c",
    "content": "/** \\file apop_sort.c\nCopyright (c) 2013 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n#include \"apop_internal.h\"\n#include <stdbool.h>\n\nstatic double find_smallest_larger_than(apop_data const *sort_order, double *x){\n    //the next column in the sort order is the one that is not NAN, greater than x, but smaller than all other candidate values.\n    double candidate_col=-100, candidate_val = INFINITY, v;\n    if (sort_order->vector && !isnan(v=gsl_vector_get(sort_order->vector, 0)) \n            && v > *x && v < candidate_val){\n        candidate_val = v;\n        candidate_col = -1;\n    }\n    if (sort_order->matrix)\n        for (int i=0; i< sort_order->matrix->size2; i++)\n            if (!isnan(v=gsl_matrix_get(sort_order->matrix, 0, i)) \n                    && v > *x && v < candidate_val){\n                candidate_val = v;\n                candidate_col = i;\n            }\n    if (*sort_order->textsize)\n        for (int i=0; i< sort_order->textsize[1]; i++)\n            if (apop_opts.nan_string && strcmp(sort_order->text[0][i], apop_opts.nan_string)\n                    && (v=atof(sort_order->text[0][i])) > *x && v < candidate_val){\n                candidate_val = v;\n                candidate_col = i+0.5;\n            }\n    if (sort_order->weights && !isnan(v=gsl_vector_get(sort_order->weights, 0)) \n            && v > *x && v < candidate_val){\n        candidate_val = v;\n        candidate_col = -2;\n    }\n    if (sort_order->names->rowct)\n        if (apop_opts.nan_string && strcmp(*sort_order->names->row, apop_opts.nan_string)\n                && (v=atof(*sort_order->names->row)) > *x && v < candidate_val){\n            candidate_val = v;\n            candidate_col = 0.2;\n        }\n\n    *x = candidate_val;\n    return candidate_col;\n}\n\nstatic void generate_sort_order(apop_data const *data, apop_data const *sort_order, int cols_to_sort_ct, double *so){\n/* the internal rule is that the vector is -1, the 
weights vector is -2, the names are\n * 0.2, and the text cols are the column+0.5. How's that for arbitrary. */\n    if (sort_order) {\n        double x = -INFINITY;\n        for (int i=0; i< cols_to_sort_ct-1; i++)\n            so[i] = find_smallest_larger_than(sort_order, &x);\n    } else {\n        int ctr=0;\n        if (data->vector) so[ctr++] = -1;\n        if (data->matrix) for(int i=0; i<data->matrix->size2; i++) so[ctr++] = i;\n        if (*data->textsize) for(int i=0; i< data->textsize[1]; i++) so[ctr++] = i+0.5;\n        if (data->weights) so[ctr++] = -2;\n        if (data->names->rowct) so[ctr++] = 0.2;\n    }\n    so[cols_to_sort_ct-1] = -100;\n}\n\n#include <gsl/gsl_sort_vector.h>\n\nstatic int find_min_unsorted(size_t *sorted, size_t height, size_t min){\n    while (min<height)\n        if (!sorted[min]) return min;\n        else              min++;\n    return -1;\n}\n\nstatic threadlocal apop_data *d;  //stdlib qsort doesn't have a hook where we can put these.\nstatic threadlocal int offset;\n\nstatic int compare_strings(const void *a, const void *b) {\n    const size_t *da = (const size_t *) a;\n    const size_t *db = (const size_t *) b;\n    return offset==-1\n        ? 
strcasecmp(d->names->row[*da], d->names->row[*db])\n        : strcasecmp(d->text[*da][offset], d->text[*db][offset]);\n}\n\nstatic void rearrange(apop_data *data, size_t height, size_t *perm){\n    size_t i, start=0;\n    size_t *sorted = calloc(height, sizeof(size_t));\n    while (1){\n        i     =\n        start = find_min_unsorted(sorted, height, start);\n        if (i==-1) break;\n        apop_data *first_row_storage = apop_data_copy(Apop_r(data, start));\n        sorted[start]++;\n        while (perm[i]!=start){\n            //copy from perm[i] to i\n            apop_data_memcpy(Apop_r(data,i), Apop_r(data, perm[i]));\n            sorted[perm[i]]++;\n            i = perm[i];\n        }\n        apop_data_memcpy(Apop_r(data, i), first_row_storage);\n        apop_data_free(first_row_storage);\n    }\n    free(sorted);\n}\n\n/** Sort an \\ref apop_data set on an arbitrary sequence of columns. \n\nThe \\c sort_order set is a one-row data set that should look like the data set being\nsorted. The easiest way to generate it is to use \\ref Apop_r to pull one row of the\ntable, then copy and fill it. For each column you want used in the sort, assign a ranking giving whether the column should be sorted first, second, .... Columns you don't want used in the sorting should be set to \\c NAN. Ties are broken by the earlier element in the default order (see below).\n\nE.g., to sort by the last column of a five-column matrix first, then the next-to-last column, then the next-to-next-to-last, then by the first text column, then by the second text column:\n\n\\code\napop_data *sort_order = apop_data_copy(Apop_r(data, 0));\nsort_order->vector = NULL; //so it will be skipped.\nApop_data_fill(sort_order, NAN, NAN, 3, 2, 1);\napop_text_set(sort_order, 0, 0, \"4\");\napop_text_set(sort_order, 0, 1, \"5\");\napop_data_sort(data, sort_order);\n\\endcode\n\nTo determine which columns are sorted at which step, I use only comparisons, not the actual numeric values. 
For example, (1, 2, 3) and (-1.32, 0, 27) work identically. For text, I use \\c atof to convert your text to a number, as in the example above that set text values of \\c \"4\" and \\c \"5\". A blank string, NaN numeric value, or NULL element in the \\ref apop_data set means that column will not be sorted.\n\n\\li Strings are sorted case-insensitively, using \\c strcasecmp. [exercise for the reader: modify the source to use Glib's locale-correct string sorting.]\n\n\\li The setup generates a lexicographic sort using the columns you specify. If you would like a different sort order, such as Euclidean distance to the origin, you can generate a new column expressing your preferred metric, and then sort on that. See the example below.\n\n\\param data The data set to be sorted. If \\c NULL, this function is a no-op that returns \\c NULL.\n\\param sort_order An \\ref apop_data set describing the order in which columns are used for sorting, as above. If \\c NULL, then sort by the vector, then each matrix column, then text, then weights, then row names.\n\\param inplace If 'n', make a copy, else sort in place. (default: 'y').\n\\param asc If 'a', ascending; if 'd', descending. This is applied to all columns; column-by-column application is to do. (default: 'a').\n\\param col_order For internal use only. In your call, it should be \\c NULL; you can leave this off your function call entirely and the \\ref designated syntax will take care of it for you.\n\n\\return A pointer to the sorted data set. If <tt>inplace=='y'</tt> (the default), then this is the same as the input set.\n\n\nA few examples:\n\n\\include \"sort_example.c\"\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD apop_data *apop_data_sort(apop_data *data, apop_data *sort_order, char asc, char inplace, double *col_order){\n    apop_data * apop_varad_var(data, NULL);\n    Apop_stopif(!data, return NULL, 1, \"You gave me NULL data to sort. 
Returning NULL\");\n    apop_data * apop_varad_var(sort_order, NULL);\n    char apop_varad_var(inplace, 'y');\n    char apop_varad_var(asc, 'a');\n    double * apop_varad_var(col_order, NULL);\nAPOP_VAR_ENDHEAD\n    if (!data) return NULL;\n\n    apop_data *out = inplace=='n' ? apop_data_copy(data) : data;\n\n    apop_data *xx = sort_order ? sort_order : out;\n    Get_vmsizes(xx); //firstcol, msize2\n    int cols_to_sort_ct = msize2 - firstcol +1 + !!(xx->weights) + xx->textsize[1] + !!xx->names->rowct;\n    double so[cols_to_sort_ct];\n    if (!col_order){\n        generate_sort_order(out, sort_order, cols_to_sort_ct, so);\n        col_order = so;\n    }\n\n    bool is_text = ((int)*col_order != *col_order);\n    bool is_name = (*col_order == 0.2);\n\n    gsl_vector_view c;\n    gsl_vector *cc = NULL;\n    if (!is_text && *col_order>=0){\n        c = gsl_matrix_column(out->matrix, *col_order);\n        cc = &c.vector;\n    }\n    gsl_vector *thiscol =   cc               ? cc\n                          : (*col_order==-2) ? out->weights\n                          : (*col_order==-1) ? out->vector\n                                             : NULL;\n\n    size_t height = thiscol   ? thiscol->size\n                    : is_name ? out->names->rowct\n                              : *out->textsize;\n\n    gsl_permutation *p = gsl_permutation_alloc(height);\n    if (!is_text) gsl_sort_vector_index (p, thiscol);\n    else {\n        gsl_permutation_init(p);\n        d = out;\n        offset = is_name ? 
-1 : *col_order-0.5;        \n        qsort(p->data, height, sizeof(size_t), compare_strings);\n    }\n\n    size_t *perm = p->data;\n    if (asc=='d' || asc=='D') //reverse the perm matrix.\n        for (size_t j=0; j< height/2; j++){\n            double t         = perm[j];\n            perm[j]          = perm[height-1-j];\n            perm[height-1-j] = t;\n        }\n    rearrange(out, height, perm);\n    gsl_permutation_free(p);\n    if (col_order[1] == -100) return out;\n\n    /*Second pass:\n    find blocks where all are of the same value.\n    After you pass a block of size > 1 row where all vals in this col are identical,\n    sort that block, using the rest of the sort order. */\n    int bottom=0;\n    if (!is_text){\n        double last_val = gsl_vector_get(thiscol, 0);\n        for (int i=1; i< height+1; i++){\n            double this_val=0;\n            if ((i==height || (this_val=gsl_vector_get(thiscol, i)) != last_val) \n                    && bottom != i-1){\n                apop_data_sort_base(Apop_rs(out, bottom, i-bottom), sort_order, 'a', 'y', col_order+1);\n            }\n            if (last_val != this_val) bottom = i;\n            last_val = this_val;\n        }\n    } else {\n        char *last_val =  strdup(is_name ? out->names->row[0] : out->text[0][(int)(*col_order-0.5)]);\n        for (int i=1; i< height+1; i++){\n            char *this_val = i==height ? NULL : is_name ? out->names->row[i] : out->text[i][(int)(*col_order-0.5)];\n            if ((i==height || strcasecmp(this_val, last_val)) \n                    && bottom != i-1){\n                apop_data_sort_base(Apop_rs(out, bottom, i-bottom), sort_order, 'a', 'y', col_order+1);\n            }\n            if (this_val && strcmp(last_val, this_val)) bottom = i;\n            free(last_val);\n            last_val = this_val ? strdup(this_val) : NULL;\n        }\n        free(last_val);\n    }\n    return out;\n}\n"
  },
  {
    "path": "apop_stats.m4.c",
    "content": "/** \\file apop_stats.c\tBasic moments and some distributions. */\n/* Copyright (c) 2006--2007, 2013 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n#include \"apop_internal.h\"\n#include <gsl/gsl_rng.h>\n#include <gsl/gsl_eigen.h>\n\n#define Check_vw    \\\n    Apop_stopif(!v, return GSL_NAN, 0, \"data vector is NULL. Returning NaN.\\n\");            \\\n    Apop_stopif(!v->size, return GSL_NAN, 0, \"data vector has size 0. Returning NaN.\\n\");   \\\n    Apop_stopif(weights && weights->size != v->size, return GSL_NAN, 0, \"data vector has size %zu; weighting vector has size %zu. Returning NaN.\\n\", v->size, weights->size);\n\n/** Returns the sum of the data in the given vector.\n*/\nlong double apop_vector_sum(const gsl_vector *in){\n    Apop_stopif(!in, return 0, 1, \"You just asked me to sum a NULL. Returning zero.\");\n    long double out = 0;\n    for (size_t i=0; i< in->size; i++)\n        out += gsl_vector_get(in, i);\n\treturn out; \n}\n\n/** \\def apop_sum(in)\n  An alias for \\ref apop_vector_sum. Returns the sum of the data in the given vector.\n*/\n\n/** \\def apop_mean(v)\n Returns the mean of the elements of the vector \\c v.\n\n\\param v A \\ref gsl_vector.\n*/\n\n/** \\def apop_var(in)\nAn alias for \\ref apop_vector_var.\nReturns the variance of the data in the given vector.\n*/\n\n/** Returns an unbiased estimate of the sample skew of the data in the given vector.\n*/\ndouble apop_vector_skew(const gsl_vector *in){\n\treturn apop_vector_skew_pop(in) * gsl_pow_2(in->size)/((in->size -1.)*(in->size -2.)); }\n\n/** Returns the sample fourth central moment of the data in the given\nvector. Corrections are made to produce an unbiased result as per <a\nhref=\"http://modelingwithdata.org/pdfs/moments.pdf\">Appendix M</a> (PDF) of <em>Modeling\nwith data</em>.\n\n\\li This is an estimate of the fourth central moment without normalization. 
The kurtosis\n    of a \\f${\\cal N}(0,1)\\f$ is \\f$3 \\sigma^4\\f$, not three, one, or zero.\n\\see \\ref apop_vector_kurtosis_pop\n*/\ndouble apop_vector_kurtosis(const gsl_vector *in){\n    size_t n = in->size;\n    long double coeff0= n*n/(gsl_pow_3(n-1)*(gsl_pow_2(n)-3*n+3));\n    long double coeff1= n*gsl_pow_2(n-1)+ (6*n-9);\n    long double coeff2= n*(6*n-9);\n    return  coeff0 *(coeff1 * apop_vector_kurtosis_pop(in) - coeff2 * gsl_pow_2(apop_vector_var(in)*(n-1.)/n));\n}\n\nstatic double wskewkurt(const gsl_vector *v, const gsl_vector *w, const int exponent, const char *fn_name){\n    long double wsum = 0, sumcu = 0, vv, ww, mu;\n    //Using the E(x - \\bar x)^3 form, which is lazy.\n    mu  = apop_vector_mean(v, w);\n    for (size_t i=0; i< w->size; i++){\n        vv    = gsl_vector_get(v,i);\n        ww    = gsl_vector_get(w,i);\n        sumcu+= ww * gsl_pow_int(vv - mu, exponent); \n        wsum += ww; \n    }\n    double len = wsum < 1.1 ? w->size : wsum;\n    return sumcu/len;\n}\n\n/** Returns the population skew \\f$(\\sum_i (x_i - \\mu)^3/n))\\f$ of the data in the given vector. Observations may be weighted.\n\n\\param v       The data vector\n\\param weights The weight vector. Default: equal weights for all observations.\n\\return        The weighted skew.\n \n\\li  Some people like to normalize the skew by dividing by (variance)\\f$^{3/2}\\f$; that's not done here, so you'll have to do so separately if need be.\n\n\\li Apophenia tries to be smart about reading the weights. If weights\nsum to one, then the system uses \\c w->size as the number of elements,\nand returns the usual sum over \\f$n-1\\f$. If weights > 1, then the\nsystem uses the total weights as \\f$n\\f$. 
Thus, you can use the weights\nas standard weightings or to represent elements that appear repeatedly.\n*/\nAPOP_VAR_HEAD double apop_vector_skew_pop(gsl_vector const *v, gsl_vector const *weights){\n    gsl_vector const * apop_varad_var(v, NULL);\n    gsl_vector const * apop_varad_var(weights, NULL);\n    Check_vw\nAPOP_VAR_ENDHEAD\n    if (weights) return wskewkurt(v, weights, 3, \"apop_vector_weighted_skew\");\n\n    //This code is cut/pasted/modified from the GSL. \n    //I reimplement the skew calculation here without the division by var^3/2 that the GSL does. \n    size_t n = v->size;\n    long double avg = 0;\n    long double mean = apop_vector_mean(v);\n    for (size_t i = 0; i < n; i++) {\n        const long double x = gsl_vector_get(v, i) - mean; \n        avg += (gsl_pow_3(x) - avg)/(i + 1);\n    } \n    return avg;\n}\n\n/** Returns the population fourth central moment [\\f$\\sum_i (x_i - \\mu)^4/n)\\f$] of the data in\nthe given vector, with an optional weighting.\n\n\\param v The data vector\n\\param weights The weight vector. If NULL, assume equal weights.\n\\return The weighted kurtosis.\n \n  \\li Some people like to normalize the fourth central moment by dividing by variance\nsquared, or by subtracting three; those things are not done here, so you'll have to\ndo them separately if desired.\n  \\li This function uses the \\ref designated syntax for inputs.\n\\see \\ref apop_vector_kurtosis for the unbiased sample version.\n*/\nAPOP_VAR_HEAD double apop_vector_kurtosis_pop(gsl_vector const *v, gsl_vector const *weights){\n    gsl_vector const * apop_varad_var(v, NULL);\n    gsl_vector const * apop_varad_var(weights, NULL);\n    Check_vw\nAPOP_VAR_ENDHEAD\n    if (weights) return wskewkurt(v, weights, 4, \"apop_vector_weighted_kurtosis\");\n\n    //This code is cut/pasted/modified from the GSL. \n    //I reimplement the kurtosis calculation here without the division by var^2 that the GSL does. 
\n    size_t n = v->size;\n    long double avg  = 0;\n    long double mean = apop_vector_mean(v);\n    for (size_t i = 0; i < n; i++) {\n        const long double x = gsl_vector_get(v, i) - mean; \n        avg += (gsl_pow_4(x) - avg)/(i + 1);\n    } \n    return avg;\n}\n\n/** Returns the variance of the data in the given vector, given that you've already calculated the mean.\n\\param in\tthe vector in question\n\\param mean\tthe mean, which you've already calculated using \\ref apop_vector_mean.\n\\see apop_vector_var\n*/\ndouble apop_vector_var_m(const gsl_vector *in, const double mean){\n\treturn gsl_stats_variance_m(in->data,in->stride, in->size, mean); }\n\n/** Returns the correlation coefficient of two vectors:\n\\f$ {\\hbox{cov}(a,b)\\over \\sqrt{\\hbox{var}(a)} \\sqrt{\\hbox{var}(b)}}.\\f$\n\nAn example\n\\code \ngsl_matrix *m = apop_text_to_data(\"indata\")->matrix;\nprintf(\"The correlation coefficient between rows two \"\n       \"and three is %g.\\n\", apop_vector_correlation(Apop_mrv(m, 2), Apop_mrv(m, 3)));\n\\endcode \n\n\\param ina, inb Two vectors of equal length (no default, must not be NULL)\n\\param weights Replicate weights for the observations. (default: equal weights for all observations)\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD double apop_vector_correlation(const gsl_vector *ina, const gsl_vector *inb, const gsl_vector *weights){\n    gsl_vector const * apop_varad_var(ina, NULL);\n    gsl_vector const * apop_varad_var(inb, NULL);\n    gsl_vector const * apop_varad_var(weights, NULL);\nAPOP_VAR_ENDHEAD\n\treturn apop_vector_cov(ina, inb, weights) \n            / sqrt(apop_vector_var(ina, weights) * apop_vector_var(inb, weights)); }\n\n\n/** Returns the distance between two vectors, where distance is defined\n based on the third (optional) parameter:\n\n - 'e'  (the default): scalar distance (standard Euclidean metric) between two vectors. 
\\f$\\sqrt{\\sum_i{(a_i - b_i)^2}},\\f$\nwhere \\f$i\\f$ iterates over dimensions.\n - 'm'   Returns the Manhattan metric distance  between two vectors: \\f$\\sum_i{|a_i - b_i|},\\f$\nwhere \\f$i\\f$ iterates over dimensions.\n - 'd'  The discrete norm: if \\f$a = b\\f$, return zero, else return one.\n - 's'  The sup norm: find the dimension where \\f$|a_i - b_i|\\f$ is largest, return the distance along that one dimension.\n - 'l' or 'L' The \\f$L_p\\f$ norm, \\f$\\left(\\sum_i{|a_i - b_i|^p}\\right)^{1/p}\\f$. The value of \\f$p\\f$ is set by the fourth (optional) argument.\n\n \\param ina First vector (No default, must not be \\c NULL)\n \\param inb Second vector (Default = zero)\n \\param metric The type of metric, as above.\n \\param norm  If you are using an \\f$L_p\\f$ norm, this is \\f$p\\f$. Must be strictly greater than zero. (default = 2)\n\n\\li  The defaults are such that\n \\code\n apop_vector_distance(v);\n apop_vector_distance(v, .metric = 's');\n apop_vector_distance(v, .metric = 'm');\n \\endcode\ngives you the standard Euclidean length of \\c v, its largest element in absolute value, and the sum of the absolute values of its elements.\n\\li This function uses the \\ref designated syntax for inputs.\n\n\\include test_distances.c\n*/\nAPOP_VAR_HEAD double apop_vector_distance(const gsl_vector *ina, const gsl_vector *inb, const char metric, const double norm){\n    static threadlocal gsl_vector *zero = NULL;\n    const gsl_vector * apop_varad_var(ina, NULL);\n    Apop_stopif(!ina, return NAN, 1, \"The first vector is NULL. 
Returning NAN\");\n    const gsl_vector * apop_varad_var(inb, NULL);\n    if (!inb){\n        if (!zero || zero->size !=ina->size){\n            if (zero) gsl_vector_free(zero);\n            zero = gsl_vector_calloc(ina->size);\n        }\n        inb = zero;\n    }\n    const char apop_varad_var(metric, 'e');\n    const double apop_varad_var(norm, 2);\nAPOP_VAR_ENDHEAD\n    Apop_stopif(ina->size != inb->size, return GSL_NAN, 0, \"I need equal-sized vectors, but \"\n                \"you sent a vector of size %zu and a vector of size %zu. Returning NaN.\", ina->size, inb->size);\n    double dist = 0;\n    if (metric == 'e' || metric == 'E'){\n        for (size_t i=0; i< ina->size; i++)\n            dist += gsl_pow_2(gsl_vector_get(ina, i) - gsl_vector_get(inb, i));\n        return sqrt(dist); \n    }\n    if (metric == 'm' || metric == 'M'){ //redundant with vector_grid_distance, below.\n        for (size_t i=0; i< ina->size; i++) \n            dist += fabs(gsl_vector_get(ina, i) - gsl_vector_get(inb, i));\n        return dist; \n    }\n    if (metric == 'd' || metric == 'D'){\n        for (size_t i=0; i< ina->size; i++) \n            if (gsl_vector_get(ina, i) != gsl_vector_get(inb, i))\n                return 1;\n        return 0;\n    }\n    if (metric == 's' || metric == 'S'){\n        for (size_t i=0; i< ina->size; i++) \n            dist = GSL_MAX(dist, fabs(gsl_vector_get(ina, i) - gsl_vector_get(inb, i)));\n        return dist;\n    }\n    if (metric == 'l' || metric == 'L'){\n        for (size_t i=0; i< ina->size; i++)\n            dist += pow(fabs(gsl_vector_get(ina, i) - gsl_vector_get(inb, i)), norm);\n        return pow(dist, 1./norm); \n    }\n  Apop_stopif(1, return NAN, 1, \"I couldn't find the metric type you gave, %c, in my list of supported types. 
Returning NaN\", metric);\n}\n\n/** This function will normalize a vector, either such that it has mean\nzero and variance one, or ranges between zero and one, or sums to one.\n\n\\param in\tA \\c gsl_vector with the un-normalized data. \\c NULL\ninput gives \\c NULL output. (No default)\n\n\\param out \tIf normalizing in place, \\c NULL.\nIf not, the address of a <tt>gsl_vector*</tt>. Do not allocate. (default = \\c NULL.)\n\n\\param normalization_type \n\\c 'p': normalized vector will sum to one. E.g., start with a set of observations in bins, end with the percentage of observations in each bin. (the default)<br>\n\\c 'r': normalized vector will range between zero and one. Replace each X with (X-min) / (max - min).<br>\n\\c 's': normalized vector will have mean zero and (sample) variance one. Replace\neach X with \\f$(X-\\mu) / \\sigma\\f$, where \\f$\\sigma\\f$ is the sample\nstandard deviation.<br>\n\\c 'm': normalize to mean zero: Replace each X with \\f$(X-\\mu)\\f$<br>\n\n\\b Example \n\\code\n\\endcode\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD void apop_vector_normalize(gsl_vector *in, gsl_vector **out, const char normalization_type){\n    gsl_vector * apop_varad_var(in, NULL);\n    Apop_stopif(!in, return, 1, \"Input vector is NULL. 
Doing nothing.\");\n    gsl_vector ** apop_varad_var(out, NULL);\n    const char apop_varad_var(normalization_type, 'p');\nAPOP_VAR_END_HEAD\n    double mu, min, max;\n\tif (!out) out = &in;\n\telse {\n\t\t*out = gsl_vector_alloc (in->size);\n\t\tgsl_vector_memcpy(*out, in);\n\t}\n\tif (normalization_type == 's'){\n\t\tmu = apop_vector_mean(in);\n        Apop_stopif(!isfinite(mu), return, 0, \"normalization failed: the mean of the vector is not finite.\");\n\t\tgsl_vector_add_constant(*out, -mu);\n        double scaling = 1./(sqrt(apop_vector_var_m(*out, 0)));\n        Apop_stopif(!isfinite(scaling), return, 0, \"normalization failed: 1/(std error)  of the vector is not finite.\");\n\t\tgsl_vector_scale(*out, scaling);\n\t} \n\telse if (normalization_type == 'r'){\n        gsl_vector_minmax(in, &min, &max);\n\t\tgsl_vector_add_constant(*out, -min);\n\t\tgsl_vector_scale(*out, 1/(max-min));\t\n\n\t}\n\telse if (normalization_type == 'p'){\n\t\tlong double sum\t= apop_sum(in);\n        Apop_stopif(!sum, return, 0, \"the vector sums to zero, so I can't normalize it to sum to one.\");\n\t\tgsl_vector_scale(*out, 1/sum);\t\n\t}\n\telse if (normalization_type == 'm'){\n\t\tmu = apop_vector_mean(in);\n        Apop_stopif(!isfinite(mu), return, 0, \"normalization failed: the mean of the vector is not finite.\");\n\t\tgsl_vector_add_constant(*out, -mu);\n\t}\n}\n\n/** Returns the sum of the elements of a matrix. Occasionally convenient.\n\n\\param m\tthe matrix to be summed. \n*/\nlong double apop_matrix_sum(const gsl_matrix *m){\n    Apop_stopif(!m, return 0, 1, \"You just asked me to sum a NULL. Returning zero.\");\n    long double\tsum\t= 0;\n\tfor (size_t j=0; j< m->size1; j++)\n\t\tfor (size_t i=0; i< m->size2; i++)\n\t\t\tsum += gsl_matrix_get(m, j, i);\n\treturn sum;\n}\n\n/** Returns the mean of all elements of a matrix.\n\n\\param data\tThe matrix to be averaged. 
If \\c NULL, return zero.\n\\return The mean of all cells of the matrix.\n*/\ndouble apop_matrix_mean(const gsl_matrix *data){\n    if (!data) return 0;\n    long double avg = 0;\n    int cnt = 0;\n    for(size_t i=0; i < data->size1; i++)\n        for(size_t j=0; j < data->size2; j++){\n            double x = gsl_matrix_get(data, i,j);\n            long double ratio = cnt/(cnt+1.0);\n            cnt++;\n            avg*= ratio;\n            avg+= x/cnt;\n        }\n\treturn avg;\n}\n\n/** Returns the mean and population variance of all elements of a matrix.\n \n\\li If \\c data is \\c NULL, return \\f$\\mu=0, \\sigma^2=NaN\\f$.\n\\li Gives the population variance (sum of squared deviations divided by \\f$N\\f$).  \nIf you want sample variance, multiply the result by \\f$N/(N-1)\\f$:\n\\code\ndouble mu, var;\napop_data *data= apop_query_to_data(\"select * from indata\");\napop_matrix_mean_and_var(data->matrix, &mu, &var);\nsize_t n = data->matrix->size1 * data->matrix->size2;\nvar *= n/(n-1.0);\n\\endcode\n\n\\param data\tthe matrix to be averaged. \n\\param\tmean\twhere to put the mean to be calculated.\n\\param\tvar\twhere to put the variance to be calculated.\n*/\nvoid apop_matrix_mean_and_var(const gsl_matrix *data, double *mean, double *var){\n    if (!data) {*mean=0; *var=GSL_NAN; return;}\n    long double avg     = 0,\n                avg2    = 0;\n    size_t cnt= 0;\n    long double x, ratio;\n    for(size_t i=0; i < data->size1; i++)\n        for(size_t j=0; j < data->size2; j++){\n            x     = gsl_matrix_get(data, i,j);\n            ratio = cnt/(cnt+1.0);\n            cnt   ++;\n            avg   *= ratio;\n            avg2  *= ratio;\n            avg   += x/(cnt +0.0);\n            avg2  += gsl_pow_2(x)/(cnt +0.0);\n        }\n\t*mean = avg;\n    *var  = avg2 - gsl_pow_2(avg); //E[x^2] - E^2[x]\n}\n\n/** Put summary information about the columns of a table (mean, std dev, variance, min, median, max) in a table.\n\n\\param indata The table to be summarized. 
An \\ref apop_data structure. May have a <tt>weights</tt> element.\n\\return     An \\ref apop_data structure with one row for each column in the original\n            table, and a column for each summary statistic.\n\\exception out->error='a'  Allocation error.\n\n\\li This function gives more columns than you probably want; use \\ref apop_data_prune_columns to pick the ones you want to see.\n\n\\li See apop_data_prune_columns for an example.\n*/\napop_data * apop_data_summarize(apop_data *indata){\n    Apop_stopif(!indata, return NULL, 0, \"You sent me a NULL apop_data set. Returning NULL.\");\n    Apop_stopif(!indata->matrix, return NULL, 0, \"You sent me an apop_data set with a NULL matrix. Returning NULL.\");\n    apop_data *out = apop_data_alloc(indata->matrix->size2, 6);\n    double mean, var;\n    char rowname[10000]; //crashes on more than 10^9995 columns.\n\tapop_name_add(out->names, \"mean\", 'c');\n\tapop_name_add(out->names, \"std dev\", 'c');\n\tapop_name_add(out->names, \"variance\", 'c');\n\tapop_name_add(out->names, \"min\", 'c');\n\tapop_name_add(out->names, \"median\", 'c');\n\tapop_name_add(out->names, \"max\", 'c');\n\tif (indata->names !=NULL){\n        apop_name_stack(out->names,indata->names, 'r', 'c');\n        if (indata->names->title && strlen(indata->names->title)){\n            char *title;\n            Asprintf(&title, \"summary for %s\", indata->names->title);\n            apop_name_add(out->names, title, 'h');\n            free(title);\n        }\n    }\n\telse\n\t\tfor (size_t i=0; i< indata->matrix->size2; i++){\n\t\t\tsprintf(rowname, \"col %zu\", i);\n\t\t\tapop_name_add(out->names, rowname, 'r');\n\t\t}\n\tfor (size_t i=0; i< indata->matrix->size2; i++){\n        gsl_vector *v = Apop_cv(indata, i);\n        if (!indata->weights){\n            mean = apop_vector_mean(v);\n            var  = apop_vector_var_m(v, mean);\n        } else {\n            mean = apop_vector_mean(v, indata->weights);\n            var  = 
apop_vector_var(v, indata->weights);\n        } \n        double *pctiles = apop_vector_percentiles(v);\n\t\tgsl_matrix_set(out->matrix, i, 0, mean);\n\t\tgsl_matrix_set(out->matrix, i, 1, sqrt(var));\n\t\tgsl_matrix_set(out->matrix, i, 2, var);\n\t\tgsl_matrix_set(out->matrix, i, 3, pctiles[0]);\n\t\tgsl_matrix_set(out->matrix, i, 4, pctiles[50]);\n\t\tgsl_matrix_set(out->matrix, i, 5, pctiles[100]);\n        free(pctiles);\n\t}\n\treturn out;\n}\n\n/** Returns an array of size 101, where \\c returned_vector[95] gives the value of the\n95th percentile, for example. \\c returned_vector[100] is always the maximum value,\nand \\c returned_vector[0] is always the min (regardless of rounding rule).\n\n  \\param data\tA \\c gsl_vector with the data. (No default, must not be \\c NULL.)\n  \\param rounding One of \\c 'u', \\c 'd', or \\c 'a'. Unless every percentile lands\nexactly on a data point, some percentiles will be ambiguous. If \\c 'u', then round\nup (use the next highest value); if \\c 'd', round down to the next lowest value; if \\c\n'a', take the mean of the two nearest points.  
(Default = \\c 'd'.)\n\n\\li If the rounding method is \\c 'u' or \\c 'a', then you can say \"5% or more  of\nthe sample is below returned_vector[5]\"; if \\c 'd' or \\c 'a', then you can say \"5%\nor more of the sample is above returned_vector[5]\".\n\\li You may eventually want to \\c free() the array returned by this function.\n\\li This function uses the \\ref designated syntax for inputs.\n*/ \nAPOP_VAR_HEAD double * apop_vector_percentiles(gsl_vector *data, char rounding){\n    gsl_vector *apop_varad_var(data, NULL);\n    Apop_stopif(!data, return NULL, 0, \"You gave me NULL data.\");\n    char apop_varad_var(rounding, 'd');\nAPOP_VAR_ENDHEAD\n    gsl_vector *sorted\t= gsl_vector_alloc(data->size);\n    double     *pctiles = malloc(sizeof(double) * 101);\n\tgsl_vector_memcpy(sorted,data);\n\tgsl_sort_vector(sorted);\n\tfor(int i=0; i<101; i++){\n\t\tint index = i*(data->size-1)/100.0;\n\t\tif (rounding == 'u' && index != i*(data->size-1)/100.0)\n\t\t\tindex ++; //index was rounded down, but should be rounded up.\n\t\tif (rounding == 'a' && index != i*(data->size-1)/100.0)\n            pctiles[i]\t= (gsl_vector_get(sorted, index)+gsl_vector_get(sorted, index+1))/2.;\n        else pctiles[i]\t= gsl_vector_get(sorted, index);\n\t}\n\tgsl_vector_free(sorted);\n\treturn pctiles;\n}\n\n/** Find the mean, weighted or unweighted. \n\n\\param v        The data vector\n\\param weights  The weight vector. 
Default: assume equal weights.\n\\return         The mean, weighted if a weight vector is given \n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD double apop_vector_mean(gsl_vector const *v, gsl_vector const *weights){\n    gsl_vector const * apop_varad_var(v, NULL);\n    gsl_vector const * apop_varad_var(weights, NULL);\n    Check_vw\nAPOP_VAR_END_HEAD\n    if (!weights) return gsl_stats_mean(v->data, v->stride, v->size);\n    long double sum = 0, wsum = 0;\n    for (size_t i=0; i< weights->size; i++){\n        sum  += gsl_vector_get(weights, i) * gsl_vector_get(v,i); \n        wsum += gsl_vector_get(weights, i); \n    }\n    return sum/wsum;\n}\n\n/** Find the sample variance of a vector, weighted or unweighted.\n\n\\param v       The data vector\n\\param weights The weight vector. If NULL (the default), assume equal weights.\n\\return        The weighted sample variance.  \n\n  \\li This uses (n-1) in the denominator of the sum; i.e., it corrects for the bias\nintroduced by using \\f$\\bar x\\f$ instead of \\f$\\mu\\f$.\n  \\li  Multiply the output by (n-1)/n if you need population variance.\n  \\li Apophenia tries to be smart about reading the weights. If the weights\nsum to one (in the code, to less than 1.1), then the system uses \\c weights->size as the number of elements,\nand returns the usual sum over \\f$n-1\\f$. If the weights sum to more than that, then the\nsystem uses the total weight as \\f$n\\f$. 
Thus, you can use the weights\nas standard weightings or to represent elements that appear repeatedly.\n  \\li This function uses the \\ref designated syntax for inputs.\n\\see apop_vector_var_m for the case where you already have the vector's mean.\n*/\nAPOP_VAR_HEAD double apop_vector_var(gsl_vector const *v, gsl_vector const *weights){\n    gsl_vector const * apop_varad_var(v, NULL);\n    gsl_vector const * apop_varad_var(weights, NULL);\n    Check_vw\nAPOP_VAR_END_HEAD\n    if (!weights) return gsl_stats_variance(v->data, v->stride, v->size);\n    //Using the E(x^2) - E^2(x) form.\n    long double sum = 0, wsum = 0, sumsq = 0, vv, ww;\n    for (size_t i=0; i< weights->size; i++){\n        vv = gsl_vector_get(v, i);\n        ww = gsl_vector_get(weights, i);\n        sum   += ww * vv;\n        sumsq += ww * gsl_pow_2(vv); \n        wsum  += ww; \n    }\n    double len = (wsum < 1.1 ? weights->size : wsum);\n    return (sumsq/len - gsl_pow_2(sum/len)) * len/(len -1.);\n}\n\n/** Find the sample covariance of a pair of vectors, with an optional weighting. This only\nmakes sense if the weightings are identical, so the function takes only one weighting vector for both.\n\n\\param  v1, v2  The data vectors (no default; must not be \\c NULL)\n\\param  weights The weight vector. (default equal weights for all elements)\n\\return The sample covariance\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD double apop_vector_cov(const gsl_vector *v1, const gsl_vector *v2, const gsl_vector *weights){\n    gsl_vector const * apop_varad_var(v1, NULL);\n    gsl_vector const * apop_varad_var(v2, NULL);\n    gsl_vector const * apop_varad_var(weights, NULL);\n    Apop_stopif(!v1, return GSL_NAN, 0, \"first data vector is NULL. Returning NaN.\");\n    Apop_stopif(!v2, return GSL_NAN, 0, \"second data vector is NULL. Returning NaN.\");\n    Apop_stopif(!v1->size, return GSL_NAN, 0, \"first data vector has size 0. 
Returning NaN.\");\n    Apop_stopif(!v2->size, return GSL_NAN, 0, \"second data vector has size 0. Returning NaN.\");\n    Apop_stopif(v1->size!= v2->size, return GSL_NAN, 0, \"data vectors have sizes %zu and %zu. Returning NaN.\", v1->size, v2->size);\n    Apop_stopif(weights && ((weights->size != v1->size) || (weights->size != v2->size)), return GSL_NAN, 0, \"data vectors have sizes %zu and %zu; weighting vector has size %zu. Returning NaN.\", v1->size, v2->size, weights->size);\n\nAPOP_VAR_ENDHEAD\n    if (!weights) return gsl_stats_covariance(v1->data, v1->stride, v2->data, v2->stride, v2->size);\n    long double sum1 = 0, sum2 = 0, wsum = 0, sumsq = 0, vv1, vv2, ww;\n    //Using the E(x^2) - E^2(x) form.\n    for (size_t i=0; i< weights->size; i++){\n        vv1   = gsl_vector_get(v1,i);\n        vv2   = gsl_vector_get(v2,i);\n        ww    = gsl_vector_get(weights,i);\n        sum1 += ww * vv1;\n        sum2 += ww * vv2;\n        sumsq+= ww * vv1 * vv2;\n        wsum += ww; \n    }\n    double len = (wsum < 1.1 ? weights->size : wsum);\n    return (sumsq/len  - sum1*sum2/gsl_pow_2(len)) *(len/(len-1));\n}\n\n/** Returns the sample variance/covariance matrix relating each column of the matrix to each other column.\n\n\\param in An \\ref apop_data set. If the weights vector is set, I'll take it into account.\n\n\\li This is the sample covariance---dividing by \\f$n-1\\f$, not \\f$n\\f$. If you need the population covariance, multiply by \\f$(n-1)/n\\f$: \n\\code\napop_data *popcov = apop_data_covariance(indata);\nint size=indata->matrix->size1;\ngsl_matrix_scale(popcov->matrix, (size-1.)/size);\n\\endcode\n\n\\return An \\ref apop_data set holding the variance/covariance matrix.  \n\\exception out->error='a'  Allocation error.\n*/\napop_data *apop_data_covariance(const apop_data *in){\n    Apop_stopif(!in, return NULL, 1, \"You sent me a NULL apop_data set. Returning NULL.\");\n    Apop_stopif(!in->matrix, return NULL, 1, \"You sent me an apop_data set with a NULL matrix. 
Returning NULL.\");\n    apop_data *out = apop_data_alloc(in->matrix->size2, in->matrix->size2);\n    Apop_stopif(out->error, return out, 0, \"allocation error.\");\n    for (size_t i=0; i < in->matrix->size2; i++){\n        for (size_t j=i; j < in->matrix->size2; j++){\n            double var = apop_vector_cov(Apop_cv(in, i), Apop_cv(in, j), in->weights);\n            gsl_matrix_set(out->matrix, i,j, var);\n            if (i!=j) gsl_matrix_set(out->matrix, j,i, var);\n        }\n    }\n    apop_name_stack(out->names, in->names, 'c');\n    apop_name_stack(out->names, in->names, 'r', 'c');\n    return out;\n}\n\n/** Returns the matrix of correlation coefficients \\f$(\\sigma^2_{xy}/(\\sigma_x\\sigma_y))\\f$ relating each column to every other column.\n\n\\param in \tA data matrix: rows are observations, columns are variables. If you give me a weights vector, I'll use it.\n\n\\return The square matrix of correlation coefficients, with dimensions equal to the number of input columns.\n\\exception out->error='a'  Allocation error.\n*/\napop_data *apop_data_correlation(const apop_data *in){\n    apop_data *out = apop_data_covariance(in);\n    if (!out) return NULL;\n    for(size_t i=0; i< in->matrix->size2; i++){\n        double std_dev = sqrt(apop_vector_var(Apop_cv(in, i), in->weights));\n        gsl_vector_scale(Apop_cv(out, i), 1.0/std_dev);\n        gsl_vector_scale(Apop_rv(out, i), 1.0/std_dev);\n    }\n    return out;\n}\n\n\n/** Given a vector representing a probability distribution of observations, calculate the entropy, \\f$\\sum_i -\\ln(v_i)v_i\\f$.\n\n\\li You may input a vector giving frequencies (normalized to sum to one) or counts (arbitrary sum).\n\n\\li The entropy of a data set depends only on the frequency with which elements are\nobserved, not the value of the elements themselves. 
The \\ref apop_data_pmf_compress\nfunction will reduce an input \\ref apop_data set to one weighted line per distinct observation, and\nthe weights then determine the entropy:\n\n\\code\napop_data *data = apop_text_to_data(\"indata\");\napop_data_pmf_compress(data);\nlong double data_entropy = apop_vector_entropy(data->weights);\n\\endcode\n\n\\li The entropy is calculated using natural logs. To convert to base 2, divide by \\f$\\ln(2)\\f$; see the example.\n\n\\li The entropy of an empty data set (\\c NULL or a total weight of zero) is zero. Print a warning when given \\c NULL\n    input and <tt>apop_opts.verbose >=1</tt>.\n\n\\li If the input vector has negative elements, return \\c NaN; print a warning when <tt>apop_opts.verbose >= 0</tt>.\n\nSample code:\n\\include entropy_vector.c\n*/\nlong double apop_vector_entropy(gsl_vector *in){\n    Apop_stopif(!in, return 0, 1, \"Entropy of a NULL vector ≡ 0\");\n    Apop_stopif(!in->size, return 0, 1, \"Entropy of a zero-length vector ≡ 0\");//can't happen.\n\n    //User may or may not have normalized in, so scale everything by the sum.\n    long double sum = apop_vector_sum(in);\n    Apop_stopif(sum<0, return NAN, 0, \"Vector sums to a negative value (%Lg). Returning NaN.\\n\", sum);\n    if (!sum) return 0;\n\n    long double out=0;\n    for (int i=0; i< in->size; i++){\n        double val = gsl_vector_get(in, i)/sum;\n        Apop_stopif(val<0, return NAN, 0, \"negative value (%g) in vector position %i. 
Returning NaN.\\n\", val, i);\n        if (!val) continue;\n        out -= logl(val)*val;\n    }\n    return out;\n}\n\nstatic long double norment(apop_model *m){\n    double sigma_sq = gsl_pow_2(apop_data_get(m->parameters, 1));\n    return (log(2*M_PI*sigma_sq) +1)/2.;\n}\n\ndouble get_ll(apop_data *d, void *m){ return apop_log_likelihood(d, m); }\n\n\n/** Calculate the entropy of a model: \\f$\\int -\\ln(p(x))p(x)dx\\f$, which is the expected\n  value of \\f$-\\ln(p(x))\\f$.\n\nThe default method is to make draws using \\ref apop_model_draws, then\nevaluate the log likelihood at those points using the model's \\c log_likelihood method.\n\nThere are a number of routines for specific models, including the \\ref apop_normal and \\ref apop_pmf models.\n\n\\li  If you want the entropy of a data set, see \\ref apop_vector_entropy.\n\\li The entropy is calculated using natural logs. If you prefer base-2 logs, just divide by \\f$\\ln(2)\\f$: <tt>apop_model_entropy(my_model)/log(2)</tt>.\n\n\\param in A parameterized \\ref apop_model. That is, you have already used \\ref apop_estimate or \\ref apop_model_set_parameters to estimate/set the model parameters.\n\\param draws If using the default method of making random draws, how many random draws to make (default=1,000)\n\nSample code:\n\\include entropy_model.c\n*/\nAPOP_VAR_HEAD long double apop_model_entropy(apop_model *in, int draws){\n    apop_model * apop_varad_var(in, NULL);\n    Apop_stopif(!in, return NAN, 0, \"NULL input model. 
Returning NaN.\");\n    int apop_varad_var(draws, 1000);\nAPOP_VAR_ENDHEAD\n    static int setup=0; if (!(setup++)){\n        apop_entropy_vtable_add(norment, apop_normal);\n    }\n    apop_entropy_type e_fn = apop_entropy_vtable_get(in);\n    if (e_fn) return e_fn(in);\n\n    apop_data *d = apop_model_draws(in, draws);\n    apop_data *lls = apop_map(d, .fn_rp=get_ll, .param=in);\n\n    long double out = -apop_vector_mean(lls->vector);\n    apop_data_free(d);\n    apop_data_free(lls);\n    return out;\n}\n\n/** Kullback-Leibler divergence.\n\nThis measure of the divergence of one distribution from another has the form \\f$ D(p,q)\n= \\sum_i \\ln(p_i/q_i) p_i \\f$.  Notice that it is not a distance, because there is an\nasymmetry between \\f$p\\f$ and \\f$q\\f$, so one can expect that \\f$D(p, q) \\neq D(q, p)\\f$.\n\n  \\param from the \\f$p\\f$ in the above formula. (No default; must not be \\c NULL)\n  \\param to the \\f$q\\f$ in the above formula. (No default; must not be \\c NULL)\n  \\param draw_ct If I do the calculation via random draws, how many? (Default = 1e5)\n  \\param rng    A \\c gsl_rng. If \\c NULL or number of threads is greater than 1, I'll take care of the RNG; see \\ref apop_rng_get_thread. (Default = \\c NULL)\n\nThis function can take empirical histogram-type models (\\ref apop_pmf) or continuous\nmodels like \\ref apop_loess or \\ref apop_normal.\n\nIf the \\c from distribution is a PMF (determined by checking whether its \\c p function\nis that of \\ref apop_pmf), then I'll step through it for the points in the summation.\n\n\\li If you have two empirical distributions in the form of \\ref apop_pmf, they must\nbe synced: if \\f$p_i>0\\f$ but \\f$q_i=0\\f$, then the function returns \\c GSL_NEGINF. 
If\n<tt>apop_opts.verbose >=1</tt> I print a message as well.\n\nIf the \\c from distribution is not a PMF, then I will take \\c draw_ct random draws\nfrom \\c from and evaluate at those points.\n\n\\li Set <tt>apop_opts.verbose = 3</tt> for observation-by-observation info.\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD long double apop_kl_divergence(apop_model *from, apop_model *to, int draw_ct, gsl_rng *rng){\n    apop_model * apop_varad_var(from, NULL);\n    apop_model * apop_varad_var(to, NULL);\n    Apop_stopif(!from, return NAN, 0, \"The first model is NULL; returning NaN.\");\n    Apop_stopif(!to, return NAN, 0, \"The second model is NULL.\");\n    double apop_varad_var(draw_ct, 1e5);\n    gsl_rng * apop_varad_var(rng, NULL);\nAPOP_VAR_ENDHEAD\n    double div = 0;\n    Apop_notify(3, \"p(from)\\tp(to)\\tfrom*log(from/to)\\n\");\n    if (from->p == apop_pmf->p){\n        apop_data *p = from->data;\n        apop_pmf_settings *settings = Apop_settings_get_group(from, apop_pmf);\n        Get_vmsizes(p); //maxsize\n        OMP_for_reduce (+:div,    int i=0; i < maxsize; i++){\n            double pi = p->weights ? gsl_vector_get(p->weights, i)/settings->total_weight : 1./maxsize;\n            if (!pi){\n                Apop_notify(3, \"0\\t--\\t0\");\n                continue;\n            } //else:\n            double qi = apop_p(Apop_r(p, i), to);\n            Apop_notify(3,\"%g\\t%g\\t%g\", pi, qi, pi ? 
pi * log(pi/qi):0);\n            Apop_stopif(!qi, div+=GSL_NEGINF; break, 1, \"The PMFs aren't synced: from-distribution has a value where \"\n                                                \"to-distribution doesn't (which produces infinite divergence).\");\n            div += pi * log(pi/qi);\n        }\n    } else { //the version with the RNG.\n        Apop_stopif(!from->dsize, return GSL_NAN, 0, \"I need to make random draws from the 'from' model, \"\n                                                     \"but its dsize (draw size)==0. Returning NaN.\");\n        OMP_for_reduce(+:div,    int i=0; i < draw_ct; i++){\n            double draw[from->dsize];\n            apop_draw(draw, rng, from);\n            gsl_matrix_view dm = gsl_matrix_view_array(draw, 1, from->dsize);\n            double pi = apop_p(&(apop_data){.matrix=&(dm.matrix)}, from);\n            double qi = apop_p(&(apop_data){.matrix=&(dm.matrix)}, to);\n            double val = pi ? log(pi/qi): 0; //each row already has probability p_i\n            Apop_notify(3,\"%g\\t%g\\t%g\", pi, qi, val);\n            div += val;\n            Apop_stopif(!qi, break, 1, \"From-distribution has a value where \"\n                                                \"to-distribution doesn't (which produces infinite divergence).\");\n        }\n        div /= draw_ct; //div is an expected value of ln(pi/qi)\n    }\n    return div;\n}\n\n/** The multivariate generalization of the Gamma distribution.\n\\f[\n\\Gamma_p(a)=\n\\pi^{p(p-1)/4}\\prod_{j=1}^p\n\\Gamma\\left[ a+(1-j)/2\\right]. 
\\f]\n\nBecause \\f$\\Gamma(x)\\f$ is undefined for \\f$x\\in\\{0, -1, -2, ...\\}\\f$, this function returns \\c NAN when \\f$a+(1-j)/2\\f$ takes on one of those values.\n\nSee also \\ref apop_multivariate_lngamma, which is more numerically stable in most cases.\n*/\nlong double apop_multivariate_gamma(double a, int p){\n    Apop_stopif(-(a+(1-p)/2) == (int)-(a+(1-p)/2) && a+(1-p)/2 <=0, return NAN, 1, \"Undefined when a + (1-p)/2 = 0, -1, -2, ... [you sent a=%g, p=%i]\", a, p);\n    long double out = pow(M_PI, p*(p-1.)/4.);\n    long double factor = 1;\n    for (int i=1; i<=p; i++)\n        factor *= gsl_sf_gamma(a+(1-i)/2.);\n    return out * factor;\n}\n\n/** The log of the multivariate generalization of the Gamma; see also\n \\ref apop_multivariate_gamma.\n*/\nlong double apop_multivariate_lngamma(double a, int p){\n    Apop_stopif(-(a+(1-p)/2) == (int)-(a+(1-p)/2) && a+(1-p)/2 <=0, return NAN, 1, \"Undefined when a + (1-p)/2 = 0, -1, -2, ... [you sent a=%g, p=%i]\", a, p);\n    long double out = M_LNPI * p*(p-1.)/4.;\n    for (int i=1; i<=p; i++)\n        out += gsl_sf_lngamma(a+(1-i)/2.);\n    return out;\n}\n\nstatic void find_eigens(gsl_matrix **subject, gsl_vector *eigenvals, gsl_matrix *eigenvecs){\n    gsl_eigen_symmv_workspace * w = gsl_eigen_symmv_alloc((*subject)->size1);\n    gsl_eigen_symmv(*subject, eigenvals, eigenvecs, w);\n    gsl_eigen_symmv_free (w);\n    gsl_matrix_free(*subject); *subject  = NULL;\n}\n\nstatic void diagonal_copy(gsl_vector *v, gsl_matrix *m, char in_or_out){\n    gsl_vector_view dv = gsl_matrix_diagonal(m);\n    if (in_or_out == 'i') gsl_vector_memcpy(&(dv.vector), v);\n    else                  gsl_vector_memcpy(v, &(dv.vector));\n}\n\nstatic double diagonal_size(gsl_matrix *m){\n    gsl_vector_view dv = gsl_matrix_diagonal(m);\n    return apop_sum(&dv.vector);\n}\n\nstatic double biggest_elmt(gsl_matrix *d){ \n    return  GSL_MAX(fabs(gsl_matrix_max(d)), fabs(gsl_matrix_min(d)));\n}\n\n/** Test whether the input matrix is 
positive semidefinite (PSD).\n\nA covariance matrix will always be PSD, so this function can tell you whether your matrix is a valid covariance matrix.\n\nConsider the 1x1 matrix in the upper left of the input, then the 2x2 matrix in the\nupper left, on up to the full matrix. If the matrix is PSD, then each of these has\na positive determinant. This function thus calculates \\f$N\\f$ determinants for an\n\\f$N\\f$x\\f$N\\f$ matrix.\n\n\\param m The matrix to test. If \\c NULL, I will return zero---not PSD.\n\\param semi If anything but \\c 's', check for positive definite, not semidefinite. (default 's')\n\nSee also \\ref apop_matrix_to_positive_semidefinite, which will change the input to something PSD.\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD int apop_matrix_is_positive_semidefinite(gsl_matrix *m, char semi){\n    gsl_matrix * apop_varad_var(m, NULL);\n    Apop_stopif(!m, return 0, 1, \"You gave me a NULL matrix. I will take this as not positive semidefinite; returning zero.\");\n    char apop_varad_var(semi, 's');\nAPOP_VAR_ENDHEAD\n    for (int i=1; i<= m->size1; i++){\n        gsl_matrix mv =gsl_matrix_submatrix (m, 0, 0, i, i).matrix;\n        double det = apop_matrix_determinant(&mv);\n        if ((semi == 'd' && det <0) || det <=0)\n            return 0;\n    }\n    return 1;\n}\n\nvoid vfabs(double *x){*x = fabs(*x);}\n\n/**  This function takes in a matrix and converts it in place to the `closest' positive semidefinite matrix.\n\n\\param m On input, any matrix; on output, a positive semidefinite matrix. 
If \\c NULL, return \\c NaN and print an error.\n\\return the distance between the original and new matrices.\n\n\\li See also the test function \\ref apop_matrix_is_positive_semidefinite.\n\\li This function can be used as the core of a model constraint.\n\\li Adapted from the R Matrix package's nearPD, which is \nCopyright (2007) Jens Oehlschlägel [under the GPL].\n*/\ndouble apop_matrix_to_positive_semidefinite(gsl_matrix *m){\n    Apop_stopif(!m, return NAN, 0, \"Got a NULL matrix. Returning NaN.\");\n    if (apop_matrix_is_positive_semidefinite(m)) return 0; \n    double diffsize=0, dsize;\n    apop_data *qdq; \n    gsl_matrix *d = apop_matrix_copy(m);\n    gsl_matrix *original = apop_matrix_copy(m);\n    double orig_diag_size = fabs(diagonal_size(d));\n    int size = d->size1;\n    gsl_vector *diag = gsl_vector_alloc(size);\n    diagonal_copy(diag, d, 'o');\n    apop_vector_apply(diag, vfabs);\n    double origsize = biggest_elmt(d);\n    do {\n        //get eigenvals\n        apop_data *eigenvecs = apop_data_alloc(size, size);\n        gsl_vector *eigenvals = gsl_vector_calloc(size);\n        gsl_matrix *junk_copy = apop_matrix_copy(d);\n        find_eigens(&junk_copy, eigenvals, eigenvecs->matrix);//junk freed here.\n        \n        //prune positive only\n        int j=0;\n        int plussize = eigenvecs->matrix->size1;\n        int *mask = calloc(eigenvals->size , sizeof(int));\n        for (int i=0; i< eigenvals->size; i++)\n            plussize -= \n            mask[i] = (gsl_vector_get(eigenvals, i) <= 0);\n        \n        //construct Q = pruned eigenvals\n        apop_data_rm_columns(eigenvecs, mask);\n        if (!eigenvecs->matrix) break;\n        \n        //construct D = positive eigen diagonal\n        apop_data *eigendiag = apop_data_calloc(0, plussize, plussize);\n        for (int i=0; i< eigenvals->size; i++)\n            if (!mask[i]) {\n                apop_data_set(eigendiag, j, j, eigenvals->data[i]);\n                j++;\n            
}\n\n        // Our candidate is QDQ', symmetrized, with the old diagonal subbed in.\n        apop_data *qd = apop_dot(eigenvecs, eigendiag);\n        qdq = apop_dot(qd, eigenvecs, .form2='t');\n        for (int i=0; i< qdq->matrix->size1; i++)\n            for (int j=i+1; j< qdq->matrix->size1; j++){\n                double avg = (apop_data_get(qdq, i, j) +apop_data_get(qdq, j, i)) /2.;\n                apop_data_set(qdq, i, j, avg);\n                apop_data_set(qdq, j, i, avg);\n            }\n        diagonal_copy(diag, qdq->matrix, 'i');\n        \n        // Evaluate progress, clean up.\n        dsize = biggest_elmt(d);\n        gsl_matrix_sub(d, qdq->matrix);\n        diffsize = biggest_elmt(d);\n        apop_data_free(qd); gsl_matrix_free(d);\n        apop_data_free(eigendiag); free(mask);\n        apop_data_free(eigenvecs); gsl_vector_free(eigenvals);\n        d = qdq->matrix;\n        qdq->matrix=NULL; apop_data_free(qdq); qdq = NULL;\n    } while (diffsize/dsize > 1e-3);\n\n    apop_data *eigenvecs = apop_data_alloc(size, size);\n    gsl_vector *eigenvals = gsl_vector_calloc(size);\n    find_eigens(&d, eigenvals, eigenvecs->matrix);//free d here.\n    //make eigenvalues more positive\n    double score =0;\n    for (int i=0; i< eigenvals->size; i++){\n        double v = gsl_vector_get(eigenvals, i);\n        if (v < 1e-1){\n            gsl_vector_set(eigenvals, i, 1e-1);\n            score += 1e-1 - v;\n        }\n    }\n    for (int i=0; i< size; i++)\n        assert(eigenvals->data[i] >=0);\n    //if (score){\n        apop_data *eigendiag = apop_data_calloc(0, size, size);\n        diagonal_copy(eigenvals, eigendiag->matrix, 'i');\n        double new_diag_size = diagonal_size(eigendiag->matrix);\n        gsl_matrix_scale(eigendiag->matrix, orig_diag_size/new_diag_size);\n        apop_data *qd = apop_dot(eigenvecs, eigendiag);\n        qdq = apop_dot(qd, eigenvecs, .form2='t');\n        gsl_matrix_memcpy(m, qdq->matrix);\n        apop_data_free(qd);\n   
     apop_data_free(eigendiag);\n    //}\n    assert(apop_matrix_is_positive_semidefinite(m));\n    apop_data_free(qdq); gsl_vector_free(diag);\n    apop_data_free(eigenvecs); gsl_vector_free(eigenvals);\n    gsl_matrix_sub(original, m);\n    return biggest_elmt(original)/origsize;\n}\n"
  },
  {
    "path": "apop_tests.m4.c",
    "content": "/** \\file apop_tests.c\t */\n/* Copyright (c) 2007 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  \n */\n#include \"apop_internal.h\"\n\nstatic apop_data * produce_t_test_output(int df, double stat, double diff){\n    double pval, qval, two_tail;\n    if (!gsl_isnan(stat)){\n        pval    = gsl_cdf_tdist_P(stat, df);\n        qval    = 1-pval;\n        two_tail= 2*GSL_MIN(fabs(pval-.5),fabs(qval-0.5));\n    } else {\n        pval    = GSL_NAN;\n        qval    = GSL_NAN;\n        two_tail= GSL_NAN;\n    }\n    apop_data *out = apop_data_alloc();\n    apop_data_add_named_elmt(out, \"mean left - right\", diff);\n    apop_data_add_named_elmt(out, \"t statistic\", stat);\n    apop_data_add_named_elmt(out, \"df\", df);\n    apop_data_add_named_elmt(out, \"p value, 1 tail\", GSL_MIN(pval,qval));\n    apop_data_add_named_elmt(out, \"confidence, 1 tail\", 1 - GSL_MIN(pval,qval));\n    apop_data_add_named_elmt(out, \"p value, 2 tail\", 1- two_tail);\n    apop_data_add_named_elmt(out, \"confidence, 2 tail\", two_tail);\n    return out;\n}\n\n/** Answers the question: with what confidence can I say that the means of these two columns of data are different?\n\nIf \\c apop_opts.verbose is >=1, then display some information to stdout, like the mean/var/count for both vectors and the t statistic.\n\n\\param a one column of data\n\\param b another column of data\n\\return an \\ref apop_data set with the following elements: <br>\n    <tt>mean left - right</tt>:    the difference in means; if positive, first vector has larger mean, and one-tailed test is testing \\f$L > R\\f$, else reverse if negative.<br>\n    <tt>t statistic</tt>:    used for the test<br>\n    <tt>df</tt>:             degrees of freedom<br>\n    <tt>p value, 1 tail</tt>: the p-value for a one-tailed test that one vector mean is greater than the other.<br>\n    <tt>confidence, 1 tail</tt>: 1- p value.<br>\n    <tt>p value, 2 tail</tt>: the p-value for the two-tailed test that left mean 
= right mean.<br>\n    <tt>confidence, 2 tail</tt>: 1-p value\n\nExample usage:\n\\code\ngsl_vector *L = apop_query_to_vector(\"select * from data where sex='M'\");\ngsl_vector *R = apop_query_to_vector(\"select * from data where sex='F'\");\napop_data *test_out = apop_t_test(L, R);\n//The confidence is reported as a fraction in [0, 1], so scale it to print a percentage.\nprintf(\"Reject the null hypothesis of no difference between M and F with %g%% confidence\\n\", 100*apop_data_get(test_out, .rowname=\"confidence, 2 tail\"));\n\\endcode\n\n\\see \\ref apop_paired_t_test, which answers the question: with what confidence can I\nsay that the mean difference between the two columns is zero?\n*/\napop_data *\tapop_t_test(gsl_vector *a, gsl_vector *b){\n    int a_count = a->size,\n        b_count = b->size;\n    double a_avg = apop_vector_mean(a);\n    double a_var = (a_count > 1) ? apop_vector_var(a) : 0,\n           b_avg = apop_vector_mean(b),\n    b_var = b_count > 1 ? apop_vector_var(b): 0,\n    stat = (a_avg - b_avg)/ sqrt(\n                        (b_count > 1 ? b_var/(b_count-1) : 0) \n                        + (a_count > 1 ? 
a_var/(a_count-1) : 0) \n                        );\n    if (apop_opts.verbose >=1){\n        printf(\"1st avg: %g; 1st std dev: %g; 1st count: %i.\\n\", a_avg, sqrt(a_var), a_count);\n        printf(\"2nd avg: %g; 2nd std dev: %g; 2nd count: %i.\\n\", b_avg, sqrt(b_var), b_count);\n        printf(\"t-statistic: %g.\\n\", stat);\n    }\n    int df = a_count+b_count-2;\n    return produce_t_test_output(df, stat, a_avg - b_avg);\n}\n\n/** Answers the question: with what confidence can I say that the mean difference between the two columns is zero?\n\nIf <tt>apop_opts.verbose >=2</tt>, then display some information, like the mean/var/count for both vectors and the t statistic, to stderr.\n\n\\param a A column of data\n\\param b A matched column of data\n\\return an \\ref apop_data set with the following elements:\n    <tt>mean left - right</tt>:    the difference in means; if positive, first vector has larger mean, and one-tailed test is testing \\f$L > R\\f$, else reverse if negative.<br>\n    <tt>t statistic</tt>:    used for the test<br>\n    <tt>df</tt>:             degrees of freedom<br>\n    <tt>p value, 1 tail</tt>: the p-value for a one-tailed test that one vector mean is greater than the other.<br>\n    <tt>confidence, 1 tail</tt>: 1- p value.<br>\n    <tt>p value, 2 tail</tt>: the p-value for the two-tailed test that left mean = right mean.<br>\n    <tt>confidence, 2 tail</tt>: 1-p value\n\n\\see \\ref apop_t_test for an example, and for when the element-by-element difference between the vectors has no sensible interpretation.\n*/\napop_data * apop_paired_t_test(gsl_vector *a, gsl_vector *b){\n    gsl_vector *diff = gsl_vector_alloc(a->size);\n    gsl_vector_memcpy(diff, a);\n    gsl_vector_sub(diff, b);\n    int count = a->size; \n    double avg = apop_vector_mean(diff),\n    var = apop_vector_var(diff),\n    stat = avg/ sqrt(var/(count-1));\n    gsl_vector_free(diff);\n    Apop_notify(2, \"avg diff: %g; diff std dev: %g; count: %i; t-statistic: %g.\\n\", 
avg, sqrt(var), count, stat);\n    return produce_t_test_output(count-1, stat, avg);\n}\n\n/** Runs an F-test specified by \\c q and \\c c. See\n the chapter on hypothesis testing in  <a href=\"http://modelingwithdata.org\">Modeling With Data</a>, p 309, which will tell you that:\n \\f[{N-K\\over q}\n {({\\bf Q}'\\hat\\beta - {\\bf c})' [{\\bf Q}' ({\\bf X}'{\\bf X})^{-1} {\\bf Q}]^{-1} ({\\bf Q}' \\hat\\beta - {\\bf c})\n \\over {\\bf u}' {\\bf u} } \\sim F_{q,N-K},\\f]\n and that's what this function is based on.\n\n\\param est An \\ref apop_model that you have already calculated. (No default)\n\\param contrast  An \\ref apop_data set whose matrix represents \\f${\\bf Q}\\f$ and whose\n    vector represents \\f${\\bf c}\\f$. Each row represents a hypothesis. (Defaults:\n    if matrix is \\c NULL, it is set to the identity matrix with the top row missing. If\n    the vector is \\c NULL, it is set to a zero matrix of length equal to the height of\n    the contrast matrix. Thus, if the entire \\c apop_data set is NULL or omitted, we are\n    testing the hypothesis that all but \\f$\\beta_1\\f$ are zero.)\n\n\\return An \\c apop_data set with a few variants on the confidence with which we can reject the joint hypothesis.\n\\todo There should be a way to get OLS and GLS to store \\f$(X'X)^{-1}\\f$. In fact, if you did GLS, this is invalid, because you need \\f$(X'\\Sigma X)^{-1}\\f$, and I didn't ask for \\f$\\Sigma\\f$.\n\n\\exception out->error='a'  Allocation error.\n\\exception out->error='d'  dimension-matching error.\n\\exception out->error='i'  matrix inversion error.\n\\exception out->error='m'  GSL math error.\n\n\\li There are two approaches to an \\f$F\\f$-test: the ANOVA approach, which is typically\n    built around the claim that all effects but the mean are zero; and the more general\n    regression form, which allows for any set of linear claims about the data. 
If you send\n    a \\c NULL contrast set, I will generate the set of linear contrasts that are equivalent\n    to the ANOVA-type approach. This is why the top row of the default \\f${\\bf Q}\\f$\n    matrix is missing: there is no hypothesis test about the coefficient for the\n    constant term. See the example below.\n\\li This function uses the \\ref designated syntax for inputs.\n\n\\include f_test.c\n*/\nAPOP_VAR_HEAD apop_data * apop_f_test (apop_model *est, apop_data *contrast){\n    apop_model *apop_varad_var(est, NULL)\n    Nullcheck_m(est, NULL);\n    Nullcheck_d(est->data, NULL);\n    apop_data * apop_varad_var(contrast, NULL);\n    int free_data=0, free_matrix=0, free_vector=0;\n    if (!contrast) contrast = apop_data_alloc(), free_data=1;\n    if (!contrast->matrix) {\n        int size = est->parameters->vector->size;\n        contrast->matrix= gsl_matrix_calloc(size - 1, size);\n        for (int i=1; i< size; i++)\n            apop_data_set(contrast, i-1, i, 1);\n    }\n    if (!contrast->vector) contrast->vector = gsl_vector_calloc(contrast->matrix->size1), free_vector=1;\n\n    apop_data *out = apop_f_test_base(est, contrast);\n    if (free_data) {apop_data_free(contrast); return out;}\n    if (free_matrix) gsl_matrix_free(contrast->matrix);\n    if (free_vector) gsl_vector_free(contrast->vector);\n    return out;\nAPOP_VAR_ENDHEAD\n    apop_data *out = apop_data_alloc();\n    Asprintf(&out->names->title, \"F test\");\n    size_t contrast_ct = contrast->vector->size;\n    Apop_stopif(contrast->matrix->size1 != contrast_ct,  out->error='d'; return out,\n            0, \"I counted %zu contrasts by the size of either contrast->vector or \"\n            \"est->parameters->vector->size, but you gave me a matrix with %zu rows. 
Those should match.\"\n            , contrast_ct, contrast->matrix->size1);\n    double f_stat, pval;\n\n    Get_vmsizes(est->data); //msize1, msize2\n    int data_df = msize1 - contrast_ct;\n\n    //Find (\\underbar x)'(\\underbar x), where (\\underbar x) = the data with means removed\n    long double means[msize2];\n    for (int i=1; i< msize2; i++)\n        means[i] = apop_vector_mean(Apop_cv(est->data, i));\n    means[0]=0;// don't screw with the ones column.\n    apop_data *xpx = apop_data_alloc(msize2, msize2);\n    Apop_stopif(xpx->error, apop_data_free(xpx); out->error='a'; return out, 0, \"allocation error\");\n    for (int i=0; i< msize2; i++)\n        for (int j=0; j< msize2; j++){ //at this loop, we calculate one cell in the dot product\n            long double total = 0;\n            for (int c=0; c<msize1; c++)\n                total +=  (gsl_matrix_get(est->data->matrix, c, i) - means[i])\n                         *(gsl_matrix_get(est->data->matrix, c, j) - means[j]);\n            apop_data_set(xpx, i, j, total);\n        }\n\n    apop_data xpxinv = (apop_data){.matrix=apop_matrix_inverse(xpx->matrix)};\n    Apop_stopif(!xpxinv.matrix, out->error='i'; return out, 0, \"inversion of X'X error\");\n    apop_data *qprimexpxinv = apop_dot(contrast, &xpxinv, 'm', 'm');\n    apop_data *qprimexpxinvq = apop_dot(qprimexpxinv, contrast, 'm', 't');\n    Apop_stopif(qprimexpxinvq->error || qprimexpxinv->error, out->error='m'; return out, 0, \"broken dot\");\n    apop_data qprimexpxinvqinv = (apop_data){.matrix=apop_matrix_inverse(qprimexpxinvq->matrix)};\n    Apop_stopif(!qprimexpxinvqinv.matrix, out->error='i'; return out, 0, \"inversion of Q'(X'X)^{-1}Q error\");\n    apop_data_free(qprimexpxinvq);\n    apop_data_free(qprimexpxinv);\n    apop_data *qprimebeta = apop_dot(contrast, est->parameters, 'm', 'v');\n    Apop_stopif(qprimebeta->error, out->error='m'; return out, 0, \"broken dot\");\n    gsl_vector_sub(qprimebeta->vector, contrast->vector);\n    
apop_data *qprimebetaminusc_qprimexpxinvqinv = apop_dot(&qprimexpxinvqinv, qprimebeta, .form2='v');\n    Apop_stopif(qprimebetaminusc_qprimexpxinvqinv->error, out->error='m'; return out, 0, \"broken dot\");\n    gsl_blas_ddot(qprimebeta->vector, qprimebetaminusc_qprimexpxinvqinv->vector, &f_stat);\n    apop_data_free(xpx);\n    apop_data_free(qprimebeta);\n    apop_data_free(qprimebetaminusc_qprimexpxinvqinv);\n\n    apop_data *r_sq_list = apop_estimate_coefficient_of_determination (est);\n    double variance = apop_data_get(r_sq_list, .rowname=\"sse\");\n    f_stat *=  data_df / (variance * contrast_ct);\n    pval    = (contrast_ct > 0 && data_df > 0) ? gsl_cdf_fdist_Q(f_stat, contrast_ct, data_df): GSL_NAN; \n\n    apop_data_add_named_elmt(out, \"F statistic\", f_stat);\n    apop_data_add_named_elmt(out, \"p value\", pval);\n    apop_data_add_named_elmt(out, \"confidence\", 1- pval);\n    apop_data_add_named_elmt(out, \"df1\", contrast_ct);\n    apop_data_add_named_elmt(out, \"df2\", data_df);\n    return out;\n}\n\nstatic double one_chi_sq(apop_data *d, int row, int col, int n){\n    double rowexp  = apop_vector_sum(Apop_rv(d, row))/n;\n    double colexp  = apop_vector_sum(Apop_cv(d, col))/n;\n    double observed = apop_data_get(d, row, col);\n    double expected = n * rowexp * colexp;\n    return gsl_pow_2(observed - expected)/expected; \n}\n\n/** Run a Chi-squared test on an ANOVA table, i.e., an NxN table with the null hypothesis that all cells are equally likely.\n\n \\param d The input data, which is a crosstab of various elements. They don't have to sum to one.\n\\return A \\ref apop_data set including elements named\n     <tt>\"chi squared statistic\"</tt>, <tt>\"df\"</tt>, and <tt>\"p value\"</tt>. 
Retrieve via,\n     e.g., <tt>apop_data_get(out, .rowname=\"p value\")</tt>.\n \\see apop_test_fisher_exact\n*/\napop_data * apop_test_anova_independence(apop_data *d){\n    Apop_stopif(!d || !d->matrix, return NULL, 0, \"You sent me data with no matrix element. Returning NULL.\");\n    double total = 0;\n    //You can have a one-column or one-row matrix if you want; else df = (rows-1)*(cols-1)\n    double df = d->matrix->size1==1 ? d->matrix->size2-1 : d->matrix->size2 == 1 ? d->matrix->size1-1 \n                              : (d->matrix->size1 - 1)* (d->matrix->size2 - 1);\n    Apop_stopif(!df, return NULL, 0, \"You sent a degenerate matrix. Returning NULL.\");\n    int n = apop_matrix_sum(d->matrix);\n    for (size_t row=0; row <d->matrix->size1; row++)\n        for (size_t col=0; col <d->matrix->size2; col++)\n            total += one_chi_sq(d, row, col, n);\n    apop_data *out = apop_data_alloc();\n    double chisq = gsl_cdf_chisq_Q(total, df);\n    apop_data_add_named_elmt(out, \"chi squared statistic\", total);\n    apop_data_add_named_elmt(out, \"df\", df);\n    apop_data_add_named_elmt(out, \"p value\", chisq);\n    return out;\n}\n\n\nstatic apop_data* apop_anova_one_way(char *table, char *data, char *grouping){\n    //ANOVA has always just been a process of filling in a form, and\n    //that's what this function does.\n    apop_data *out = apop_data_calloc(3, 6);\n    apop_name_add(out->names, \"sum of squares\", 'c');\n    apop_name_add(out->names, \"df\", 'c');\n    apop_name_add(out->names, \"mean squares\", 'c');\n    apop_name_add(out->names, \"F ratio\", 'c');\n    apop_name_add(out->names, \"p value\", 'c');\n    apop_name_add(out->names, \"confidence\", 'c');\n    apop_name_add(out->names, grouping, 'r');\n    apop_name_add(out->names, \"residual\", 'r');\n    apop_name_add(out->names, \"total\", 'r');\n \n    //total sum of squares:\n    apop_data* tss = apop_query_to_data(\"select var_pop(%s), count(*) from %s\", data, table);\n    
Apop_stopif(!tss, apop_return_data_error('q'), 0, \"Query 'select var_pop(%s), count(*) from %s' returned NULL. Does that look right to you?\", data, table);\n    apop_data_set(out, 2, 0, apop_data_get(tss, 0, 0)*apop_data_get(tss, 0, 1)); //total sum of squares\n    double total_df = apop_data_get(tss, 0, 1);\n    apop_data_set(out, 2, 1, apop_data_get(tss, 0, 1)); //total df.\n\n    //within group sum of squares:\n    apop_data* wss = apop_query_to_data(\"select var_pop(%s), count(*) from %s group by %s\", data,  table, grouping);\n    double group_df = wss->matrix->size1-1;\n    apop_data_set(out, 0, 0, apop_data_get(wss, 0, 0)*group_df); //within sum of squares\n    apop_data_set(out, 0, 1, group_df);\n\n    //residuals are just total-wss\n    apop_data_set(out, 1, 0, apop_data_get(out, 2, 0) - apop_data_get(out, 0,0)); //residual sum of squares\n    double residual_df = total_df - group_df;\n    apop_data_set(out, 1, 1, residual_df); //residual df\n\n    apop_data_set(out, 0, 2, apop_data_get(out, 0, 0)/apop_data_get(out, 0, 1));//mean SS within\n    apop_data_set(out, 1, 2, apop_data_get(out, 1, 0)/apop_data_get(out, 1, 1));//mean SS residual\n\n    apop_data_set(out, 0, 3, apop_data_get(out, 0, 2)/apop_data_get(out, 1, 2));//F ratio\n    apop_data_set(out, 0, 4, gsl_cdf_fdist_P(apop_data_get(out, 0, 3), group_df, residual_df));//pval\n    apop_data_set(out, 0, 5, 1- apop_data_get(out, 0, 4));//confidence\n\n    apop_data_free(tss);\n    apop_data_free(wss);\n    return out;\n}\n\n/** This function produces a traditional one- or two-way ANOVA table. It\n  works from data in an SQL table, using queries of a form like <tt>select\n  data from table group by grouping1, grouping2</tt>.\n\n  \\param table The table to be queried. Anything that can go in an SQL <tt>from</tt> clause is OK, so this can be a plain table name or a temp table specification like <tt>(select ... 
)</tt>, with parens.\n  \\param data The name of the column holding the count or other such data\n  \\param grouping1 The name of the first column by which to group data\n  \\param grouping2 If this is \\c NULL, then the function will return a one-way ANOVA. Otherwise, the name of the second column by which to group data in a two-way ANOVA.\n */\nAPOP_VAR_HEAD apop_data* apop_anova(char *table, char *data, char *grouping1, char *grouping2){\n    char *apop_varad_var(table, NULL)\n    Apop_stopif(!table, return NULL, 0, \"I need the name of a table in the SQL database.\");\n    if (!strchr(table, <|')'|>)) //()s mark a temp table spec; without them, check that the table exists.\n        Apop_stopif(!apop_table_exists(table), return NULL, 0, \"I couldn't find the table %s in the database.\", table);\n    char *apop_varad_var(data, NULL)\n    Apop_stopif(!data, return NULL, 0, \"I need the name of the column in the %s table with the count or other data.\", table);\n    char *apop_varad_var(grouping1, NULL)\n    Apop_stopif(!grouping1, return NULL, 0, \"I need at least grouping1, a column in the %s table.\", table);\n    char *apop_varad_var(grouping2, NULL)\nAPOP_VAR_ENDHEAD\n    apop_data *first = apop_anova_one_way(table, data, grouping1);\n    Apop_stopif(first->error, return first, 0, \"Error (%c) running one-way ANOVA.\", first->error);\n    if (!grouping2) return first;\n    apop_data *second = apop_anova_one_way(table, data, grouping2);\n    char *joined = NULL;\n    Asprintf(&joined, \"%s, %s\", grouping1, grouping2);\n    apop_data *interaction = apop_anova_one_way(table, data, joined);\n    apop_data *out = apop_data_calloc(5, 6);\n    apop_name_stack(out->names, first->names, 'c');\n    apop_data_add_names(out, 'r', first->names->row[0], second->names->row[0],\n                                  \"interaction\", \"residual\", \"total\");\n    gsl_vector *firstrow = Apop_rv(first, 0);\n    gsl_vector *secondrow = Apop_rv(second, 0);\n    gsl_vector *interrow = Apop_rv(interaction, 0);\n 
   gsl_matrix_set_row(out->matrix, 0, firstrow);\n    gsl_matrix_set_row(out->matrix, 1, secondrow);\n    gsl_matrix_set_row(out->matrix, 2, interrow);\n    gsl_matrix_set_row(out->matrix, 4, Apop_rv(first, 2));\n    \n    //residuals are just total-wss\n    apop_data_set(out, 3, 0, apop_data_get(out, 4, 0) \n            - gsl_vector_get(firstrow, 0) - gsl_vector_get(secondrow, 0) - gsl_vector_get(interrow, 0)); //residual sum of squares\n    double residual_df = apop_data_get(out, 4, 1) \n            - gsl_vector_get(firstrow, 1) - gsl_vector_get(secondrow, 1) - gsl_vector_get(interrow, 1); //residual df\n    apop_data_set(out, 3, 1, residual_df);\n\n    apop_data_set(out, 3, 2, apop_data_get(out, 3, 0)/apop_data_get(out, 3, 1));//mean SS residual\n\n    apop_data_set(out, 0, 3, apop_data_get(out, 0, 2)/apop_data_get(out, 3, 2));//F ratio\n    apop_data_set(out, 0, 4, gsl_cdf_fdist_P(apop_data_get(out, 0, 3), gsl_vector_get(firstrow, 1), residual_df));//pval\n    apop_data_set(out, 0, 5, 1- apop_data_get(out, 0, 4));//confidence\n\n    apop_data_set(out, 1, 3, apop_data_get(out, 1, 2)/apop_data_get(out, 3, 2));//F ratio\n    apop_data_set(out, 1, 4, gsl_cdf_fdist_P(apop_data_get(out, 1, 3), gsl_vector_get(secondrow, 1), residual_df));//pval\n    apop_data_set(out, 1, 5, 1- apop_data_get(out, 1, 4));//confidence\n\n    apop_data_set(out, 2, 3, apop_data_get(out, 2, 2)/apop_data_get(out, 3, 2));//F ratio\n    apop_data_set(out, 2, 4, gsl_cdf_fdist_P(apop_data_get(out, 2, 3), gsl_vector_get(interrow, 1), residual_df));//pval\n    apop_data_set(out, 2, 5, 1- apop_data_get(out, 2, 4));//confidence\n\n    free(joined);\n    apop_data_free(first);\n    apop_data_free(second);\n    apop_data_free(interaction);\n    return out;\n}\n\n/** This is a convenience function to do the lookup of a given statistic along a given\ndistribution. You give me a statistic, its (hypothesized) distribution, and whether\nto use the upper tail, lower tail, or both. 
I will return the odds of a Type I error\ngiven the model---in statistician jargon, the \\f$p\\f$-value.  [Type I error: odds of\nrejecting the null hypothesis when it is true.]\n\n   For example, \n   \\code\n   apop_test(1.3);\n   \\endcode\n\nwill return the density of the standard Normal distribution that is more than 1.3 from zero.  \nIf this function returns a small value, we can be confident that the statistic is significant. Or, \n   \\code\n   apop_test(1.3, \"t\", 10, .tail='u');\n   \\endcode\n\nwill give the appropriate odds for an upper-tailed test using the \\f$t\\f$-distribution with 10 degrees of freedom (e.g., a \\f$t\\f$-test of the null hypothesis that the statistic is less than or equal to zero).\n\nSeveral more distributions are supported; see below.\n\n \\li For a two-tailed test (the default), this returns the density outside the range. I'll only do this for symmetric distributions.\n \\li For an upper-tail test ('u'), this returns the density above the cutoff\n \\li For a lower-tail test ('l'), this returns the density below the cutoff \n \n\\param statistic    The scalar value to be tested.  \n\\param distribution  The name of the distribution; see below.\n\\param p1  The first parameter for the distribution; see below.\n\\param p2  The second parameter for the distribution; see below.\n\\param tail 'u' = upper tail; 'l' = lower tail; anything else = two-tailed. 
(default = two-tailed)\n\n\\return The odds of a Type I error given the model (the \\f$p\\f$-value).\n\nHere are the distributions you can use and their parameters.\n\n\\c \"normal\" or \\c \"gaussian\" \n\\li p1=\\f$\\mu\\f$, p2=\\f$\\sigma\\f$\n\\li default (0, 1)\n\n\\c \"lognormal\"  \n\\li p1=\\f$\\mu\\f$, p2=\\f$\\sigma\\f$\n\\li default (0, 1) \n\\li Remember, \\f$\\mu\\f$ and \\f$\\sigma\\f$ refer to the Normal one would get after taking the log of the lognormal variable\n\\li One-tailed tests only\n\n\\c \"uniform\"  \n\\li p1=lower edge, p2=upper edge\n\\li default (0, 1)\n\\li two-tailed tests are run relative to the center, (p1+p2)/2.\n\n\\c \"t\"  \n\\li p1=df\n\\li no default\n\n\\c \"chi squared\", \\c \"chi\", \\c \"chisq\": \n\\li p1=df\n\\li no default\n\\li One-tailed tests only; default='u' (\\f$p\\f$-value for typical cases)\n\n\\c \"f\"  \n\\li p1=df1, p2=df2\n\\li no default\n\\li One-tailed tests only\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD double apop_test(double statistic, char *distribution, double p1, double p2, char tail){\n    double  apop_varad_var(statistic, 0);\n    char*  apop_varad_var(distribution, NULL);\n    if (!distribution) distribution = \"normal\"; //guard: strcasecmp on a NULL string is undefined.\n    double apop_varad_var(p1, 0);\n    double apop_varad_var(p2, 0);\n    int is_chi = !strcasecmp(distribution, \"chi squared\")|| !strcasecmp(distribution, \"chi\")\n                     || !strcasecmp(distribution, \"chisq\");\n     Apop_stopif(!strcasecmp(distribution, \"f\") && (!p1 || !p2), return NAN, 0, \"I need both a p1 and p2 parameter specifying the degrees of freedom.\");\n     Apop_stopif((!strcasecmp(distribution, \"t\") || !strcasecmp(distribution, \"f\") || is_chi)\n             && !p1, return NAN, 0, \"I need a p1 parameter specifying the degrees of freedom.\");\n     if (!p2 && (!distribution || !strcasecmp(distribution, \"normal\") || !strcasecmp(distribution, \"gaussian\") ))\n         p2 = 1;\n     if (!p2 && p1 >= 0 && !strcasecmp(distribution, \"uniform\"))\n         p2 = 1;\n\n    
char apop_varad_var(tail, 0);\n    if (!tail) tail = is_chi ? 'u' : 'a';\nAPOP_VAR_ENDHEAD\n    //This is a long and boring function. I am aware that there are\n    //clever ways to make it shorter.\n     if (!distribution || !strcasecmp(distribution, \"normal\") || !strcasecmp(distribution, \"gaussian\") ){\n         //The GSL Gaussian CDFs are centered at zero, so shift the statistic by the mean p1.\n         if (tail == 'u')\n             return gsl_cdf_gaussian_Q(statistic-p1, p2);\n         else if (tail == 'l')\n             return gsl_cdf_gaussian_P(statistic-p1, p2);\n         else\n             return 2 * gsl_cdf_gaussian_Q(fabs(statistic-p1), p2);\n     }\n    else if (!strcasecmp(distribution, \"lognormal\")){\n         if (tail == 'u')\n             return gsl_cdf_lognormal_Q(statistic, p1, p2);\n         else if (tail == 'l')\n             return gsl_cdf_lognormal_P(statistic, p1, p2);\n         else\n             Apop_assert(0, \"A two-tailed test doesn't really make sense for the lognormal. Please specify either tail= 'u' or tail= 'l'.\");\n     }\n    else if (!strcasecmp(distribution, \"t\")){\n         if (tail == 'u')\n             return gsl_cdf_tdist_Q(statistic, p1);\n         else if (tail == 'l')\n             return gsl_cdf_tdist_P(statistic, p1);\n         else\n             return 2 * gsl_cdf_tdist_Q(fabs(statistic), p1);\n     }\n    else if (!strcasecmp(distribution, \"f\")){\n         if (tail == 'u')\n             return gsl_cdf_fdist_Q(statistic, p1, p2);\n         else if (tail == 'l')\n             return gsl_cdf_fdist_P(statistic, p1, p2);\n         else\n             Apop_assert(0, \"A two-tailed test doesn't really make sense for the %s. 
Please specify either tail= 'u' or tail= 'l'.\", distribution);\n     }\n    else if (!strcasecmp(distribution, \"chi squared\")|| !strcasecmp(distribution, \"chi\")\n                                                || !strcasecmp(distribution, \"chisq\")){\n         if (tail == 'u')\n             return gsl_cdf_chisq_Q(statistic, p1);\n         else if (tail == 'l')\n             return gsl_cdf_chisq_P(statistic, p1);\n         else\n             Apop_assert(0, \"A two-tailed test doesn't really make sense for the %s. Please specify either tail= 'u' or tail= 'l'.\", distribution);\n     }\n    else if (!strcasecmp(distribution, \"uniform\")){\n         if (tail == 'u')\n             return gsl_cdf_flat_Q(statistic, p1, p2);\n         else if (tail == 'l')\n             return gsl_cdf_flat_P(statistic, p1, p2);\n         else\n             return 2 * gsl_cdf_flat_Q(fabs(statistic - (p1+p2)/2.), p1, p2);\n     }\n    Apop_assert(0, \"Sorry, but I don't recognize %s as a distribution\", distribution);\n}\n"
  },
  {
    "path": "apop_update.m4.c",
    "content": "/** \\file \n  The \\ref apop_update function.  */ \n/* Copyright (c) 2006--2009, 2014 by Ben Klemens. Licensed under the GPLv2; see COPYING.  */\n\n#include \"apop_internal.h\"\n#include <stdbool.h>\n\n/* This file in four parts:\n   --an apop_model named product, purpose-built for apop_update to send to apop_model_metropolis\n   --apop_mcmc settings and their defaults.\n   --apop_update and its equipment, which has three cases:\n        --conjugates, in which case see the functions\n        --call Metropolis\n*/\n\n\n/* This will be used by apop_update to send to apop_mcmc below.\n\n   To set it up, add a more pointer to an array of two models, the prior and likelihood. \n   The total likelihood of a data point is (likelihood these parameters are drawn from\n   prior)*(likelihood of these parameters and the data set using the likelihood fn)\n*/\nstatic long double product_ll(apop_data *d, apop_model *m){\n    apop_model **pl = m->more;\n    gsl_vector *v = apop_data_pack(m->parameters);\n    apop_data_unpack(v, pl[1]->parameters);\n    gsl_vector_free(v);\n    return apop_log_likelihood(m->parameters, pl[0]) + apop_log_likelihood(d, pl[1]);\n}\n\nstatic long double product_constraint(apop_data *data, apop_model *m){\n    apop_model **pl = m->more;\n    gsl_vector *v = apop_data_pack(m->parameters);\n    apop_data_unpack(v, pl[1]->parameters);\n    gsl_vector_free(v);\n    return pl[1]->constraint(data, pl[1]);\n}\n\napop_model *product = &(apop_model){\"product of two models\", \n    .log_likelihood=product_ll, .constraint=product_constraint};\n\n\n///////////the conjugate table\n\nstatic apop_model *betabinom(apop_data *data, apop_model *prior, apop_model *likelihood){\n    apop_model *outp = apop_model_copy(prior);\n    if (!data && likelihood->parameters){\n        double n = likelihood->parameters->vector->data[0];\n        double p = likelihood->parameters->vector->data[1];\n        *gsl_vector_ptr(outp->parameters->vector, 0) += n*p;\n       
 *gsl_vector_ptr(outp->parameters->vector, 1) += n*(1-p);\n    } else {\n        gsl_vector *hits = Apop_cv(data, 1);\n        gsl_vector *misses = Apop_cv(data, 0);\n        *gsl_vector_ptr(outp->parameters->vector, 0) += apop_sum(hits);\n        *gsl_vector_ptr(outp->parameters->vector, 1) += apop_sum(misses);\n    }\n    return outp;\n}\n\ndouble countup(double in){return in!=0;}\n\nstatic apop_model *betabernie(apop_data *data, apop_model *prior, apop_model *likelihood){\n    apop_model *outp = apop_model_copy(prior);\n    Get_vmsizes(data);//tsize\n    double sum = apop_map_sum(data, .fn_d=countup, .part='a');\n    *gsl_vector_ptr(outp->parameters->vector, 0) += sum;\n    *gsl_vector_ptr(outp->parameters->vector, 1) += tsize - sum;\n    return outp;\n}\n\nstatic apop_model *gammaexpo(apop_data *data, apop_model *prior, apop_model *likelihood){\n    apop_model *outp = apop_model_copy(prior);\n    Get_vmsizes(data); //maxsize\n    *gsl_vector_ptr(outp->parameters->vector, 0) += maxsize;\n    apop_data_set(outp->parameters, 1, .val=1./\n                          (1./apop_data_get(outp->parameters, 1) \n                        + (data->matrix ? apop_matrix_sum(data->matrix) : 0)\n                        + (data->vector ? apop_sum(data->vector) : 0)));\n    return outp;\n}\n\nstatic apop_model *gammapoisson(apop_data *data, apop_model *prior, apop_model *likelihood){\n    /* Posterior alpha = alpha_0 + sum x; posterior beta = beta_0/(beta_0*n + 1) */\n    apop_model *outp = apop_model_copy(prior);\n    Get_vmsizes(data); //vsize, msize1,maxsize\n    *gsl_vector_ptr(outp->parameters->vector, 0) +=\n                         (vsize  ? apop_sum(data->vector): 0) +\n                         (msize1 ? 
apop_matrix_sum(data->matrix): 0);\n\n    double *beta = gsl_vector_ptr(outp->parameters->vector, 1);\n    *beta = *beta/(*beta * maxsize + 1);\n    return outp;\n}\n\nstatic apop_model *normnorm(apop_data *data, apop_model *prior, apop_model *likelihood){\n/*\noutput \\f$(\\mu, \\sigma) = (\\frac{\\mu_0}{\\sigma_0^2} + \\frac{\\sum_{i=1}^n x_i}{\\sigma^2})/(\\frac{1}{\\sigma_0^2} + \\frac{n}{\\sigma^2}), (\\frac{1}{\\sigma_0^2} + \\frac{n}{\\sigma^2})^{-1/2}\\f$\n\nThat is, the output is weighted by the number of data points for the\nlikelihood. If you give me a parametrized normal, with no data, then I'll take the weight to be \\f$n=1\\f$. \n*/\n    double mu_like, var_like;\n    long int n;\n    apop_model *outp = apop_model_copy(prior);\n    apop_prep(data, outp);\n    long double  mu_pri = prior->parameters->vector->data[0];\n    long double  var_pri = gsl_pow_2(prior->parameters->vector->data[1]);\n    if (!data && likelihood->parameters){\n        mu_like  = likelihood->parameters->vector->data[0];\n        var_like = gsl_pow_2(likelihood->parameters->vector->data[1]);\n        n        = 1;\n    } else {\n        n = data->matrix->size1 * data->matrix->size2;\n        apop_matrix_mean_and_var(data->matrix, &mu_like, &var_like);\n    }\n    gsl_vector_set(outp->parameters->vector, 0, (mu_pri/var_pri + n*mu_like/var_like)/(1/var_pri + n/var_like));\n    gsl_vector_set(outp->parameters->vector, 1, pow((1/var_pri + n/var_like), -.5));\n    return outp;\n}\n\n/** Take in a prior and likelihood distribution, and output a posterior distribution.\n\n\\li This function first checks a table of conjugate distributions for the pair you sent\nin. If the models are listed on the table, then the function returns a corresponding\nclosed-form model with updated parameters.\n\n\\li If the models aren't in the table of conjugates, and the prior distribution has\na \\c p or \\c log_likelihood element, then use \\ref apop_model_metropolis to generate\nthe posterior.  
If you expect MCMC to run, you may add an \\ref apop_mcmc_settings\ngroup to your prior to control the details of the search. See also the \\ref\napop_model_metropolis documentation.\n\n\\li If the prior does not have a \\c p or \\c log_likelihood but does have a \\c draw\nelement, then make draws from the prior and weight them by the \\c p given by the\nlikelihood distribution. This is not a rejection sampling method, so the burnin\nis ignored.\n\n\\param data     The input data, that will be used by the likelihood function (default = \\c NULL.)\n\\param  prior   The prior \\ref apop_model. If the system needs to\nestimate the posterior via MCMC, this needs to have a \\c log_likelihood or \\c p method.  (No default, must not be \\c NULL.)\n\\param likelihood The likelihood \\ref apop_model. If the system needs to\nestimate the posterior via MCMC, this needs to have a \\c log_likelihood or \\c p method (ll preferred). (No default, must not be \\c NULL.)\n\\param rng      A \\c gsl_rng, already initialized (e.g., via \\ref apop_rng_alloc). (default: an RNG from \\ref apop_rng_get_thread)\n\\return an \\ref apop_model struct representing the posterior, with updated parameters. 
\n\n\n\\li In all cases, the output is a \\ref apop_model that can be used as the input to this\nfunction, so you can chain Bayesian updating procedures.\n\\li Here are the conjugate distributions currently defined:\n\n<table>\n<tr>\n<td> Prior <td></td> Likelihood  <td></td>  Notes \n</td> </tr> <tr>\n<td> \\ref apop_beta \"Beta\" <td></td> \\ref apop_binomial \"Binomial\"  <td></td>  \n</td> </tr> <tr>\n<td> \\ref apop_beta \"Beta\" <td></td> \\ref apop_bernoulli \"Bernoulli\"  <td></td> \n</td> </tr> <tr>\n<td> \\ref apop_gamma \"Gamma\" <td></td> \\ref apop_exponential \"Exponential\"  <td></td>  Gamma prior represents the distribution of \\f$\\lambda^{-1}\\f$, not plain \\f$\\lambda\\f$\n</td> </tr> <tr>\n<td> \\ref apop_normal \"Normal\" <td></td> \\ref apop_normal \"Normal\" <td></td>  Assumes prior with fixed \\f$\\sigma\\f$; updates distribution for \\f$\\mu\\f$\n</td></tr> <tr>\n<td> \\ref apop_gamma \"Gamma\" <td></td> \\ref apop_poisson \"Poisson\" <td></td> Uses sum and size of the data  \n</td></tr>\n</table>\n\nHere is a test function that compares the output via conjugate table and via\nMetropolis-Hastings sampling: \n\\include test_updating.c\n\n\\li The conjugate table is stored using a vtable; see \\ref vtables for details. If you\nare writing a new vtable entry, the typedef new functions must conform to and the hash\nused for lookups are:\n\n\\code\ntypedef apop_model *(*apop_update_type)(apop_data *, apop_model , apop_model);\n#define apop_update_hash(m1, m2) ((size_t)(m1).draw + (size_t)((m2).log_likelihood ? 
(m2).log_likelihood : (m2).p)*33)\n\\endcode\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\nAPOP_VAR_HEAD apop_model * apop_update(apop_data *data, apop_model *prior, apop_model *likelihood, gsl_rng *rng){\n    apop_data *apop_varad_var(data, NULL);\n    apop_model *apop_varad_var(prior, NULL);\n    apop_model *apop_varad_var(likelihood, NULL);\n    gsl_rng *apop_varad_var(rng, apop_rng_get_thread(-1));\nAPOP_VAR_END_HEAD\n    static int setup=0; if (!(setup++)){\n        apop_update_vtable_add(betabinom, apop_beta, apop_binomial);\n        apop_update_vtable_add(betabernie, apop_beta, apop_bernoulli);\n        apop_update_vtable_add(gammaexpo, apop_gamma, apop_exponential);\n        apop_update_vtable_add(gammapoisson, apop_gamma, apop_poisson);\n        apop_update_vtable_add(normnorm, apop_normal, apop_normal);\n    }\n    apop_update_type conj = apop_update_vtable_get(prior, likelihood);\n    if (conj) return conj(data, prior, likelihood);\n\n    apop_mcmc_settings *s = apop_settings_get_group(prior, apop_mcmc);\n\n    apop_prep(NULL, prior); //probably a no-op\n    apop_prep(data, likelihood); //probably a no-op\n    gsl_vector *pack = apop_data_pack(likelihood->parameters);\n    int tsize = pack->size;\n    gsl_vector_free(pack);\n    Apop_stopif(prior->dsize != tsize, \n                return apop_model_copy(&(apop_model){.error='d'}),\n                0, \"Size of a draw from the prior does not match \"\n                   \"the size of the likelihood's parameters (%i != %i).%s\",\n                   prior->dsize, tsize, \n                   (tsize > prior->dsize) ?  
\n                        \" Perhaps use apop_model_fix_params to reduce the \"\n                        \"likelihood's parameter count?\" : \"\");\n    if (prior->p || prior->log_likelihood){\n        apop_model *p = apop_model_copy(product);\n        //pending revision, a memory leak:\n        p->more = malloc(sizeof(apop_model*)*2);\n        ((apop_model**)p->more)[0] = apop_model_copy(prior);\n        ((apop_model**)p->more)[1] = apop_model_copy(likelihood);\n        p->more_size = sizeof(apop_model*) * 2;\n        p->parameters = apop_data_alloc(prior->dsize);\n        p->data = data;\n        if (s) apop_settings_copy_group(p, prior, \"apop_mcmc\");\n        apop_model *out = apop_model_metropolis(data, rng, p); \n        return out;\n    }\n\n    Apop_stopif(!prior->draw, return NULL, 0, \"prior does not have a .p, .log_likelihood, or .draw element. I am stumped. Returning NULL.\");\n\n    if (!s) s = Apop_model_add_group(prior, apop_mcmc);\n\n    gsl_vector *draw = gsl_vector_alloc(tsize);\n    apop_data *out = apop_data_alloc(s->periods, tsize);\n    out->weights = gsl_vector_alloc(s->periods);\n\n    apop_draw(draw->data, rng, prior); //set starting point.\n    apop_data_unpack(draw, likelihood->parameters);\n\n    for (int i=0; i< s->periods; i++){\n        newdraw:\n        apop_draw(draw->data, rng, prior);\n        apop_data_unpack(draw, likelihood->parameters);\n        long double p = apop_p(data, likelihood);\n\n        Apop_notify(3, \"p=%Lg for parameters:\\t\", p);\n        if (apop_opts.verbose >=3) apop_data_print(likelihood->parameters);\n\n        Apop_stopif(gsl_isnan(p), goto newdraw,\n                1, \"Trouble evaluating the \"\n                \"likelihood function at vector beginning with %g. 
\"\n                \"Throwing it out and trying again.\\n\"\n                , likelihood->parameters->vector->data[0]);\n        apop_data_pack(likelihood->parameters, Apop_rv(out, i));\n        gsl_vector_set(out->weights, i, p);\n    }\n    apop_model *outp = apop_estimate(out, apop_pmf);\n    gsl_vector_free(draw);\n    return outp;\n}\n"
  },
  {
    "path": "apop_vtables.c",
    "content": "#include <stdlib.h>\n#include <string.h>\n#include \"apop_internal.h\" //just for OMP_critical\n\n#ifdef _OPENMP\n#include <omp.h>\n#define lock omp_set_lock(&v->mutex);\n#define unlock omp_unset_lock(&v->mutex);\n#else\n#define lock \n#define unlock \n#endif\n\n/** \\cond doxy_ignore */\ntypedef struct {\n    size_t hash;\n    void *fn;\n} apop_vtable_elmt_s;\n\ntypedef struct {\n    char const *name;\n    unsigned long int hashed_name;\n    int elmt_ct;\n    apop_vtable_elmt_s *elmts;\n#ifdef _OPENMP\n    omp_lock_t mutex;\n#endif\n} apop_vtable_s;\n\napop_vtable_s *vtable_list;\nint ignore_me;\n/** \\endcond */ //End of Doxygen ignore.\n\n//The Dan J Bernstein string hashing algorithm.\nstatic unsigned long apop_settings_hash(char const *str){\n    unsigned long int hash = 5381;\n    char c;\n    while ((c = *str++)) hash = hash*33 + c;\n    return hash;\n}\n\nstatic apop_vtable_s *find_tab(unsigned long h, int *ctr){\n    apop_vtable_s *v = vtable_list;\n    *ctr = 0;\n    for ( ; v->hashed_name; (*ctr)++, v++) if (v->hashed_name== h) break;\n    return v;\n}\n\n//return 0 = found; removed\n//return 1 = not found; no-op\nint apop_vtable_drop(char const *tabname, unsigned long hash){\n    if (!vtable_list) return 1;\n    unsigned long h = apop_settings_hash(tabname);\n    apop_vtable_s *v = find_tab(h, &ignore_me);\n\n    lock\n    for (int i=0; i< v->elmt_ct; i++)\n        if (hash == v->elmts[i].hash) {\n            memmove(v->elmts+i, v->elmts+i+1, sizeof(apop_vtable_elmt_s)*(v->elmt_ct-i));\n            v->elmt_ct--;\n            unlock\n            return 0;\n        }\n    unlock\n    return 1;\n}\n\nint apop_vtable_add(char const *tabname, void *fn_in, unsigned long hash){\n    if (!vtable_list){vtable_list = calloc(1, sizeof(apop_vtable_s));}\n\n    unsigned long h = apop_settings_hash(tabname);\n    int ctr;\n    apop_vtable_s *v;\n\n\n    //add a table if need be.\n    OMP_critical (new_vtable)\n    {\n    v = find_tab(h, &ctr);\n\n    
if (!v->hashed_name){\n        vtable_list = realloc(vtable_list, (ctr+2)* sizeof(apop_vtable_s));\n        vtable_list[ctr] = (apop_vtable_s){.elmts=calloc(1, sizeof(apop_vtable_elmt_s))};\n        vtable_list[ctr+1] = (apop_vtable_s){ };\n        #ifdef _OPENMP\n            omp_init_lock(&vtable_list[ctr].mutex);\n            omp_set_lock(&vtable_list[ctr].mutex);\n        #endif\n        vtable_list[ctr].name = tabname;\n        vtable_list[ctr].hashed_name = h;\n        v = vtable_list+ctr;\n        unlock\n    }\n    }\n\n    lock\n    //If this hash is already present, don't re-add. \n    for (int i=0; i< v->elmt_ct; i++) if (hash == v->elmts[i].hash) {unlock; return 0;}\n\n    //insert\n    v->elmts = realloc(v->elmts, (++(v->elmt_ct))* sizeof(apop_vtable_elmt_s));\n    v->elmts[v->elmt_ct-1] = (apop_vtable_elmt_s){.hash=hash, .fn=fn_in};\n    unlock\n    return 0;\n}\n\nvoid *apop_vtable_get(char const *tabname, unsigned long hash){\n    if (!vtable_list) return NULL;\n    unsigned long thash = apop_settings_hash(tabname);\n    apop_vtable_s *v = find_tab(thash, &ignore_me);\n    if (!v->hashed_name) return NULL;\n\n    lock\n    for (int i=0; i< v->elmt_ct; i++)\n        if (hash == v->elmts[i].hash) {unlock; return v->elmts[i].fn;}\n    unlock\n    return NULL;\n}\n"
  },
  {
    "path": "asprintf.c",
    "content": "    /** \\cond doxy_ignore */\n/* Formatted output to strings.\n   Copyright (C) 1999, 2002 Free Software Foundation, Inc.\n\n   This program is free software; you can redistribute it and/or modify\n   it under the terms of the GNU General Public License as published by\n   the Free Software Foundation; either version 2, or (at your option)\n   any later version.\n\n   This program is distributed in the hope that it will be useful,\n   but WITHOUT ANY WARRANTY; without even the implied warranty of\n   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n   GNU General Public License for more details.\n\n   You should have received a copy of the GNU General Public License along\n   with this program; if not, write to the Free Software Foundation,\n   Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */\n\n/* Mashed into one file by BK */\n\n/* Tell glibc's <stdio.h> to provide a prototype for snprintf().\n   This must come before <config.h> because <config.h> may include\n   <features.h>, and once <features.h> has been included, it's too late.  */\n#ifndef _GNU_SOURCE\n# define _GNU_SOURCE    1\n#endif\n\n/* vsprintf with automatic memory allocation. */\n\n#include <stdio.h>\n#include <stdarg.h>\n#include <stddef.h>\n\n#ifdef HAVE_CONFIG_H\n# include <config.h>\n#endif\n\n#if HAVE_ASPRINTF\n#else\n\n#ifndef __attribute__\n/* This feature is available in gcc versions 2.5 and later.  */\n# if __GNUC__ < 2 || (__GNUC__ == 2 && __GNUC_MINOR__ < 5) || __STRICT_ANSI__\n#  define __attribute__(Spec) /* empty */\n# endif\n/* The __-protected variants of `format' and `printf' attributes\n   are accepted by gcc versions 2.6.4 (effectively 2.7) and later.  
*/\n# if __GNUC__ < 2 || (__GNUC__ == 2 && __GNUC_MINOR__ < 7)\n#  define __format__ format\n#  define __printf__ printf\n# endif\n#endif\n\n#ifdef\t__cplusplus\nextern \"C\" {\n#endif\n\n/* Write formatted output to a string dynamically allocated with malloc().\n   If the memory allocation succeeds, store the address of the string in\n   *RESULT and return the number of resulting bytes, excluding the trailing\n   NUL.  Upon memory allocation error, or some other error, return -1.  */\nextern int asprintf (char **result, const char *format, ...)\n       __attribute__ ((__format__ (__printf__, 2, 3)));\nextern int vasprintf (char **result, const char *format, va_list args)\n       __attribute__ ((__format__ (__printf__, 2, 0)));\n\n//from vasnprintf\nextern char * asnprintf (char *resultbuf, size_t *lengthp, const char *format, ...)\n       __attribute__ ((__format__ (__printf__, 3, 4)));\nextern char * vasnprintf (char *resultbuf, size_t *lengthp, const char *format, va_list args)\n       __attribute__ ((__format__ (__printf__, 3, 0)));\n\n#ifdef\t__cplusplus\n}\n#endif\n\n\n/* Decomposed printf argument list. */\n\n/* Get wint_t.  
*/\n#ifdef HAVE_WINT_T\n# include <wchar.h>\n#endif\n\n/* Argument types */\ntypedef enum {\n  TYPE_NONE,\n  TYPE_SCHAR,\n  TYPE_UCHAR,\n  TYPE_SHORT,\n  TYPE_USHORT,\n  TYPE_INT,\n  TYPE_UINT,\n  TYPE_LONGINT,\n  TYPE_ULONGINT,\n#ifdef HAVE_LONG_LONG\n  TYPE_LONGLONGINT,\n  TYPE_ULONGLONGINT,\n#endif\n  TYPE_DOUBLE,\n#ifdef HAVE_LONG_DOUBLE\n  TYPE_LONGDOUBLE,\n#endif\n  TYPE_CHAR,\n#ifdef HAVE_WINT_T\n  TYPE_WIDE_CHAR,\n#endif\n  TYPE_STRING,\n#ifdef HAVE_WCHAR_T\n  TYPE_WIDE_STRING,\n#endif\n  TYPE_POINTER,\n  TYPE_COUNT_SCHAR_POINTER,\n  TYPE_COUNT_SHORT_POINTER,\n  TYPE_COUNT_INT_POINTER,\n  TYPE_COUNT_LONGINT_POINTER\n#ifdef HAVE_LONG_LONG\n, TYPE_COUNT_LONGLONGINT_POINTER\n#endif\n} arg_type;\n\n/* Polymorphic argument */\ntypedef struct {\n  arg_type type;\n  union\n  {\n    signed char\t\t\ta_schar;\n    unsigned char\t\ta_uchar;\n    short\t\t\ta_short;\n    unsigned short\t\ta_ushort;\n    int\t\t\t\ta_int;\n    unsigned int\t\ta_uint;\n    long int\t\t\ta_longint;\n    unsigned long int\t\ta_ulongint;\n#ifdef HAVE_LONG_LONG\n    long long int\t\ta_longlongint;\n    unsigned long long int\ta_ulonglongint;\n#endif\n    float\t\t\ta_float;\n    double\t\t\ta_double;\n#ifdef HAVE_LONG_DOUBLE\n    long double\t\t\ta_longdouble;\n#endif\n    int\t\t\t\ta_char;\n#ifdef HAVE_WINT_T\n    wint_t\t\t\ta_wide_char;\n#endif\n    const char*\t\t\ta_string;\n#ifdef HAVE_WCHAR_T\n    const wchar_t*\t\ta_wide_string;\n#endif\n    void*\t\t\ta_pointer;\n    signed char *\t\ta_count_schar_pointer;\n    short *\t\t\ta_count_short_pointer;\n    int *\t\t\ta_count_int_pointer;\n    long int *\t\t\ta_count_longint_pointer;\n#ifdef HAVE_LONG_LONG\n    long long int *\t\ta_count_longlongint_pointer;\n#endif\n  }\n  a;\n}\nargument;\n\ntypedef struct {\n  size_t count;\n  argument *arg;\n}\narguments;\n\n/* Fetch the arguments, putting them into a. 
*/\n#ifdef STATIC\nSTATIC\n#else\nextern\n#endif\nint printf_fetchargs (va_list args, arguments *a);\n\n/* Parse printf format string. */\n\n/* Flags */\n#define FLAG_GROUP\t 1\t/* ' flag */\n#define FLAG_LEFT\t 2\t/* - flag */\n#define FLAG_SHOWSIGN\t 4\t/* + flag */\n#define FLAG_SPACE\t 8\t/* space flag */\n#define FLAG_ALT\t16\t/* # flag */\n#define FLAG_ZERO\t32\n\n/* arg_index value indicating that no argument is consumed.  */\n#define ARG_NONE\t(~(size_t)0)\n\n/* A parsed directive.  */\ntypedef struct {\n  const char* dir_start;\n  const char* dir_end;\n  int flags;\n  const char* width_start;\n  const char* width_end;\n  size_t width_arg_index;\n  const char* precision_start;\n  const char* precision_end;\n  size_t precision_arg_index;\n  char conversion; /* d i o u x X f e E g G c s p n U % but not C S */\n  size_t arg_index;\n}\nchar_directive;\n\n/* A parsed format string.  */\ntypedef struct {\n  size_t count;\n  char_directive *dir;\n  size_t max_width_length;\n  size_t max_precision_length;\n}\nchar_directives;\n\n\n/* Parses the format string.  Fills in the number N of directives, and fills\n   in directives[0], ..., directives[N-1], and sets directives[N].dir_start\n   to the end of the format string.  Also fills in the arg_type fields of the\n   arguments and the needed count of arguments.  */\n#ifdef STATIC\nSTATIC\n#else\nextern\n#endif\nint printf_parse (const char *format, char_directives *d, arguments *a);\n\n/*end headers */\n\nchar * asnprintf (char *resultbuf, size_t *lengthp, const char *format, ...) {\n  va_list args;\n  char *result;\n\n  va_start (args, format);\n  result = vasnprintf (resultbuf, lengthp, format, args);\n  va_end (args);\n  return result;\n}\n\nint asprintf (char **resultp, const char *format, ...) 
{\n  va_list args;\n  int result;\n\n  va_start (args, format);\n  result = vasprintf (resultp, format, args);\n  va_end (args);\n  return result;\n}\n\n#ifdef STATIC\nSTATIC\n#endif\nint printf_fetchargs (va_list args, arguments *a) {\n  size_t i;\n  argument *ap;\n\n  for (i = 0, ap = &a->arg[0]; i < a->count; i++, ap++)\n    switch (ap->type)\n      {\n      case TYPE_SCHAR:\n\tap->a.a_schar = va_arg (args, /*signed char*/ int);\n\tbreak;\n      case TYPE_UCHAR:\n\tap->a.a_uchar = va_arg (args, /*unsigned char*/ int);\n\tbreak;\n      case TYPE_SHORT:\n\tap->a.a_short = va_arg (args, /*short*/ int);\n\tbreak;\n      case TYPE_USHORT:\n\tap->a.a_ushort = va_arg (args, /*unsigned short*/ int);\n\tbreak;\n      case TYPE_INT:\n\tap->a.a_int = va_arg (args, int);\n\tbreak;\n      case TYPE_UINT:\n\tap->a.a_uint = va_arg (args, unsigned int);\n\tbreak;\n      case TYPE_LONGINT:\n\tap->a.a_longint = va_arg (args, long int);\n\tbreak;\n      case TYPE_ULONGINT:\n\tap->a.a_ulongint = va_arg (args, unsigned long int);\n\tbreak;\n#ifdef HAVE_LONG_LONG\n      case TYPE_LONGLONGINT:\n\tap->a.a_longlongint = va_arg (args, long long int);\n\tbreak;\n      case TYPE_ULONGLONGINT:\n\tap->a.a_ulonglongint = va_arg (args, unsigned long long int);\n\tbreak;\n#endif\n      case TYPE_DOUBLE:\n\tap->a.a_double = va_arg (args, double);\n\tbreak;\n#ifdef HAVE_LONG_DOUBLE\n      case TYPE_LONGDOUBLE:\n\tap->a.a_longdouble = va_arg (args, long double);\n\tbreak;\n#endif\n      case TYPE_CHAR:\n\tap->a.a_char = va_arg (args, int);\n\tbreak;\n#ifdef HAVE_WINT_T\n      case TYPE_WIDE_CHAR:\n\tap->a.a_wide_char = va_arg (args, wint_t);\n\tbreak;\n#endif\n      case TYPE_STRING:\n\tap->a.a_string = va_arg (args, const char *);\n\tbreak;\n#ifdef HAVE_WCHAR_T\n      case TYPE_WIDE_STRING:\n\tap->a.a_wide_string = va_arg (args, const wchar_t *);\n\tbreak;\n#endif\n      case TYPE_POINTER:\n\tap->a.a_pointer = va_arg (args, void *);\n\tbreak;\n      case 
TYPE_COUNT_SCHAR_POINTER:\n\tap->a.a_count_schar_pointer = va_arg (args, signed char *);\n\tbreak;\n      case TYPE_COUNT_SHORT_POINTER:\n\tap->a.a_count_short_pointer = va_arg (args, short *);\n\tbreak;\n      case TYPE_COUNT_INT_POINTER:\n\tap->a.a_count_int_pointer = va_arg (args, int *);\n\tbreak;\n      case TYPE_COUNT_LONGINT_POINTER:\n\tap->a.a_count_longint_pointer = va_arg (args, long int *);\n\tbreak;\n#ifdef HAVE_LONG_LONG\n      case TYPE_COUNT_LONGLONGINT_POINTER:\n\tap->a.a_count_longlongint_pointer = va_arg (args, long long int *);\n\tbreak;\n#endif\n      default:\n\t/* Unknown type.  */\n\treturn -1;\n      }\n  return 0;\n}\n\n/* Get intmax_t.  */\n#if HAVE_STDINT_H_WITH_UINTMAX\n# include <stdint.h>\n#endif\n#if HAVE_INTTYPES_H_WITH_UINTMAX\n# include <inttypes.h>\n#endif\n\n/* malloc(), realloc(), free().  */\n#include <stdlib.h>\n\n/* xsize.h -- Checked size_t computations. */\n\n#ifndef _XSIZE_H\n#define _XSIZE_H\n\n/* Get SIZE_MAX.  */\n#include <limits.h>\n#if HAVE_STDINT_H\n# include <stdint.h>\n#endif\n\n/* The size of memory objects is often computed through expressions of\n   type size_t. Example:\n      void* p = malloc (header_size + n * element_size).\n   These computations can lead to overflow.  When this happens, malloc()\n   returns a piece of memory that is way too small, and the program then\n   crashes while attempting to fill the memory.\n   To avoid this, the functions and macros in this file check for overflow.\n   The convention is that SIZE_MAX represents overflow.\n   malloc (SIZE_MAX) is not guaranteed to fail -- think of a malloc\n   implementation that uses mmap --, it's recommended to use size_overflow_p()\n   or size_in_bounds_p() before invoking malloc().\n   The example thus becomes:\n      size_t size = xsum (header_size, xtimes (n, element_size));\n      void *p = (size_in_bounds_p (size) ? malloc (size) : NULL);\n*/\n\n/* Convert an arbitrary value >= 0 to type size_t.  
*/\n#define xcast_size_t(N) \\\n  ((N) <= SIZE_MAX ? (size_t) (N) : SIZE_MAX)\n\n/* Sum of two sizes, with overflow check.  */\nstatic inline size_t\n#if __GNUC__ >= 3\n__attribute__ ((__pure__))\n#endif\nxsum (size_t size1, size_t size2) {\n  size_t sum = size1 + size2;\n  return (sum >= size1 ? sum : SIZE_MAX);\n}\n\n/* Sum of three sizes, with overflow check.  */\nstatic inline size_t\n#if __GNUC__ >= 3\n__attribute__ ((__pure__))\n#endif\nxsum3 (size_t size1, size_t size2, size_t size3) {\n  return xsum (xsum (size1, size2), size3);\n}\n\n/* Sum of four sizes, with overflow check.  */\nstatic inline size_t\n#if __GNUC__ >= 3\n__attribute__ ((__pure__))\n#endif\nxsum4 (size_t size1, size_t size2, size_t size3, size_t size4) {\n  return xsum (xsum (xsum (size1, size2), size3), size4);\n}\n\n/* Maximum of two sizes, with overflow check.  */\nstatic inline size_t\n#if __GNUC__ >= 3\n__attribute__ ((__pure__))\n#endif\nxmax (size_t size1, size_t size2) {\n  /* No explicit check is needed here, because for any n:\n     max (SIZE_MAX, n) == SIZE_MAX and max (n, SIZE_MAX) == SIZE_MAX.  */\n  return (size1 >= size2 ? size1 : size2);\n}\n\n/* Multiplication of a count with an element size, with overflow check.\n   The count must be >= 0 and the element size must be > 0.\n   This is a macro, not an inline function, so that it works correctly even\n   when N is of a wider type and N > SIZE_MAX.  */\n#define xtimes(N, ELSIZE) \\\n  ((N) <= SIZE_MAX / (ELSIZE) ? (size_t) (N) * (ELSIZE) : SIZE_MAX)\n\n/* Check for overflow.  */\n#define size_overflow_p(SIZE) \\\n  ((SIZE) == SIZE_MAX)\n/* Check against overflow.  
*/\n#define size_in_bounds_p(SIZE) \\\n  ((SIZE) != SIZE_MAX)\n\n#endif /* _XSIZE_H */\n\n\n#if WIDE_CHAR_VERSION\n# define PRINTF_PARSE wprintf_parse\n# define CHAR_T wchar_t\n# define DIRECTIVE wchar_t_directive\n# define DIRECTIVES wchar_t_directives\n#else\n# define PRINTF_PARSE printf_parse\n# define CHAR_T char\n# define DIRECTIVE char_directive\n# define DIRECTIVES char_directives\n#endif\n\n#ifdef STATIC\nSTATIC\n#endif\nint\nPRINTF_PARSE (const CHAR_T *format, DIRECTIVES *d, arguments *a) {\n  const CHAR_T *cp = format;\t\t/* pointer into format */\n  size_t arg_posn = 0;\t\t/* number of regular arguments consumed */\n  size_t d_allocated;\t\t\t/* allocated elements of d->dir */\n  size_t a_allocated;\t\t\t/* allocated elements of a->arg */\n  size_t max_width_length = 0;\n  size_t max_precision_length = 0;\n\n  d->count = 0;\n  d_allocated = 1;\n  d->dir = malloc (d_allocated * sizeof (DIRECTIVE));\n  if (d->dir == NULL)\n    /* Out of memory.  */\n    return -1;\n\n  a->count = 0;\n  a_allocated = 0;\n  a->arg = NULL;\n\n#define REGISTER_ARG(_index_,_type_) \\\n  {\t\t\t\t\t\t\t\t\t\\\n    size_t n = (_index_);\t\t\t\t\t\t\\\n    if (n >= a_allocated)\t\t\t\t\t\t\\\n      {\t\t\t\t\t\t\t\t\t\\\n\tsize_t memory_size;\t\t\t\t\t\t\\\n\targument *memory;\t\t\t\t\t\t\\\n\t\t\t\t\t\t\t\t\t\\\n\ta_allocated = xtimes (a_allocated, 2);\t\t\t\t\\\n\tif (a_allocated <= n)\t\t\t\t\t\t\\\n\t  a_allocated = xsum (n, 1);\t\t\t\t\t\\\n\tmemory_size = xtimes (a_allocated, sizeof (argument));\t\t\\\n\tif (size_overflow_p (memory_size))\t\t\t\t\\\n\t  /* Overflow, would lead to out of memory.  */\t\t\t\\\n\t  goto error;\t\t\t\t\t\t\t\\\n\tmemory = (a->arg\t\t\t\t\t\t\\\n\t\t  ? realloc (a->arg, memory_size)\t\t\t\\\n\t\t  : malloc (memory_size));\t\t\t\t\\\n\tif (memory == NULL)\t\t\t\t\t\t\\\n\t  /* Out of memory.  
*/\t\t\t\t\t\t\\\n\t  goto error;\t\t\t\t\t\t\t\\\n\ta->arg = memory;\t\t\t\t\t\t\\\n      }\t\t\t\t\t\t\t\t\t\\\n    while (a->count <= n)\t\t\t\t\t\t\\\n      a->arg[a->count++].type = TYPE_NONE;\t\t\t\t\\\n    if (a->arg[n].type == TYPE_NONE)\t\t\t\t\t\\\n      a->arg[n].type = (_type_);\t\t\t\t\t\\\n    else if (a->arg[n].type != (_type_))\t\t\t\t\\\n      /* Ambiguous type for positional argument.  */\t\t\t\\\n      goto error;\t\t\t\t\t\t\t\\\n  }\n\n  while (*cp != '\\0')\n    {\n      CHAR_T c = *cp++;\n      if (c == '%')\n\t{\n\t  size_t arg_index = ARG_NONE;\n\t  DIRECTIVE *dp = &d->dir[d->count];/* pointer to next directive */\n\n\t  /* Initialize the next directive.  */\n\t  dp->dir_start = cp - 1;\n\t  dp->flags = 0;\n\t  dp->width_start = NULL;\n\t  dp->width_end = NULL;\n\t  dp->width_arg_index = ARG_NONE;\n\t  dp->precision_start = NULL;\n\t  dp->precision_end = NULL;\n\t  dp->precision_arg_index = ARG_NONE;\n\t  dp->arg_index = ARG_NONE;\n\n\t  /* Test for positional argument.  */\n\t  if (*cp >= '0' && *cp <= '9') {\n\t      const CHAR_T *np;\n\n\t      for (np = cp; *np >= '0' && *np <= '9'; np++)\n            ;\n\t      if (*np == '$') {\n              size_t n = 0;\n\n              for (np = cp; *np >= '0' && *np <= '9'; np++)\n                n = xsum (xtimes (n, 10), *np - '0');\n              if (n == 0)\n                /* Positional argument 0.  */\n                goto error;\n              if (size_overflow_p (n))\n                /* n too large, would lead to out of memory later.  */\n                goto error;\n              arg_index = n - 1;\n              cp = np + 1;\n          }\n\t    }\n\n\t  /* Read the flags.  
*/\n\t  for (;;) {\n\t      if (*cp == '\\'') {\n              dp->flags |= FLAG_GROUP;\n              cp++;\n\t\t  } else if (*cp == '-') {\n              dp->flags |= FLAG_LEFT;\n              cp++;\n\t\t  } else if (*cp == '+') {\n              dp->flags |= FLAG_SHOWSIGN;\n              cp++;\n\t\t  } else if (*cp == ' ') {\n              dp->flags |= FLAG_SPACE;\n              cp++;\n\t\t  } else if (*cp == '#') {\n              dp->flags |= FLAG_ALT;\n              cp++;\n\t\t  } else if (*cp == '0') {\n              dp->flags |= FLAG_ZERO;\n              cp++;\n\t\t  } else\n\t\t      break;\n      }\n\n\t  /* Parse the field width.  */\n\t  if (*cp == '*')\n\t    {\n\t      dp->width_start = cp;\n\t      cp++;\n\t      dp->width_end = cp;\n\t      if (max_width_length < 1)\n\t\tmax_width_length = 1;\n\n\t      /* Test for positional argument.  */\n\t      if (*cp >= '0' && *cp <= '9')\n\t\t{\n\t\t  const CHAR_T *np;\n\n\t\t  for (np = cp; *np >= '0' && *np <= '9'; np++)\n\t\t    ;\n\t\t  if (*np == '$')\n\t\t    {\n\t\t      size_t n = 0;\n\n\t\t      for (np = cp; *np >= '0' && *np <= '9'; np++)\n\t\t\tn = xsum (xtimes (n, 10), *np - '0');\n\t\t      if (n == 0)\n\t\t\t/* Positional argument 0.  */\n\t\t\tgoto error;\n\t\t      if (size_overflow_p (n))\n\t\t\t/* n too large, would lead to out of memory later.  */\n\t\t\tgoto error;\n\t\t      dp->width_arg_index = n - 1;\n\t\t      cp = np + 1;\n\t\t    }\n\t\t}\n\t      if (dp->width_arg_index == ARG_NONE)\n\t\t{\n\t\t  dp->width_arg_index = arg_posn++;\n\t\t  if (dp->width_arg_index == ARG_NONE)\n\t\t    /* arg_posn wrapped around.  
*/\n\t\t    goto error;\n\t\t}\n\t      REGISTER_ARG (dp->width_arg_index, TYPE_INT);\n\t    }\n\t  else if (*cp >= '0' && *cp <= '9')\n\t    {\n\t      size_t width_length;\n\n\t      dp->width_start = cp;\n\t      for (; *cp >= '0' && *cp <= '9'; cp++)\n\t\t;\n\t      dp->width_end = cp;\n\t      width_length = dp->width_end - dp->width_start;\n\t      if (max_width_length < width_length)\n\t\tmax_width_length = width_length;\n\t    }\n\n\t  /* Parse the precision.  */\n\t  if (*cp == '.')\n\t    {\n\t      cp++;\n\t      if (*cp == '*')\n\t\t{\n\t\t  dp->precision_start = cp - 1;\n\t\t  cp++;\n\t\t  dp->precision_end = cp;\n\t\t  if (max_precision_length < 2)\n\t\t    max_precision_length = 2;\n\n\t\t  /* Test for positional argument.  */\n\t\t  if (*cp >= '0' && *cp <= '9')\n\t\t    {\n\t\t      const CHAR_T *np;\n\n\t\t      for (np = cp; *np >= '0' && *np <= '9'; np++)\n\t\t\t;\n\t\t      if (*np == '$')\n\t\t\t{\n\t\t\t  size_t n = 0;\n\n\t\t\t  for (np = cp; *np >= '0' && *np <= '9'; np++)\n\t\t\t    n = xsum (xtimes (n, 10), *np - '0');\n\t\t\t  if (n == 0)\n\t\t\t    /* Positional argument 0.  */\n\t\t\t    goto error;\n\t\t\t  if (size_overflow_p (n))\n\t\t\t    /* n too large, would lead to out of memory\n\t\t\t       later.  */\n\t\t\t    goto error;\n\t\t\t  dp->precision_arg_index = n - 1;\n\t\t\t  cp = np + 1;\n\t\t\t}\n\t\t    }\n\t\t  if (dp->precision_arg_index == ARG_NONE)\n\t\t    {\n\t\t      dp->precision_arg_index = arg_posn++;\n\t\t      if (dp->precision_arg_index == ARG_NONE)\n\t\t\t/* arg_posn wrapped around.  
*/\n\t\t\tgoto error;\n\t\t    }\n\t\t  REGISTER_ARG (dp->precision_arg_index, TYPE_INT);\n\t\t}\n\t      else\n\t\t{\n\t\t  size_t precision_length;\n\n\t\t  dp->precision_start = cp - 1;\n\t\t  for (; *cp >= '0' && *cp <= '9'; cp++)\n\t\t    ;\n\t\t  dp->precision_end = cp;\n\t\t  precision_length = dp->precision_end - dp->precision_start;\n\t\t  if (max_precision_length < precision_length)\n\t\t    max_precision_length = precision_length;\n\t\t}\n\t    }\n\n\t  {\n\t    arg_type type;\n\n\t    /* Parse argument type/size specifiers.  */\n\t    {\n\t      int flags = 0;\n\n\t      for (;;)\n\t\t{\n\t\t  if (*cp == 'h')\n\t\t    {\n\t\t      flags |= (1 << (flags & 1));\n\t\t      cp++;\n\t\t    }\n\t\t  else if (*cp == 'L')\n\t\t    {\n\t\t      flags |= 4;\n\t\t      cp++;\n\t\t    }\n\t\t  else if (*cp == 'l')\n\t\t    {\n\t\t      flags += 8;\n\t\t      cp++;\n\t\t    }\n#ifdef HAVE_INTMAX_T\n\t\t  else if (*cp == 'j')\n\t\t    {\n\t\t      if (sizeof (intmax_t) > sizeof (long))\n\t\t\t{\n\t\t\t  /* intmax_t = long long */\n\t\t\t  flags += 16;\n\t\t\t}\n\t\t      else if (sizeof (intmax_t) > sizeof (int))\n\t\t\t{\n\t\t\t  /* intmax_t = long */\n\t\t\t  flags += 8;\n\t\t\t}\n\t\t      cp++;\n\t\t    }\n#endif\n\t\t  else if (*cp == 'z' || *cp == 'Z')\n\t\t    {\n\t\t      /* 'z' is standardized in ISO C 99, but glibc uses 'Z'\n\t\t\t because the warning facility in gcc-2.95.2 understands\n\t\t\t only 'Z' (see gcc-2.95.2/gcc/c-common.c:1784).  
*/\n\t\t      if (sizeof (size_t) > sizeof (long))\n\t\t\t{\n\t\t\t  /* size_t = long long */\n\t\t\t  flags += 16;\n\t\t\t}\n\t\t      else if (sizeof (size_t) > sizeof (int))\n\t\t\t{\n\t\t\t  /* size_t = long */\n\t\t\t  flags += 8;\n\t\t\t}\n\t\t      cp++;\n\t\t    }\n\t\t  else if (*cp == 't')\n\t\t    {\n\t\t      if (sizeof (ptrdiff_t) > sizeof (long))\n\t\t\t{\n\t\t\t  /* ptrdiff_t = long long */\n\t\t\t  flags += 16;\n\t\t\t}\n\t\t      else if (sizeof (ptrdiff_t) > sizeof (int))\n\t\t\t{\n\t\t\t  /* ptrdiff_t = long */\n\t\t\t  flags += 8;\n\t\t\t}\n\t\t      cp++;\n\t\t    }\n\t\t  else\n\t\t    break;\n\t\t}\n\n\t      /* Read the conversion character.  */\n\t      c = *cp++;\n\t      switch (c)\n\t\t{\n\t\tcase 'd': case 'i':\n#ifdef HAVE_LONG_LONG\n\t\t  if (flags >= 16 || (flags & 4))\n\t\t    type = TYPE_LONGLONGINT;\n\t\t  else\n#endif\n\t\t  if (flags >= 8)\n\t\t    type = TYPE_LONGINT;\n\t\t  else if (flags & 2)\n\t\t    type = TYPE_SCHAR;\n\t\t  else if (flags & 1)\n\t\t    type = TYPE_SHORT;\n\t\t  else\n\t\t    type = TYPE_INT;\n\t\t  break;\n\t\tcase 'o': case 'u': case 'x': case 'X':\n#ifdef HAVE_LONG_LONG\n\t\t  if (flags >= 16 || (flags & 4))\n\t\t    type = TYPE_ULONGLONGINT;\n\t\t  else\n#endif\n\t\t  if (flags >= 8)\n\t\t    type = TYPE_ULONGINT;\n\t\t  else if (flags & 2)\n\t\t    type = TYPE_UCHAR;\n\t\t  else if (flags & 1)\n\t\t    type = TYPE_USHORT;\n\t\t  else\n\t\t    type = TYPE_UINT;\n\t\t  break;\n\t\tcase 'f': case 'F': case 'e': case 'E': case 'g': case 'G':\n\t\tcase 'a': case 'A':\n#ifdef HAVE_LONG_DOUBLE\n\t\t  if (flags >= 16 || (flags & 4))\n\t\t    type = TYPE_LONGDOUBLE;\n\t\t  else\n#endif\n\t\t  type = TYPE_DOUBLE;\n\t\t  break;\n\t\tcase 'c':\n\t\t  if (flags >= 8)\n#ifdef HAVE_WINT_T\n\t\t    type = TYPE_WIDE_CHAR;\n#else\n\t\t    goto error;\n#endif\n\t\t  else\n\t\t    type = TYPE_CHAR;\n\t\t  break;\n#ifdef HAVE_WINT_T\n\t\tcase 'C':\n\t\t  type = TYPE_WIDE_CHAR;\n\t\t  c = 'c';\n\t\t  
break;\n#endif\n\t\tcase 's':\n\t\t  if (flags >= 8)\n#ifdef HAVE_WCHAR_T\n\t\t    type = TYPE_WIDE_STRING;\n#else\n\t\t    goto error;\n#endif\n\t\t  else\n\t\t    type = TYPE_STRING;\n\t\t  break;\n#ifdef HAVE_WCHAR_T\n\t\tcase 'S':\n\t\t  type = TYPE_WIDE_STRING;\n\t\t  c = 's';\n\t\t  break;\n#endif\n\t\tcase 'p':\n\t\t  type = TYPE_POINTER;\n\t\t  break;\n\t\tcase 'n':\n#ifdef HAVE_LONG_LONG\n\t\t  if (flags >= 16 || (flags & 4))\n\t\t    type = TYPE_COUNT_LONGLONGINT_POINTER;\n\t\t  else\n#endif\n\t\t  if (flags >= 8)\n\t\t    type = TYPE_COUNT_LONGINT_POINTER;\n\t\t  else if (flags & 2)\n\t\t    type = TYPE_COUNT_SCHAR_POINTER;\n\t\t  else if (flags & 1)\n\t\t    type = TYPE_COUNT_SHORT_POINTER;\n\t\t  else\n\t\t    type = TYPE_COUNT_INT_POINTER;\n\t\t  break;\n\t\tcase '%':\n\t\t  type = TYPE_NONE;\n\t\t  break;\n\t\tdefault:\n\t\t  /* Unknown conversion character.  */\n\t\t  goto error;\n\t\t}\n\t    }\n\n\t    if (type != TYPE_NONE)\n\t      {\n\t\tdp->arg_index = arg_index;\n\t\tif (dp->arg_index == ARG_NONE)\n\t\t  {\n\t\t    dp->arg_index = arg_posn++;\n\t\t    if (dp->arg_index == ARG_NONE)\n\t\t      /* arg_posn wrapped around.  */\n\t\t      goto error;\n\t\t  }\n\t\tREGISTER_ARG (dp->arg_index, type);\n\t      }\n\t    dp->conversion = c;\n\t    dp->dir_end = cp;\n\t  }\n\n\t  d->count++;\n\t  if (d->count >= d_allocated)\n\t    {\n\t      size_t memory_size;\n\t      DIRECTIVE *memory;\n\n\t      d_allocated = xtimes (d_allocated, 2);\n\t      memory_size = xtimes (d_allocated, sizeof (DIRECTIVE));\n\t      if (size_overflow_p (memory_size))\n\t\t/* Overflow, would lead to out of memory.  */\n\t\tgoto error;\n\t      memory = realloc (d->dir, memory_size);\n\t      if (memory == NULL)\n\t\t/* Out of memory.  
*/\n\t\tgoto error;\n\t      d->dir = memory;\n\t    }\n\t}\n    }\n  d->dir[d->count].dir_start = cp;\n\n  d->max_width_length = max_width_length;\n  d->max_precision_length = max_precision_length;\n  return 0;\n\nerror:\n  if (a->arg)\n    free (a->arg);\n  if (d->dir)\n    free (d->dir);\n  return -1;\n}\n\n#undef DIRECTIVES\n#undef DIRECTIVE\n#undef CHAR_T\n#undef PRINTF_PARSE\n\n\n#ifndef IN_LIBINTL\n# include <alloca.h>\n#endif\n\n\n#include <string.h>\t/* memcpy(), strlen() */\n#include <errno.h>\t/* errno */\n#include <float.h>\t/* DBL_MAX_EXP, LDBL_MAX_EXP */\n\n/* Some systems, like OSF/1 4.0 and Woe32, don't have EOVERFLOW.  */\n#ifndef EOVERFLOW\n# define EOVERFLOW E2BIG\n#endif\n\n#ifdef HAVE_WCHAR_T\n# ifdef HAVE_WCSLEN\n#  define local_wcslen wcslen\n# else\n   /* Solaris 2.5.1 has wcslen() in a separate library libw.so. To avoid\n      a dependency towards this library, here is a local substitute.\n      Define this substitute only once, even if this file is included\n      twice in the same compilation unit.  */\n#  ifndef local_wcslen_defined\n#   define local_wcslen_defined 1\nstatic size_t local_wcslen (const wchar_t *s) {\n  const wchar_t *ptr;\n\n  for (ptr = s; *ptr != (wchar_t) 0; ptr++)\n    ;\n  return ptr - s;\n}\n#  endif\n# endif\n#endif\n\n#if WIDE_CHAR_VERSION\n# define VASNPRINTF vasnwprintf\n# define CHAR_T wchar_t\n# define DIRECTIVE wchar_t_directive\n# define DIRECTIVES wchar_t_directives\n# define PRINTF_PARSE wprintf_parse\n# define USE_SNPRINTF 1\n# if HAVE_DECL__SNWPRINTF\n   /* On Windows, the function swprintf() has a different signature than\n      on Unix; we use the _snwprintf() function instead.  */\n#  define SNPRINTF _snwprintf\n# else\n   /* Unix.  
*/\n#  define SNPRINTF swprintf\n# endif\n#else\n# define VASNPRINTF vasnprintf\n# define CHAR_T char\n# define DIRECTIVE char_directive\n# define DIRECTIVES char_directives\n# define PRINTF_PARSE printf_parse\n# define USE_SNPRINTF (HAVE_DECL__SNPRINTF || HAVE_SNPRINTF)\n# if HAVE_DECL__SNPRINTF\n   /* Windows.  */\n#  define SNPRINTF _snprintf\n# else\n   /* Unix.  */\n#  define SNPRINTF snprintf\n# endif\n#endif\n\nCHAR_T * VASNPRINTF (CHAR_T *resultbuf, size_t *lengthp, const CHAR_T *format, va_list args) {\n  DIRECTIVES d;\n  arguments a;\n\n  if (PRINTF_PARSE (format, &d, &a) < 0) {\n      errno = EINVAL;\n      return NULL;\n  }\n\n#define CLEANUP() \\\n  free (d.dir);\t\t\t\t\t\t\t\t\\\n  if (a.arg)\t\t\t\t\t\t\t\t\\\n    free (a.arg);\n\n  if (printf_fetchargs (args, &a) < 0) {\n      CLEANUP ();\n      errno = EINVAL;\n      return NULL;\n  }\n\n  {\n    size_t buf_neededlength;\n    CHAR_T *buf;\n    CHAR_T *buf_malloced;\n    const CHAR_T *cp;\n    size_t i;\n    DIRECTIVE *dp;\n    /* Output string accumulator.  */\n    CHAR_T *result;\n    size_t allocated;\n    size_t length;\n\n    /* Allocate a small buffer that will hold a directive passed to\n       sprintf or snprintf.  
*/\n    buf_neededlength =\n      xsum4 (7, d.max_width_length, d.max_precision_length, 6);\n#if HAVE_ALLOCA\n    if (buf_neededlength < 4000 / sizeof (CHAR_T)) {\n        buf = (CHAR_T *) alloca (buf_neededlength * sizeof (CHAR_T));\n        buf_malloced = NULL;\n    } else\n#endif\n    {\n        size_t buf_memsize = xtimes (buf_neededlength, sizeof (CHAR_T));\n        if (size_overflow_p (buf_memsize))\n          goto out_of_memory_1;\n        buf = (CHAR_T *) malloc (buf_memsize);\n        if (buf == NULL)\n          goto out_of_memory_1;\n        buf_malloced = buf;\n    }\n\n    if (resultbuf != NULL) {\n        result = resultbuf;\n        allocated = *lengthp;\n    } else {\n        result = NULL;\n        allocated = 0;\n    }\n    length = 0;\n    /* Invariants:\n       result is either == resultbuf or == NULL or malloc-allocated.\n       If length > 0, then result != NULL.  */\n\n    /* Ensures that allocated >= needed.  Aborts through a jump to\n       out_of_memory if needed is SIZE_MAX or otherwise too big.  */\n#define ENSURE_ALLOCATION(needed) \\\n    if ((needed) > allocated)\t\t\t\t\t\t     \\\n      {\t\t\t\t\t\t\t\t\t     \\\n\tsize_t memory_size;\t\t\t\t\t\t     \\\n\tCHAR_T *memory;\t\t\t\t\t\t\t     \\\n\t\t\t\t\t\t\t\t\t     \\\n\tallocated = (allocated > 0 ? 
xtimes (allocated, 2) : 12);\t     \\\n\tif ((needed) > allocated)\t\t\t\t\t     \\\n\t  allocated = (needed);\t\t\t\t\t\t     \\\n\tmemory_size = xtimes (allocated, sizeof (CHAR_T));\t\t     \\\n\tif (size_overflow_p (memory_size))\t\t\t\t     \\\n\t  goto out_of_memory;\t\t\t\t\t\t     \\\n\tif (result == resultbuf || result == NULL)\t\t\t     \\\n\t  memory = (CHAR_T *) malloc (memory_size);\t\t\t     \\\n\telse\t\t\t\t\t\t\t\t     \\\n\t  memory = (CHAR_T *) realloc (result, memory_size);\t\t     \\\n\tif (memory == NULL)\t\t\t\t\t\t     \\\n\t  goto out_of_memory;\t\t\t\t\t\t     \\\n\tif (result == resultbuf && length > 0)\t\t\t\t     \\\n\t  memcpy (memory, result, length * sizeof (CHAR_T));\t\t     \\\n\tresult = memory;\t\t\t\t\t\t     \\\n      }\n\n    for (cp = format, i = 0, dp = &d.dir[0]; ; cp = dp->dir_end, i++, dp++)\n      {\n\tif (cp != dp->dir_start)\n\t  {\n\t    size_t n = dp->dir_start - cp;\n\t    size_t augmented_length = xsum (length, n);\n\n\t    ENSURE_ALLOCATION (augmented_length);\n\t    memcpy (result + length, cp, n * sizeof (CHAR_T));\n\t    length = augmented_length;\n\t  }\n\tif (i == d.count)\n\t  break;\n\n\t/* Execute a single directive.  
*/\n\tif (dp->conversion == '%') {\n\t    size_t augmented_length;\n\n\t    if (!(dp->arg_index == ARG_NONE))\n\t      abort ();\n\t    augmented_length = xsum (length, 1);\n\t    ENSURE_ALLOCATION (augmented_length);\n\t    result[length] = '%';\n\t    length = augmented_length;\n\t} else {\n\t    if (!(dp->arg_index != ARG_NONE))\n\t      abort ();\n\n\t    if (dp->conversion == 'n') {\n\t\tswitch (a.arg[dp->arg_index].type) {\n\t\t  case TYPE_COUNT_SCHAR_POINTER:\n\t\t    *a.arg[dp->arg_index].a.a_count_schar_pointer = length;\n\t\t    break;\n\t\t  case TYPE_COUNT_SHORT_POINTER:\n\t\t    *a.arg[dp->arg_index].a.a_count_short_pointer = length;\n\t\t    break;\n\t\t  case TYPE_COUNT_INT_POINTER:\n\t\t    *a.arg[dp->arg_index].a.a_count_int_pointer = length;\n\t\t    break;\n\t\t  case TYPE_COUNT_LONGINT_POINTER:\n\t\t    *a.arg[dp->arg_index].a.a_count_longint_pointer = length;\n\t\t    break;\n#ifdef HAVE_LONG_LONG\n\t\t  case TYPE_COUNT_LONGLONGINT_POINTER:\n\t\t    *a.arg[dp->arg_index].a.a_count_longlongint_pointer = length;\n\t\t    break;\n#endif\n\t\t  default:\n\t\t    abort ();\n\t\t  }\n\t      }\n\t    else {\n\t\targ_type type = a.arg[dp->arg_index].type;\n\t\tCHAR_T *p;\n\t\tunsigned int prefix_count;\n\t\tint prefixes[2];\n#if !USE_SNPRINTF\n\t\tsize_t tmp_length;\n\t\tCHAR_T tmpbuf[700];\n\t\tCHAR_T *tmp;\n\n\t\t/* Allocate a temporary buffer of sufficient size for calling\n\t\t   sprintf.  */\n\t\t{\n\t\t  size_t width;\n\t\t  size_t precision;\n\n\t\t  width = 0;\n\t\t  if (dp->width_start != dp->width_end) {\n\t\t      if (dp->width_arg_index != ARG_NONE) {\n                  int arg;\n\n                  if (!(a.arg[dp->width_arg_index].type == TYPE_INT))\n                    abort ();\n                  arg = a.arg[dp->width_arg_index].a.a_int;\n                  width = (arg < 0 ? 
(unsigned int) (-arg) : arg);\n\t\t\t  } else {\n                  const CHAR_T *digitp = dp->width_start;\n\n                  do\n                    width = xsum (xtimes (width, 10), *digitp++ - '0');\n                  while (digitp != dp->width_end);\n\t\t\t  }\n\t\t  }\n\n\t\t  precision = 6;\n\t\t  if (dp->precision_start != dp->precision_end) {\n\t\t      if (dp->precision_arg_index != ARG_NONE) {\n                  int arg;\n\n                  if (!(a.arg[dp->precision_arg_index].type == TYPE_INT))\n                    abort ();\n                  arg = a.arg[dp->precision_arg_index].a.a_int;\n                  precision = (arg < 0 ? 0 : arg);\n\t\t      } else {\n                  const CHAR_T *digitp = dp->precision_start + 1;\n\n                  precision = 0;\n                  while (digitp != dp->precision_end)\n                    precision = xsum (xtimes (precision, 10), *digitp++ - '0');\n              }\n\t\t  }\n\n\t\t  switch (dp->conversion) {\n\n\t\t    case 'd': case 'i': case 'u':\n# ifdef HAVE_LONG_LONG\n\t\t      if (type == TYPE_LONGLONGINT || type == TYPE_ULONGLONGINT)\n\t\t\ttmp_length =\n\t\t\t  (unsigned int) (sizeof (unsigned long long) * CHAR_BIT\n\t\t\t\t\t  * 0.30103 /* binary -> decimal */\n\t\t\t\t\t  * 2 /* estimate for FLAG_GROUP */\n\t\t\t\t\t )\n\t\t\t  + 1 /* turn floor into ceil */\n\t\t\t  + 1; /* account for leading sign */\n\t\t      else\n# endif\n\t\t      if (type == TYPE_LONGINT || type == TYPE_ULONGINT)\n\t\t\ttmp_length =\n\t\t\t  (unsigned int) (sizeof (unsigned long) * CHAR_BIT\n\t\t\t\t\t  * 0.30103 /* binary -> decimal */\n\t\t\t\t\t  * 2 /* estimate for FLAG_GROUP */\n\t\t\t\t\t )\n\t\t\t  + 1 /* turn floor into ceil */\n\t\t\t  + 1; /* account for leading sign */\n\t\t      else\n\t\t\ttmp_length =\n\t\t\t  (unsigned int) (sizeof (unsigned int) * CHAR_BIT\n\t\t\t\t\t  * 0.30103 /* binary -> decimal */\n\t\t\t\t\t  * 2 /* estimate for FLAG_GROUP */\n\t\t\t\t\t )\n\t\t\t  + 1 /* turn floor into ceil 
*/\n\t\t\t  + 1; /* account for leading sign */\n\t\t      break;\n\n\t\t    case 'o':\n# ifdef HAVE_LONG_LONG\n\t\t      if (type == TYPE_LONGLONGINT || type == TYPE_ULONGLONGINT)\n\t\t\ttmp_length =\n\t\t\t  (unsigned int) (sizeof (unsigned long long) * CHAR_BIT\n\t\t\t\t\t  * 0.333334 /* binary -> octal */\n\t\t\t\t\t )\n\t\t\t  + 1 /* turn floor into ceil */\n\t\t\t  + 1; /* account for leading sign */\n\t\t      else\n# endif\n\t\t      if (type == TYPE_LONGINT || type == TYPE_ULONGINT)\n\t\t\ttmp_length =\n\t\t\t  (unsigned int) (sizeof (unsigned long) * CHAR_BIT\n\t\t\t\t\t  * 0.333334 /* binary -> octal */\n\t\t\t\t\t )\n\t\t\t  + 1 /* turn floor into ceil */\n\t\t\t  + 1; /* account for leading sign */\n\t\t      else\n\t\t\ttmp_length =\n\t\t\t  (unsigned int) (sizeof (unsigned int) * CHAR_BIT\n\t\t\t\t\t  * 0.333334 /* binary -> octal */\n\t\t\t\t\t )\n\t\t\t  + 1 /* turn floor into ceil */\n\t\t\t  + 1; /* account for leading sign */\n\t\t      break;\n\n\t\t    case 'x': case 'X':\n# ifdef HAVE_LONG_LONG\n\t\t      if (type == TYPE_LONGLONGINT || type == TYPE_ULONGLONGINT)\n\t\t\ttmp_length =\n\t\t\t  (unsigned int) (sizeof (unsigned long long) * CHAR_BIT\n\t\t\t\t\t  * 0.25 /* binary -> hexadecimal */\n\t\t\t\t\t )\n\t\t\t  + 1 /* turn floor into ceil */\n\t\t\t  + 2; /* account for leading sign or alternate form */\n\t\t      else\n# endif\n\t\t      if (type == TYPE_LONGINT || type == TYPE_ULONGINT)\n\t\t\ttmp_length =\n\t\t\t  (unsigned int) (sizeof (unsigned long) * CHAR_BIT\n\t\t\t\t\t  * 0.25 /* binary -> hexadecimal */\n\t\t\t\t\t )\n\t\t\t  + 1 /* turn floor into ceil */\n\t\t\t  + 2; /* account for leading sign or alternate form */\n\t\t      else\n\t\t\ttmp_length =\n\t\t\t  (unsigned int) (sizeof (unsigned int) * CHAR_BIT\n\t\t\t\t\t  * 0.25 /* binary -> hexadecimal */\n\t\t\t\t\t )\n\t\t\t  + 1 /* turn floor into ceil */\n\t\t\t  + 2; /* account for leading sign or alternate form */\n\t\t      break;\n\n\t\t    case 'f': case 'F':\n# ifdef 
HAVE_LONG_DOUBLE\n\t\t      if (type == TYPE_LONGDOUBLE)\n\t\t\ttmp_length =\n\t\t\t  (unsigned int) (LDBL_MAX_EXP\n\t\t\t\t\t  * 0.30103 /* binary -> decimal */\n\t\t\t\t\t  * 2 /* estimate for FLAG_GROUP */\n\t\t\t\t\t )\n\t\t\t  + 1 /* turn floor into ceil */\n\t\t\t  + 10; /* sign, decimal point etc. */\n\t\t      else\n# endif\n\t\t\ttmp_length =\n\t\t\t  (unsigned int) (DBL_MAX_EXP\n\t\t\t\t\t  * 0.30103 /* binary -> decimal */\n\t\t\t\t\t  * 2 /* estimate for FLAG_GROUP */\n\t\t\t\t\t )\n\t\t\t  + 1 /* turn floor into ceil */\n\t\t\t  + 10; /* sign, decimal point etc. */\n\t\t      tmp_length = xsum (tmp_length, precision);\n\t\t      break;\n\n\t\t    case 'e': case 'E': case 'g': case 'G':\n\t\t    case 'a': case 'A':\n\t\t      tmp_length =\n\t\t\t12; /* sign, decimal point, exponent etc. */\n\t\t      tmp_length = xsum (tmp_length, precision);\n\t\t      break;\n\n\t\t    case 'c':\n# if defined HAVE_WINT_T && !WIDE_CHAR_VERSION\n\t\t      if (type == TYPE_WIDE_CHAR)\n\t\t\ttmp_length = MB_CUR_MAX;\n\t\t      else\n# endif\n\t\t\ttmp_length = 1;\n\t\t      break;\n\n\t\t    case 's':\n# ifdef HAVE_WCHAR_T\n\t\t      if (type == TYPE_WIDE_STRING) {\n\t\t\t  tmp_length =\n\t\t\t    local_wcslen (a.arg[dp->arg_index].a.a_wide_string);\n\n#  if !WIDE_CHAR_VERSION\n\t\t\t  tmp_length = xtimes (tmp_length, MB_CUR_MAX);\n#  endif\n\t\t\t}\n\t\t      else\n# endif\n\t\t\ttmp_length = strlen (a.arg[dp->arg_index].a.a_string);\n\t\t      break;\n\n\t\t    case 'p':\n\t\t      tmp_length =\n\t\t\t(unsigned int) (sizeof (void *) * CHAR_BIT\n\t\t\t\t\t* 0.25 /* binary -> hexadecimal */\n\t\t\t\t       )\n\t\t\t  + 1 /* turn floor into ceil */\n\t\t\t  + 2; /* account for leading 0x */\n\t\t      break;\n\n\t\t    default:\n\t\t      abort ();\n\t\t    }\n\n\t\t  if (tmp_length < width)\n\t\t    tmp_length = width;\n\n\t\t  tmp_length = xsum (tmp_length, 1); /* account for trailing NUL */\n\t\t}\n\n\t\tif (tmp_length <= sizeof (tmpbuf) / sizeof (CHAR_T))\n\t\t  tmp = 
tmpbuf;\n\t\telse {\n\t\t    size_t tmp_memsize = xtimes (tmp_length, sizeof (CHAR_T));\n\n\t\t    if (size_overflow_p (tmp_memsize))\n\t\t      /* Overflow, would lead to out of memory.  */\n\t\t      goto out_of_memory;\n\t\t    tmp = (CHAR_T *) malloc (tmp_memsize);\n\t\t    if (tmp == NULL)\n\t\t      /* Out of memory.  */\n\t\t      goto out_of_memory;\n\t\t}\n#endif\n\n\t\t/* Construct the format string for calling snprintf or\n\t\t   sprintf.  */\n\t\tp = buf;\n\t\t*p++ = '%';\n\t\tif (dp->flags & FLAG_GROUP)\n\t\t  *p++ = '\\'';\n\t\tif (dp->flags & FLAG_LEFT)\n\t\t  *p++ = '-';\n\t\tif (dp->flags & FLAG_SHOWSIGN)\n\t\t  *p++ = '+';\n\t\tif (dp->flags & FLAG_SPACE)\n\t\t  *p++ = ' ';\n\t\tif (dp->flags & FLAG_ALT)\n\t\t  *p++ = '#';\n\t\tif (dp->flags & FLAG_ZERO)\n\t\t  *p++ = '0';\n\t\tif (dp->width_start != dp->width_end) {\n\t\t    size_t n = dp->width_end - dp->width_start;\n\t\t    memcpy (p, dp->width_start, n * sizeof (CHAR_T));\n\t\t    p += n;\n\t\t}\n\t\tif (dp->precision_start != dp->precision_end) {\n\t\t    size_t n = dp->precision_end - dp->precision_start;\n\t\t    memcpy (p, dp->precision_start, n * sizeof (CHAR_T));\n\t\t    p += n;\n\t\t}\n\n\t\tswitch (type) {\n#ifdef HAVE_LONG_LONG\n\t\t  case TYPE_LONGLONGINT:\n\t\t  case TYPE_ULONGLONGINT:\n\t\t    *p++ = 'l';\n\t\t    /*FALLTHROUGH*/\n#endif\n\t\t  case TYPE_LONGINT:\n\t\t  case TYPE_ULONGINT:\n#ifdef HAVE_WINT_T\n\t\t  case TYPE_WIDE_CHAR:\n#endif\n#ifdef HAVE_WCHAR_T\n\t\t  case TYPE_WIDE_STRING:\n#endif\n\t\t    *p++ = 'l';\n\t\t    break;\n#ifdef HAVE_LONG_DOUBLE\n\t\t  case TYPE_LONGDOUBLE:\n\t\t    *p++ = 'L';\n\t\t    break;\n#endif\n\t\t  default:\n\t\t    break;\n\t\t  }\n\t\t*p = dp->conversion;\n#if USE_SNPRINTF\n\t\tp[1] = '%';\n\t\tp[2] = 'n';\n\t\tp[3] = '\\0';\n#else\n\t\tp[1] = '\\0';\n#endif\n\n\t\t/* Construct the arguments for calling snprintf or sprintf.  
*/\n\t\tprefix_count = 0;\n\t\tif (dp->width_arg_index != ARG_NONE) {\n\t\t    if (!(a.arg[dp->width_arg_index].type == TYPE_INT))\n\t\t      abort ();\n\t\t    prefixes[prefix_count++] = a.arg[dp->width_arg_index].a.a_int;\n\t    }\n\t\tif (dp->precision_arg_index != ARG_NONE) {\n\t\t    if (!(a.arg[dp->precision_arg_index].type == TYPE_INT))\n\t\t      abort ();\n\t\t    prefixes[prefix_count++] = a.arg[dp->precision_arg_index].a.a_int;\n\t    }\n\n#if USE_SNPRINTF\n\t\t/* Prepare checking whether snprintf returns the count\n\t\t   via %n.  */\n\t\tENSURE_ALLOCATION (xsum (length, 1));\n\t\tresult[length] = '\\0';\n#endif\n\n\t\tfor (;;) {\n\t\t    size_t maxlen = allocated - length;\n\t\t    int count = -1;\n\n#if USE_SNPRINTF\n\t\t    int retcount = 0;\n# define SNPRINTF_BUF(arg) \\\n\t\t    switch (prefix_count)\t\t\t\t    \\\n\t\t      {\t\t\t\t\t\t\t    \\\n\t\t      case 0:\t\t\t\t\t\t    \\\n\t\t\tretcount = SNPRINTF (result + length, maxlen, buf,  \\\n\t\t\t\t\t     arg, &count);\t\t    \\\n\t\t\tbreak;\t\t\t\t\t\t    \\\n\t\t      case 1:\t\t\t\t\t\t    \\\n\t\t\tretcount = SNPRINTF (result + length, maxlen, buf,  \\\n\t\t\t\t\t     prefixes[0], arg, &count);\t    \\\n\t\t\tbreak;\t\t\t\t\t\t    \\\n\t\t      case 2:\t\t\t\t\t\t    \\\n\t\t\tretcount = SNPRINTF (result + length, maxlen, buf,  \\\n\t\t\t\t\t     prefixes[0], prefixes[1], arg, \\\n\t\t\t\t\t     &count);\t\t\t    \\\n\t\t\tbreak;\t\t\t\t\t\t    \\\n\t\t      default:\t\t\t\t\t\t    \\\n\t\t\tabort ();\t\t\t\t\t    \\\n\t\t      }\n#else\n# define SNPRINTF_BUF(arg) \\\n\t\t    switch (prefix_count)\t\t\t\t    \\\n\t\t      {\t\t\t\t\t\t\t    \\\n\t\t      case 0:\t\t\t\t\t\t    \\\n\t\t\tcount = sprintf (tmp, buf, arg);\t\t    \\\n\t\t\tbreak;\t\t\t\t\t\t    \\\n\t\t      case 1:\t\t\t\t\t\t    \\\n\t\t\tcount = sprintf (tmp, buf, prefixes[0], arg);\t    \\\n\t\t\tbreak;\t\t\t\t\t\t    \\\n\t\t      case 2:\t\t\t\t\t\t    \\\n\t\t\tcount = sprintf (tmp, buf, prefixes[0], 
prefixes[1],\\\n\t\t\t\t\t arg);\t\t\t\t    \\\n\t\t\tbreak;\t\t\t\t\t\t    \\\n\t\t      default:\t\t\t\t\t\t    \\\n\t\t\tabort ();\t\t\t\t\t    \\\n\t\t      }\n#endif\n\n\t\t    switch (type) {\n\t\t      case TYPE_SCHAR: {\n\t\t\t  int arg = a.arg[dp->arg_index].a.a_schar;\n\t\t\t  SNPRINTF_BUF (arg);\n\t\t\t}\n\t\t\tbreak;\n\t\t      case TYPE_UCHAR: {\n\t\t\t  unsigned int arg = a.arg[dp->arg_index].a.a_uchar;\n\t\t\t  SNPRINTF_BUF (arg);\n\t\t\t}\n\t\t\tbreak;\n\t\t      case TYPE_SHORT: {\n\t\t\t  int arg = a.arg[dp->arg_index].a.a_short;\n\t\t\t  SNPRINTF_BUF (arg);\n\t\t\t}\n\t\t\tbreak;\n\t\t      case TYPE_USHORT: {\n\t\t\t  unsigned int arg = a.arg[dp->arg_index].a.a_ushort;\n\t\t\t  SNPRINTF_BUF (arg);\n\t\t\t}\n\t\t\tbreak;\n\t\t      case TYPE_INT: {\n\t\t\t  int arg = a.arg[dp->arg_index].a.a_int;\n\t\t\t  SNPRINTF_BUF (arg);\n\t\t\t}\n\t\t\tbreak;\n\t\t      case TYPE_UINT: {\n\t\t\t  unsigned int arg = a.arg[dp->arg_index].a.a_uint;\n\t\t\t  SNPRINTF_BUF (arg);\n\t\t\t}\n\t\t\tbreak;\n\t\t      case TYPE_LONGINT: {\n\t\t\t  long int arg = a.arg[dp->arg_index].a.a_longint;\n\t\t\t  SNPRINTF_BUF (arg);\n\t\t\t}\n\t\t\tbreak;\n\t\t      case TYPE_ULONGINT: {\n\t\t\t  unsigned long int arg = a.arg[dp->arg_index].a.a_ulongint;\n\t\t\t  SNPRINTF_BUF (arg);\n\t\t\t}\n\t\t\tbreak;\n#ifdef HAVE_LONG_LONG\n\t\t      case TYPE_LONGLONGINT: {\n\t\t\t  long long int arg = a.arg[dp->arg_index].a.a_longlongint;\n\t\t\t  SNPRINTF_BUF (arg);\n\t\t\t}\n\t\t\tbreak;\n\t\t      case TYPE_ULONGLONGINT: {\n\t\t\t  unsigned long long int arg = a.arg[dp->arg_index].a.a_ulonglongint;\n\t\t\t  SNPRINTF_BUF (arg);\n\t\t\t}\n\t\t\tbreak;\n#endif\n\t\t      case TYPE_DOUBLE: {\n\t\t\t  double arg = a.arg[dp->arg_index].a.a_double;\n\t\t\t  SNPRINTF_BUF (arg);\n\t\t\t}\n\t\t\tbreak;\n#ifdef HAVE_LONG_DOUBLE\n\t\t      case TYPE_LONGDOUBLE: {\n\t\t\t  long double arg = a.arg[dp->arg_index].a.a_longdouble;\n\t\t\t  SNPRINTF_BUF (arg);\n\t\t\t}\n\t\t\tbreak;\n#endif\n\t\t      
case TYPE_CHAR: {\n\t\t\t  int arg = a.arg[dp->arg_index].a.a_char;\n\t\t\t  SNPRINTF_BUF (arg);\n\t\t\t}\n\t\t\tbreak;\n#ifdef HAVE_WINT_T\n\t\t      case TYPE_WIDE_CHAR: {\n\t\t\t  wint_t arg = a.arg[dp->arg_index].a.a_wide_char;\n\t\t\t  SNPRINTF_BUF (arg);\n\t\t\t}\n\t\t\tbreak;\n#endif\n\t\t      case TYPE_STRING: {\n\t\t\t  const char *arg = a.arg[dp->arg_index].a.a_string;\n\t\t\t  SNPRINTF_BUF (arg);\n\t\t\t}\n\t\t\tbreak;\n#ifdef HAVE_WCHAR_T\n\t\t      case TYPE_WIDE_STRING: {\n\t\t\t  const wchar_t *arg = a.arg[dp->arg_index].a.a_wide_string;\n\t\t\t  SNPRINTF_BUF (arg);\n\t\t\t}\n\t\t\tbreak;\n#endif\n\t\t      case TYPE_POINTER: {\n\t\t\t  void *arg = a.arg[dp->arg_index].a.a_pointer;\n\t\t\t  SNPRINTF_BUF (arg);\n\t\t\t}\n\t\t\tbreak;\n\t\t      default:\n\t\t\tabort ();\n\t      }\n\n#if USE_SNPRINTF\n\t\t    /* Portability: Not all implementations of snprintf()\n\t\t       are ISO C 99 compliant.  Determine the number of\n\t\t       bytes that snprintf() has produced or would have\n\t\t       produced.  */\n\t\t    if (count >= 0) {\n                /* Verify that snprintf() has NUL-terminated its\n                   result.  */\n                if (count < maxlen && result[length + count] != '\\0')\n                  abort ();\n                /* Portability hack.  */\n                if (retcount > count)\n                  count = retcount;\n\t\t    } else {\n                /* snprintf() doesn't understand the '%n'\n                   directive.  */\n                if (p[1] != '\\0') {\n                    /* Don't use the '%n' directive; instead, look\n                       at the snprintf() return value.  */\n                    p[1] = '\\0';\n                    continue;\n                } else {\n                    /* Look at the snprintf() return value.  
*/\n                    if (retcount < 0) {\n                    /* HP-UX 10.20 snprintf() is doubly deficient:\n                       It doesn't understand the '%n' directive,\n                       *and* it returns -1 (rather than the length\n                       that would have been required) when the\n                       buffer is too small.  */\n                    size_t bigger_need =\n                      xsum (xtimes (allocated, 2), 12);\n                    ENSURE_ALLOCATION (bigger_need);\n                    continue;\n                    } else\n                      count = retcount;\n               }\n\t\t    }\n#endif\n\n\t\t    /* Attempt to handle failure.  */\n\t\t    if (count < 0) {\n                if (!(result == resultbuf || result == NULL))\n                  free (result);\n                if (buf_malloced != NULL)\n                  free (buf_malloced);\n                CLEANUP ();\n                errno = EINVAL;\n                return NULL;\n\t\t    }\n\n#if !USE_SNPRINTF\n\t\t    if (count >= tmp_length)\n\t\t      /* tmp_length was incorrectly calculated - fix the\n\t\t\t code above!  */\n\t\t      abort ();\n#endif\n\n\t\t    /* Make room for the result.  */\n\t\t    if (count >= maxlen)\n\t\t      {\n\t\t\t/* Need at least count bytes.  But allocate\n\t\t\t   proportionally, to avoid looping eternally if\n\t\t\t   snprintf() reports a too small count.  */\n\t\t\tsize_t n =\n\t\t\t  xmax (xsum (length, count), xtimes (allocated, 2));\n\n\t\t\tENSURE_ALLOCATION (n);\n#if USE_SNPRINTF\n\t\t\tcontinue;\n#endif\n\t\t      }\n\n#if USE_SNPRINTF\n\t\t    /* The snprintf() result did fit.  */\n#else\n\t\t    /* Append the sprintf() result.  */\n\t\t    memcpy (result + length, tmp, count * sizeof (CHAR_T));\n\t\t    if (tmp != tmpbuf)\n\t\t      free (tmp);\n#endif\n\n\t\t    length += count;\n\t\t    break;\n\t\t  }\n\t      }\n\t  }\n      }\n\n    /* Add the final NUL.  
*/\n    ENSURE_ALLOCATION (xsum (length, 1));\n    result[length] = '\\0';\n\n    if (result != resultbuf && length + 1 < allocated)\n      {\n\t/* Shrink the allocated memory if possible.  */\n\tCHAR_T *memory;\n\n\tmemory = (CHAR_T *) realloc (result, (length + 1) * sizeof (CHAR_T));\n\tif (memory != NULL)\n\t  result = memory;\n      }\n\n    if (buf_malloced != NULL)\n      free (buf_malloced);\n    CLEANUP ();\n    *lengthp = length;\n    if (length > INT_MAX)\n      goto length_overflow;\n    return result;\n\n  length_overflow:\n    /* We could produce such a big string, but its length doesn't fit into\n       an 'int'.  POSIX says that snprintf() fails with errno = EOVERFLOW in\n       this case.  */\n    if (result != resultbuf)\n      free (result);\n    errno = EOVERFLOW;\n    return NULL;\n\n  out_of_memory:\n    if (!(result == resultbuf || result == NULL))\n      free (result);\n    if (buf_malloced != NULL)\n      free (buf_malloced);\n  out_of_memory_1:\n    CLEANUP ();\n    errno = ENOMEM;\n    return NULL;\n  }\n}\n\n#undef SNPRINTF\n#undef USE_SNPRINTF\n#undef PRINTF_PARSE\n#undef DIRECTIVES\n#undef DIRECTIVE\n#undef CHAR_T\n#undef VASNPRINTF\n\nint vasprintf (char **resultp, const char *format, va_list args) {\n  size_t length;\n  char *result = vasnprintf (NULL, &length, format, args);\n  if (result == NULL)\n    return -1;\n\n  *resultp = result;\n  /* Return the number of resulting bytes, excluding the trailing NUL.\n     If it wouldn't fit in an 'int', vasnprintf() would have returned NULL\n     and set errno to EOVERFLOW.  */\n  return length;\n}\n#endif\n    /** \\endcond */ //Doxygen ignore.\n"
  },
  {
    "path": "cmd/Makefile.am",
    "content": "#The programs\nbin_PROGRAMS = \\\n\tapop_text_to_db \\\n\tapop_db_to_crosstab \\\n\tapop_plot_query\n\nAM_CFLAGS = \\\n\t-I $(top_srcdir) \\\n\t$(GSL_CFLAGS)\n\nLDADD = \\\n\t$(top_builddir)/libapophenia.la \\\n\t$(GSL_LIBS)\n\n"
  },
  {
    "path": "cmd/apop_db_to_crosstab.c",
    "content": "/** \\file \nCommand line utility to convert a three-column table to a crosstab.*/\n\n/*Copyright (c) 2005--2007, 2013 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n#include \"apop_internal.h\"\n#include <unistd.h>\n\nint main(int argc, char **argv){\n    int c;\n    char verbose=0;\n    char const *msg=\"Usage: %s [opts] dbname table_name rows columns data\\n\"\n\"\\n\"\n\"A command-line wrapper for the apop_db_to_crosstab function.\\n\"\n\"See Apophenia's online documentation for that function for details and tricks.\\n\"\n\"The default for the data column is a count [count(*)]\\n\"\n\"The column is optional; leave it out if you want a single-dimensional crosstab.\\n\"\n\"If you need a non-default data column but want a 1-D crosstab, use 1 as your column.\\n\"\n\"\\n\"\n\" -d\\tdelimiter (default: <tab>)\\n\"\n\" -v\\tverbose: prints status info on stderr\\n\"\n\" -v -v\\tvery verbose: also print queries executed on stderr\\n\"\n\" -h\\tdisplay this help and exit\\n\"\n\"\\n\";\n\n\tapop_opts.verbose=0;  //so don't print queries until -v -v.\n\n\twhile ((c = getopt (argc, argv, \"d:f:hv-\")) != -1)\n\t\tif      (c=='d') strcpy(apop_opts.output_delimiter,optarg);\n        else if (c=='h'||c=='-') {printf(msg, argv[0]); exit(0);}\n        else if (c=='v') {\n            verbose++;\n            apop_opts.verbose++;\n        }\n\n    Apop_stopif(optind+2 > argc, return 1, 0, \"I need at least two arguments past the options: database table [optional rowcol] [optional columncol] [optional datacol]\");\n    _Bool no_rowcol = optind+2 > argc;\n    _Bool no_columncol = optind+3 > argc;\n    _Bool no_datacol = optind+4 > argc;\n    char *rowcol = no_rowcol    ? \"1\" : argv[optind+2];\n    char *colcol = no_columncol ? \"1\" : argv[optind+3];\n    char *datacol = no_datacol  ? 
NULL: argv[optind+4];\n    if (verbose){\n        //Never hand a NULL pointer to %s; print an empty string instead.\n        fprintf(stderr, \"database: %s\\ntable: %s\\nrow col: %s\\ncol col: %s%s%s\\n---------\\n\",\n            argv[optind], argv[optind +1], rowcol, colcol,\n            datacol ? \"\\ndata col: \" : \"\", datacol ? datacol : \"\");\n    }\n\tapop_db_open(argv[optind]);\n\tapop_data *m = apop_db_to_crosstab(argv[optind +1], rowcol, colcol, datacol);\n\tapop_data_print(m);\n}\n"
  },
  {
    "path": "cmd/apop_plot_query.c",
    "content": "/** \\file \n Command line utility to take in a query and produce a plot of its output via Gnuplot.\n\nCopyright (c) 2006--2007 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n#include \"apop_internal.h\"\n#include <unistd.h>\n\n\n/** This convenience function will take in a \\c gsl_vector of data and put out a histogram, ready to pipe to Gnuplot.\n\n\\param data A \\c gsl_vector holding the data. Do not pre-sort or bin; this function does that for you via apop_data_to_bins.\n\\param bin_count   The number of bins in the output histogram (if you send zero, I set this to \\f$\\sqrt(N)\\f$, where \\f$N\\f$ is the length of the vector.)\n\\param with The method for Gnuplot's plotting routine. Default is \\c \"boxes\", so the gnuplot call will read <tt>plot '-' with boxes</tt>. The \\c \"lines\" option is also popular, and you can add extra terms if desired, like <tt> \"boxes linetype 3\"</tt>.\n*/\nvoid plot_histogram(gsl_vector *data, FILE *f, size_t bin_count, char *with){\n    Apop_stopif(!data, return, 0, \"Input vector is NULL.\");\n    if (!with) with=\"impulses\";\n    apop_data vector_as_data = (apop_data){.vector=data};\n    apop_data *histodata = apop_data_to_bins(&vector_as_data, .bin_count=bin_count, .close_top_bin='y');\n    apop_data_sort(apop_data_pmf_compress(histodata));\n    apop_data_free(histodata->more); //the binspec.\n\n    fprintf(f, \"set key off\t;\\n\"\n               \"plot '-' with %s\\n\", with);\n    apop_data_print(histodata, .output_pipe=f);\n    fprintf(f, \"e\\n\");\n\n    fflush(f);\n    apop_data_free(histodata);\n}\n\n\nchar *plot_type = NULL;\nint histobins = 0;\nint histoplotting = 0;\n\nFILE *open_output(char *outfile, int sf){\n    FILE  *f;\n    if (sf && !strcmp (outfile, \"-\"))\n        return stdout;\n    if (sf && outfile){\n        f = fopen(outfile, \"w\");\n        Apop_stopif(!f, exit(0), 0, \"Trouble opening %s.\", outfile);\n        return f;\n    }\n    f = popen(\"`which gnuplot` 
-persist\", \"w\");\n    Apop_stopif(!f, exit(0), 0, \"Trouble opening %s.\", \"gnuplot\");\n    return f;\n}\n\nchar *read_query(char *infile){\n    char in[1000];\n    char *q = malloc(10);\n    q[0] = '\\0';\n    FILE *inf = fopen(infile, \"r\");\n    Apop_stopif(!inf, exit(0), 0, \"Trouble opening %s.\\n\", infile);\n    while(fgets(in, 1000, inf)){\n        q = realloc(q, strlen(q) + strlen(in) + 4);\n        strcat(q, in); //not sprintf(q, \"%s%s\", q, in): overlapping source and destination is undefined.\n    }\n    strcat(q, \";\\n\"); //the +4 above (or the initial 10 bytes) leaves room for this and the NUL.\n    fclose(inf);\n    return q;\n}\n\ngsl_matrix *query(char *d, char *q, int no_plot){\n\tapop_db_open(d);\n    apop_data *result = apop_query_to_data(\"%s\", q);\n\tapop_db_close(0);\n    Apop_stopif(!result, exit(2), 0, \"Your query returned a blank table. Quitting.\");\n    Apop_stopif(result->error, exit(2), 0, \"Error running your query. Quitting.\");\n    if (no_plot){\n        apop_data_show(result);\n        exit(0);\n    }\n    return result->matrix;\n}\n\nvoid print_out(FILE *f, char *outfile, gsl_matrix *m){\n    if (!histoplotting){\n        fprintf(f,\"plot '-' with %s\\n\", plot_type);\n\t    apop_matrix_print(m, NULL, .output_type='p', .output_pipe=f);\n    } else {\n        Apop_col_v(&(apop_data){.matrix=m}, 0, v);\n        plot_histogram(v, f, histobins, NULL);\n    }\n    if (outfile) fclose(f);\n}\n\nint main(int argc, char **argv){\n    int c;\n    char *q = NULL,\n         *d = NULL,\n         *outfile = NULL;\n    int sf = 0,\n        no_plot = 0;\n\n    const char* msg= \"Usage: %s [opts] dbname query\\n\"\n\"\\n\"\n\"Runs a query, and pipes the output directly to gnuplot. Use -f to dump to stdout or a file.\\n\"\n\" -d\\tdatabase to use (mandatory)\\n\"\n\" -q\\tquery to run (mandatory or use -Q)\\n\"\n\" -Q\\tfile from which to read the query\\t\\t\\n\"\n\" -n\\tno plot: just run the query and display results to stdout\\t\\t\\n\"\n\" -t\\tplot type (points, bars, ...) 
(default: \\\"lines\\\")\\n\"\n\" -H\\tplot histogram with this many bins (e.g., -H100) (to let the system auto-select bin sizes, use -H0)\\n\"\n\" -f\\tfile to dump to. If -f- then use stdout (default: pipe to Gnuplot)\\n\"\n\" -h\\tdisplay this help and exit\\n\"\n\"\\n\";\n\n\tApop_stopif(argc<2, return 1, 0, msg, argv[0]);\n\twhile ((c = getopt (argc, argv, \"ad:f:hH:nQ:q:st:-\")) != -1)\n\t    if (c=='f'){\n              outfile = strdup(optarg);\n              sf++;\n        } else if (c=='H'){\n              histoplotting = 1;\n              histobins = atoi(optarg);\n        }\n        else if (c=='h'||c=='-') {\n            printf(msg, argv[0]);\n\t\t\treturn 0;\n\t\t}\n        else if (c=='d') d = strdup(optarg);\n        else if (c=='n') no_plot ++;\n        else if (c=='Q') q = read_query(optarg);\n        else if (c=='q') q = strdup(optarg);\n        else if (c=='t') plot_type = strdup(optarg);\n\n    if (optind == argc -2){\n        d = argv[optind];\n        q = argv[optind+1];\n    } else if (optind == argc-1)\n        q = argv[optind];\n\n    Apop_stopif(!q, return 1, 0, \"I need a query specified with -q.\\n\");\n\n    if (!plot_type) plot_type = strdup(\"lines\");\n\n    FILE *f = open_output(outfile, sf);\n    gsl_matrix *m = query(d, q, no_plot);\n    print_out(f, outfile, m);\n}\n"
  },
  {
    "path": "cmd/apop_text_to_db.c",
    "content": "/** \\file \n A command line script to read a text file into a database.\n\nCopyright (c) 2006--2007, 2013 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n#include \"apop_internal.h\"\n#include <unistd.h>\n\nint *break_down(char *in){\n    int *out = NULL;\n    int ctr = 0;\n    char *cp = strtok (in, \",\");\n    while (cp != NULL) {\n      out = realloc(out, sizeof(int)*(ctr+1));\n      out[ctr++] = atoi(cp);\n      cp = strtok (NULL, \",\");\n    }\n    return out;\n}\n\nint main(int argc, char **argv){\n    int c;\n    char *msg;\n    int colnames = 'y',\n        rownames = 0,\n        tab_exists_check = 0;\n    char **field_names = NULL;\n\n\tAsprintf(&msg, \"Usage: %s [-d delimiters] text_file table_name dbname\\n\"\n\"\\n\"\n\"If the input text file name is a single dash, -, then read from STDIN.\\n\"\n\"Input must be plain ASCII or UTF-8.\\n\"\n\" -d\\t\\tthe single-character delimiters to use, e.g., -d \\\" ,\\\" or -d \\\"\\\\t\\\" (which you \\n\"\n  \" \\t\\t\\twill almost certainly have to write as -d \\\"\\\\\\\\t\\\") (default: \\\"|,\\\\t\\\", meaning \\n\"\n  \" \\t\\t\\tthat any of a pipe, comma, or tab will delimit separate entries)\\n\"\n\" -nc\\t\\tdata does not include column names\\n\"\n\" -n regex\\t\\tcase-insensitive regular expression indicating Null values (default: NaN)\\n\"\n\" -m\\t\\tuse a MySQL database (default: SQLite)\\n\"\n\" -f\\t\\tfixed width field ends: -f\\\"3,8,12,17\\\" (first char is one, not zero)\\n\"\n\" -u\\t\\tmysql username\\n\"\n\" -p\\t\\tmysql password\\n\"\n\" -r\\t\\tdata includes row names\\n\"\n\" -v\\t\\tverbosity\\n\"\n\" -N\\t\\ta comma-separated list of column names: -N\\\"apple,banana,carrot,durian\\\"\\n\"\n\" -en\\t\\tif table exists, do nothing and exit\\n\"\n\" -ed\\t\\tif table exists, retain the table, delete all data, refill with the new data (i.e., call 'delete * from your_table')\\n\"\n\" -eo\\t\\tif table exists, overwrite the table from scratch (deleting the 
previous table entirely)\\n\"\n\" -ea\\t\\tif table exists, append new data to the existing table\\n\"\n\" -h\\t\\tdisplay this help and exit\\n\"\n\"\\n\"\n, argv[0]);\n    int * field_list = NULL;\n    char if_exists = 'n';\n\n\tif(argc<3){\n\t\tprintf(\"%s\", msg);\n\t\treturn 0;\n\t}\n\twhile ((c = getopt (argc, argv, \"n:d:e:f:hmp:ru:vN:O\")) != -1)\n        if (c=='n') {\n              if (optarg[0]=='c') colnames='n';\n              else                apop_opts.nan_string = optarg;\n        }\n\t\telse if (c=='N') {\n            apop_data *field_name_data = NULL;\n            apop_regex(optarg, \" *([^,]*[^ ]) *(,|$) *\", &field_name_data);\n            Apop_stopif(!field_name_data, return 1, 0, \"'%s' should be a \"\n                    \"comma-delimited list of field names, but I had trouble \"\n                    \"parsing it as such.\", optarg);\n            apop_data_transpose(field_name_data);\n            field_names = field_name_data->text[0];\n        }\n        else if (c=='d') strcpy(apop_opts.input_delimiters, optarg);\n\t\telse if (c=='f') field_list = break_down(optarg);\n\t\telse if (c=='h') {printf(\"%s\", msg); return 0;}\n\t\telse if (c=='m') apop_opts.db_engine = 'm';\n\t\telse if (c=='u') strcpy(apop_opts.db_user, optarg);\n\t\telse if (c=='p') strcpy(apop_opts.db_pass, optarg);\n\t\telse if (c=='r') rownames++;\n\t\telse if (c=='v') apop_opts.verbose=2;\n\t\telse if (c=='O') tab_exists_check++; //deprecated as of December 2013.\n\t\telse if (c=='e') {\n            if (optarg[0]=='n')       if_exists='n'; //the default anyway.\n            else if (optarg[0]=='d')  if_exists='d';\n            else if (optarg[0]=='a')  if_exists='a';\n            else if (optarg[0]=='o') {if_exists='o';\n                                      tab_exists_check++;\n                                     }\n\n        }\n    //The three positional arguments are read below as argv[optind] through argv[optind+2].\n    Apop_stopif(optind+3 > argc, return 1, 0, \"I need three arguments past the options: text_file table_name dbname.\");\n\tapop_db_open(argv[optind + 2]);\n    if (tab_exists_check) apop_table_exists(argv[optind+1],1);\n    
apop_query(\"begin\");\n\tapop_text_to_db(argv[optind], argv[optind+1], rownames, colnames, field_names, .field_ends=field_list, .if_table_exists=if_exists);\n    apop_query(\"commit\");\n}\n"
  },
  {
    "path": "configure",
    "content": "#!/bin/bash\n\nversion=1.0\nworkdir=apophenia-$version\n\necho This configure script uses Autotools to generate a build directory, $workdir.\n\nif [ ! -x `which autoreconf` ]\nthen\necho \"I couldn't find and execute autoreconf, part of the Autotools suite. See README for details.\"\nelse\n\necho \"Don't forget: Apophenia always depends on the GNU Scientific Library and SQLite3 (_and_ their -dev or -devel packages) to build and run.\"\nmkdir -p $workdir/m4\ncp -r cmd docs eg install/{acinclude.m4,apophenia.pc.in,configure.ac,Makefile.am,COPYING,Readme-pkg,rpm.spec,apophenia.map} model tests transform README ChangeLog $workdir\n\n#Debian maintainers didn't want this in the makefile.\ncp *.c $workdir && rm $workdir/*.m4.c\ncp apop_internal.h $workdir\nm4 -P install/prep_variadics.m4 apop.m4.h > $workdir/apop.h.in\nfor i in *.m4.c; do\n    m4 -P install/prep_variadics.m4 $i > $workdir/`basename $i .m4.c`.c\ndone\n\n#mkdir -p $workdir/tests/apophenia\n#cp *.h $workdir/tests/apophenia\ncd $workdir\n\n\n#Autoconf requirements:\ncp ChangeLog NEWS\necho \"Ben Klemens (fluffmail@f-m.fm)\" > AUTHORS\n\nautoreconf -f -i\nautoreconf -i\n./configure -C\n#make dist\nmake distcheck\n\necho\necho ---------------------\necho\necho \"OK, built. From the $workdir directory, you can run: make && sudo make install\"\n\nfi #end `else' block of test for Autotools.\n"
  },
  {
    "path": "docs/Makefile.am",
    "content": "\n#The Doxygen documentation, which you'll have to call yourself (via make doc).\n\nGVZDOT = /usr/bin/dot\n\ndefault:\n\n## adhoc\nhtml man: doc\n\napophenia_CSOURCES = \\\n\t$(top_srcdir)/model/*.c \\\n\t$(top_srcdir)/transform/*.c\n\techo #$(wildcard $(top_srcdir)/model/*.c) \\\n\techo #$(wildcard $(top_srcdir)/transform/*.c)\n\napophenia_DOTS = \\\n\tstructs.dot\n\napophenia_IMAGES = \\\n\tflake.gif \\\n\tsearch.gif \\\n\tright.png \\\n\tdown.png\n\napophenia_IMAGES_EXTRA = \\\n\ttriangle.png \\\n\tmodel.png\n\napophenia_JS = \\\n\ttree.js\n\napophenia_IMAGES_GENERATED = \\\n\tstructs.png\n\nmodel_doc.h: $(apophenia_CSOURCES)\n\tcat $^ | awk -f $(top_srcdir)/docs/make_model_doc.awk > $@\n\ndoc: documentation.h model_doc.h $(apophenia_IMAGES) $(apophenia_JS) $(apophenia_IMAGES_GENERATED)\n\t$(MKDIR_P) include\n\tsed -e 's/__attribute.*;/;/;s/extern //' $(top_builddir)/apop.h > include/apop.h\n\tdoxygen doxygen.conf\n\tfor f in $(apophenia_IMAGES) $(apophenia_JS) ; do \\\n\t\ttest -f $(top_srcdir)/docs/$$f && cp $(top_srcdir)/docs/$$f html/ ;\\\n\tdone\n\tcp $(apophenia_IMAGES_GENERATED) html/\n\tsed -i -f $(top_srcdir)/docs/edit_outline.sed html/index.html html/outline.html\n\tsed -i -f $(top_srcdir)/docs/edit_group.sed html/group__models.html\n\tsed -i -f $(top_srcdir)/docs/edit_width.sed html/*.html\n\t$(abs_top_srcdir)/docs/adjust\n\ndoc-clean:\n\t-rm -rf include html latex man\n\ninstall-html-local: doc\n\tcp -prd html $(DESTDIR)$(docdir)\n\nmaintainer-clean-local: doc-clean\n\nCLEANFILES = \\\n\tmissing_model_parts\n\nMAINTAINERCLEANFILES = \\\n\tmodel_doc.h \\\n\tdoxygen.log\n\nEXTRA_DIST = \\\n\tadjust \\\n\tmake_model_doc.awk \\\n\t$(apophenia_DOTS) \\\n\t$(apophenia_IMAGES) \\\n\t$(apophenia_IMAGES_EXTRA) \\\n\t$(apophenia_JS) \\\n\tedit_outline.sed edit_globals.sed edit_group.sed edit_width.sed \\\n\tapop_data_fig.html head.html foot.html \\\n\ttypical.css \\\n\tdocumentation.h\n\n%.png : %.dot\n\t$(GVZDOT) -Tpng < $< > $@\n"
  },
  {
    "path": "docs/adjust",
    "content": "#Use GNU sed to make modifications to the LaTeX version before producing a PDF\n\n#But first, some HTML tweaks:\n# Background color of side pane should match rest of document\nsed -i \"/background-color/ s|:.*|: #FFFFDE;|\" html/navtree.css\nsed -i \"/nav_h.png/ d\" html/navtree.css\n\n# We no longer live in an ASCII-only world.\nsed -i 's/pdflatex/xelatex/' latex/Makefile\n\n#PDF: a 170pp document shouldn't be typeset in sans serif\nsed -i \"/familydefault.*sfdefault/ d\" latex/refman.tex\n\n#No fancyheaders\nsed -i -e \"/fancy/d\" -e '/footrulewidth/d'  -e \"/Generated by Doxygen/ d\" latex/refman.tex\n\n##Set title, for headers and such.\n#sed '/begin.document/i\n#\\\\title[Apophenia]{Apophenia}' latex/refman.tex\n\n# Nicer lists.\nsed -i '/begin.document/a\\\n\\\\setlength\\\\itemsep{0pt} \\\\renewcommand\\\\labelitemi{$\\\\circ$} \\\\renewcommand\\\\labelitemii{$\\\\circ$} \\\\renewcommand\\\\labelitemiii{$\\\\cdot$}' latex/refman.tex\n\n#I can't make the index look not-stupid\nsed -i '/makeindex/d' latex/refman.tex\nsed -i '/makeindex/d' latex/Makefile\n\n\n#Fix enumerations -> model documentation\nsed -i -f edit_group.sed latex/group__models.tex\n#Rm header list\nsed -i -e '/subsection.*Enumer/d' -e'/begin{DoxyCompactItemize}/,/end{DoxyCompactItemize}/d' latex/group__models.tex\n##sed -i -e 's/\\\\cline{.*}$//' -e's/begin{TabularN*C}.*/begin{tabular}{ll}/' -e's/end{TabularN*C}.*/end{tabular}/' latex/group__models.tex\n#sed -i -e's/\\(TabularC}\\){2}/\\1{3}/' latex/group__models.tex\n\n#redundant with the struct defns.\nfdline=$((`grep -n 'subsection.Function Documentation.' 
latex/group__all__public.tex|sed 's/:.*//'`-1))\nsed -i \"/subsection.Typedef Documentation./,$fdline d\" latex/group__all__public.tex\n\n#Unclutter the ToC by using section*=omit from ToC\nsed -i -e 's/subsection{Detailed Description}/subsection*{Detailed Description}/' -e 's/subsection{Field Documentation}/subsection*{Field Documentation}/' latex/*tex\n\n#Bare references to subpages\n#sed -i -e's|^\\\\hyperlink.*hypertarget.*subsection.*label{[^}]*}$||' latex/*tex\nsed -i -e's|^\\\\hyperlink{[^}]*}{[^}]*}$||' latex/*tex\n\n#the hyperlink insertion into sample code sometimes creates needless newlines with a\n#six-space indent.\nsed -i '$!N;s/\\n      \\(\\\\hyperlink\\)/\\1/;P;D' latex/*tex\n\n#code looks too small to me. Go one step bigger.\nsed -i -e 's/scriptsize/footnotesize/' latex/doxygen.sty\n\n#paragraphs have a lot of space after the header; tighten\n#sed -i -e'/DoxyParagraph/,+15 s|\\(end{list}\\)|\\1\\\\vspace{-10pt}|' latex/doxygen.sty\n\n#offset description labels by a few cm\nsed -i '$a\\\n\\\\renewcommand{\\\\descriptionlabel}[1]{\\\\hspace{-1.5cm}\\\\emph{#1}}' latex/doxygen.sty\n\nsed -i -e 's/xtab\\*{\\\\linewidth}/xtab/g' -e 's/xtab\\(ular\\|\\)/supertabular/g' -e 's/\\\\tablefirsthead.*}//' -e '/\\\\par%/d' latex/doxygen.sty\n#sed latex/doxygen.sty\n\n\n\n#Want search box in header\nsed -i -e '/position: *absolute/d' html/search/search.css\nsed -i -e 's/display: *block/display: inline/' html/search/search.css\n#magnifying glass is hard-coded, and span=\"left\" covers left and  middle of search box\nsed -i -e '/mag_sel.png/,+3d' html/*html\n\n#100% of the search results are doubled. Can't work out why.\nsed -i -e 's/\\],\\[[^]]\\+\\]/]/' html/search/*js\n\n"
  },
  {
    "path": "docs/apop_data_fig.html",
    "content": "<table frame=box>\n<tr>\n<td>Rowname</td><td>Vector</td><td> Matrix</td><td> Text</td><td>Weights</td>\n</tr><tr valign=bottom>\n<td align=center>\n<table frame=box>\n<tr><td> </td></tr>\n<tr>\n<td>\"Steven\"</td>\n</tr><tr>\n<td>\"Sandra\"</td>\n</tr><tr>\n<td>\"Joe\"</td><td>\n</tr> \n</table>\n</td><td align=center>\n<table frame=box>\n<tr>\n<th>Outcome</th>\n</tr> <tr>\n<td align=center>1</td>\n</tr><tr>\n<td align=center>0</td>\n</tr><tr>\n<td align=center>1</td>\n</tr> \n</table>\n</td><td align=center>\n<table frame=box>\n<tr>\n<th> Age</th><th> Weight (kg)</th><th> Height (cm)</th>\n</tr> <tr>\n<td> 32</td><td> 65</td><td> 175</td>\n</tr><tr>\n<td> 41</td><td> 61</td><td> 165</td>\n</tr><tr>\n<td> 40</td><td> 73</td><td> 181</td>\n</tr> \n</table>\n</td><td align=center>\n<table frame=box>\n<tr>\n<th> Sex</th><th> State</th>\n</tr>\n<tr>\n<td> Male</td><td> Alaska</td><td>\n</tr><tr>\n<td> Female</td><td> Alabama</td>\n</tr><tr>\n<td> Male</td><td> Alabama</td>\n</tr> \n</table>\n</td><td align=center>\n<table frame=box>\n<tr><td> </td></tr>\n<tr>\n<td>1</td>\n</tr><tr>\n<td>3.2</td>\n</tr><tr>\n<td>2.4</td>\n</tr> \n</table>\n</td></tr>\n</table>\n"
  },
  {
    "path": "docs/apop_data_fig.tex",
    "content": "\\begin{tabular}{ccccc}\nRowname& Vector&  Matrix&  Text& Weights\\\\\n\\fbox{\n\\begin{tabular}{c}\n\\phantom{hi}\\\\\n\"Steven\"\\\\\n\"Sandra\"\\\\\n\"Joe\"\n\\end{tabular}\n}\n&\n\\fbox{\n\\begin{tabular}{c}\n{\\bf Outcome}\\\\\n1\\\\\n0\\\\\n1\n\\end{tabular}\n}\n&\n\\fbox{\n\\begin{tabular}{ccc}\n{\\bf Age} & {\\bf Weight (kg)} & {\\bf Height (cm)}\\\\\n32  & 65  & 175\\\\\n41  & 61  & 165\\\\\n40  & 73  & 181\n\\end{tabular}\n}\n&\n\\fbox{\n\\begin{tabular}{ccc}\n{\\bf Sex} & {\\bf State} \\\\\nMale & Alaska\\\\\nFemale & Alabama\\\\\nMale & Alabama\n\\end{tabular}\n}\n&\n\\fbox{\n\\begin{tabular}{ccc}\n\\phantom{hi}\\\\\n1\\\\\n3.2\\\\\n2.4\n\\end{tabular}\n}\n\\end{tabular}\n"
  },
  {
    "path": "docs/documentation.h",
    "content": "/* Apophenia's narrative documentation\nCopyright (c) 2005--2013 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n/**  \\mainpage Welcome\n\nApophenia is an open statistical library for working with data sets and statistical\nmodels. It provides functions on the same level as those of the typical stats package\n(such as OLS, Probit, or singular value decomposition) but gives the user more\nflexibility to be creative in model-building.  The core functions are written in C,\nbut experience has shown them to be easy to bind to in Python/Julia/Perl/Ruby/&c.\n\nIt is written to scale well, to comfortably work with gigabyte data sets, million-step simulations, or\ncomputationally-intensive agent-based models.\n\n<em>The goods</em> \n\nThe library has been growing and improving since 2005, and has been downloaded well over 1e4 times. To date, it has over two hundred functions and macros to facilitate statistical computing, such as:\n\n\\li OLS and family, discrete choice models like Probit and Logit, kernel density estimators, and other common models.\n\\li Functions for transforming models (like Normal \\f$\\rightarrow\\f$ truncated Normal)\n    and combining models (produce the cross-product of that truncated Normal with three others,\n    or use Bayesian updating to combine that cross-product prior with an OLS likelihood\n    to produce a posterior distribution over the OLS parameters).\n\\li Database querying and maintenance utilities.\n\\li Data manipulation tools for splitting, stacking, sorting, and otherwise shunting data sets.\n\\li Moments, percentiles, and other basic stats utilities.\n\\li \\f$t\\f$-tests, \\f$F\\f$-tests, et cetera.\n\\li Several optimization methods available for your own new models.\n\\li It does <em>not</em> re-implement basic matrix operations or build yet another database\nengine. 
Instead, it builds upon the excellent <a href=\"http://www.gnu.org/software/gsl/\">GNU\nScientific</a> and <a href=\"http://www.sqlite.org/\">SQLite</a> libraries. MySQL/mariaDB is also supported.\n\nFor the full list of macros, functions, and prebuilt models, check the <a href=\"group__all__public.html\">index</a>.\n\n<em><a href=\"https://github.com/b-k/apophenia/archive/pkg.zip\">Download Apophenia here</a>.</em>\n\nOr, see the \\ref setup page for detailed setup instructions,\nincluding how to use your package manager to install the Debian or Homebrew package.\n\n<em>The documentation</em>\n\nTo start off, have a look at this \\ref gentle \"Gentle Introduction\" to the library.\n\n<a href=\"outline.html\">The outline</a> gives a more detailed narrative.\n\nThe <a href=\"group__all__public.html\">index</a> lists every function in the\nlibrary, with detailed reference\ninformation. Notice that the header to every page has a link to the outline and the index.\n\nTo really go in depth, download or pick up a copy of <a\nhref=\"http://modelingwithdata.org\">Modeling with Data</a>, which discusses general\nmethods for doing statistics in C with the GSL and SQLite, as well as Apophenia\nitself. <a href=\"http://www.census.gov/srd/papers/pdf/rrs2014-06.pdf\"><em>A Useful\nAlgebraic System of Statistical Models</em></a> (PDF) discusses some of the theoretical\nstructures underlying the library.\n\nThere is a <a href=\"https://github.com/b-k/apophenia/wiki\">wiki</a> with some convenience\nfunctions, tips, and so on.\n\n<em>Notable features</em> \nMuch of what Apophenia does can be done in any typical statistics package. The \\ref\napop_data structure is much like an R data frame, for example, and there is nothing special\nabout being able to invert a matrix or take the product of two matrices with a single\nfunction call (\\ref apop_matrix_inverse and \\ref apop_dot, respectively). 
\nEven more advanced features like Loess smoothing (\\ref apop_loess) and the Fisher Exact\nTest (\\ref apop_test_fisher_exact) are not especially Apophenia-specific. But here are\nsome things that are noteworthy.\n\n  \\li It's a C library! You can build applications using Apophenia for the data-processing\nback-end of your program, and not worry about the overhead associated with scripting\nlanguages. For example, it is currently used in production for certain aspects of\nprocessing for the U.S. Census Bureau's American Community Survey. And the numeric\nroutines in your favorite scripting language typically have a back-end in plain C;\nperhaps Apophenia can facilitate writing your next one.\n  \\li The \\ref apop_model object allows for consistent treatment of distributions, regressions,\nsimulations, machine learning models, and who knows what other sorts of models you can\ndream up. By transforming and combining existing models, it is easy to build complex\nmodels from simple sub-models.\n  \\li For example, the \\ref apop_update function does Bayesian updating on any two\nwell-formed models. If they are on the table of conjugates, that is correctly\nhandled, and if they are not, an appropriate variant of MCMC produces an empirical\ndistribution. The output is yet another model, from which you can make random draws,\nor which you can use as a prior for another round of Bayesian updating. Outside of\nBayesian updating, the \\ref apop_model_metropolis function is good for approximating\nother complex models.\n  \\li The maximum likelihood system combines several subsystems into one\nform: it will do a few flavors of conjugate gradient search, Nelder-Mead Simplex,\nNewton's Method, or Simulated Annealing. You pick the method by a setting attached to\nyour model. If you want to use a method that requires derivatives and you don't have\na closed-form derivative, the ML subsystem will estimate a numerical gradient for\nyou. 
If you would like to do EM-style maximization (all but the first parameter are\nfixed, that parameter is optimized, then all but the second parameter are fixed, that\nparameter is optimized, ..., looping through dimensions until the change in objective\nacross cycles is less than <tt>eps</tt>), add a settings group specifying the\ntolerance at which the cycle should stop: <tt>Apop_settings_add_group(your_model,\napop_mle, .dim_cycle_tolerance=eps)</tt>.\n  \\li The Iterative Proportional Fitting algorithm, \\ref apop_rake, is best-in-breed,\ndesigned to handle large, sparse matrices.\n\n\n\n<em>Contribute!</em> \n\n\\li Develop a new model object.\n\\li Contribute your favorite statistical routine.\n\\li Package Apophenia into an RPM, portage, cygwin package.\n\\li Report bugs or suggest features.\n\\li Write bindings for your preferred language. For example, here are \na <a href=\"https://github.com/swuecho/apophenia-perl\">Perl wrapper</a> and early versions\nof <a href=\"http://modelingwithdata.org/arch/00000173.htm\"> a Julia wrapper</a> and <a\nhref=\"https://github.com/b-k/Rapophenia/\">an R wrapper</a> which you could expand upon.\n\nIf you're interested,  <a href=\"mailto:fluffmail@f-m.fm\">write to the maintainer</a> (Ben Klemens), or join the\n<a href=\"https://github.com/b-k/apophenia\">GitHub</a> project.\n*/\n\n/** \\page eg Some examples\nHere are a few pieces of sample code for testing your installation or to give you a\nsense of what code with Apophenia's tools looks like.\n\n<em> Two data streams</em>\n\nThe sample program here is intended to show how one would integrate Apophenia into an\nexisting program. For example, say that you are running a simulation of two different\ntreatments, or say that two sensors are posting data at regular intervals. The goal is to\ngather the data in an organized form, and then ask questions of the resulting data set.\nBelow, a thousand draws are made from the two processes and put into a database. 
Then,\nthe data is pulled out, some simple statistics are compiled, and the data is written to\na text file for inspection outside of the program.  This program will compile cleanly\nwith the sample \\ref makefile.\n\n\\include draw_to_db.c\n\n<em> Run a regression</em>\n\nSee \\ref gentle for an example of loading a data set and running a simple regression.\n\n<em> A sequence of t-tests</em>\n\nIn \\ref mapply \"The section on map/apply\", a new \\f$t\\f$-test on every row, with all\noperations acting on entire rows rather than individual data points:\n\n\\include t_test_by_rows.c\n\nIn the documentation for \\ref apop_query_to_text, a program to list all the tables in an SQLite database.\n\\include ls_tables.c\n\n<em>Marginal distribution</em>\n\nA demonstration of fixing parameters to create a marginal distribution, via \\ref apop_model_fix_params\n\\include fix_params.c\n*/\n\n\n/** \\page setup Setting up\n\\section cast The supporting cast \nTo use Apophenia, you will need to have a working C compiler, the GSL (v1.7 or higher) and SQLite installed. mySQL/mariaDB is optional.\n\n\\li Some readers may be unfamiliar with modern package managers and common methods for setting up a C development environment; see \n<a href=\"http://modelingwithdata.org/appendix_o.html\">Appendix O</a> of <em> Modeling with Data</em> for an introduction.\n\n\\li Other pages in this documentation have a few more notes for \\ref windows \"Windows\" users, including \\ref mingw users.\n\n\\li Install the basics using your package manager. 
E.g., try\n\n\\code\nsudo apt-get install make gcc libgsl0-dev libsqlite3-dev\n\\endcode\n\nor\n\n\\code\nsudo yum install make gcc gsl-devel libsqlite3x-devel\n\\endcode\n\n\\li If you use a Debian-based system (including Ubuntu), try the version in Debian's Stretch distribution:\n\n\\code\n#Add Stretch to your sources list\nsudo sed -i '$a\ndeb http://httpredir.debian.org/debian stretch main\ndeb-src http://httpredir.debian.org/debian stretch main' /etc/apt/sources.list\nsudo apt-get update\n\n#Get Apophenia\nsudo apt-get install apophenia-bin apophenia-doc libapophenia2 libapophenia2-dev libapophenia2-dbg\n\n\n#Optional: remove Stretch from your sources list.\nsudo sed -i -e '$d' -e '$d' /etc/apt/sources.list\n\\endcode\n\nThanks to Jerome Benoit for building the package and bringing the library up to Debian standards.\n\n\\li Mac users with <a href=\"http://brew.sh\">Homebrew</a>, try\n\\code \nbrew install homebrew/science/apophenia\n\\endcode\n\nThanks to <a href=\"https://github.com/bdobyns\">bdobyns</a> for the Brew script.\n\n\\li Or, <a href=\"https://github.com/b-k/apophenia/archive/pkg.zip\">Download Apophenia here</a>.\nOnce you have the library downloaded, compile it using\n\n\\code\ntar xvzf apop*tgz && cd apophenia-0.999\n./configure && make && make check && sudo make install\n\\endcode\n\nIf you decide not to keep the library on your system, run <tt>sudo make uninstall</tt>\nfrom the source directory to remove it.\n\n\\li If you need to install packages in your home directory because you don't have root\npermissions, see the \\ref notroot page.\n\n\\li A \\ref makefile will help immensely when you want to compile your program.\n\n\\li You can verify that your setup works by trying some \\ref eg \"sample programs\".\n\n\\subpage notroot\n\n\\subpage makefile\n\n\\subpage windows\n\n\n\\li Those who would like to work on a cutting-edge copy of the source code\ncan get the latest version by cutting and pasting the following onto\nthe command line. 
If you follow this route, be sure to read the development README in the\n<tt>apophenia</tt> directory this command will create.\n\n\\code\ngit clone https://github.com/b-k/apophenia.git\n\\endcode\n\n<!--git clone git://apophenia.git.sourceforge.net/gitroot/apophenia/apophenia\ncvs -z3 -d:ext:<i>(your sourceforge login)</i>@cvs.sourceforge.net:/cvsroot/apophenia co -P apophenia\ncvs -z3 -d:pserver:anonymous@cvs.sf.net:/cvsroot/apophenia checkout -P apophenia\nsvn co https://apophenia.svn.sourceforge.net/svnroot/apophenia/trunk/apophenia -->\n\n*/\n\n/** \\page windows Windows\n\n\\ref mingw users, see that page.\n\nIf you have a choice, <a href=\"http://www.cygwin.com\">Cygwin</a> is strongly recommended. The setup program is\nvery self-explanatory. As a warning, it will probably take up &gt;300MB on\nyour system. You should install at least the following programs:\n\\li autoconf/automake\n\\li binutils\n\\li gcc\n\\li gdb\n\\li gnuplot -- for plotting data\n\\li groff -- needed for the man program, below\n\\li gsl -- the engine that powers Apophenia\n\\li less -- to read text files\n\\li libtool -- needed for compiling programs\n\\li make\n\\li man -- for reading help files\n\\li more -- not as good as less but still good to have\n\\li sqlite3 -- a simple database engine, a requisite for Apophenia\n\nIf you are missing anything else, the program will probably tell you.\nThe following are not necessary but are good to have on hand as long as you are going to be using Unix and programming.\n\n\\li git -- to partake in the versioning system\n\\li emacs -- steep learning curve, but people love it\n\\li ghostscript (for reading .ps/.pdf files)\n\\li openssh -- needed for git\n\\li perl, python, ruby -- these are other languages that you might also be interested in\n\\li tetex -- write up your documentation using the nicest-looking formatter around\n\\li X11 -- a windowing system\n\nX-Window will give you a nicer environment in which to work.  
After you start Cygwin, type <tt>startx</tt> to bring up a more usable, nice-looking terminal (and the ability to do a few thousand other things which are beyond the scope of this documentation).  Once you have Cygwin installed and a good terminal running, you can follow along with the remainder of the discussion without modification.\n\nSome older versions of Cygwin have a \\c search.h file which doesn't include the function <tt>lsearch()</tt>.  If this is the case on your system, you will have to update your Cygwin installation.\n\nFinally, Windows compilers often spit out lines like:\n\\code\nInfo: resolving _gsl_rng_taus by linking to __imp__gsl_rng_taus (auto-import)\n\\endcode\nThese lines are indeed just information, and not errors. Feel free to ignore them.\n\n[Thanks to Andrew Felton and Derrick Higgins for their Cygwin debugging efforts.]\n\n\\subpage mingw\n*/\n\n/** \\page notroot  Not root? \nIf you aren't root, then the common procedure for installing a library is to create a subdirectory in your home directory in which to install packages. The key is the <tt>--prefix</tt> addition to the <tt>./configure</tt> command.\n\\code\nexport MY_LIBS=myroot   #choose a directory name to be created in your home directory.\nmkdir $HOME/$MY_LIBS\n\n# From Apophenia's package directory:\n./configure --prefix $HOME/$MY_LIBS\nmake\nmake install   #Now you don't have to be root.\n\n# Adjust your paths so the compiler and the OS can find the library.\n# These are environment variables, and they are usually set in the \n# shell's startup files. 
I assume you are using bash here.\n\necho \"export PATH=$HOME/$MY_LIBS/bin:\\$PATH\" >> ~/.bashrc\necho \"export CPATH=$HOME/$MY_LIBS/include:\\$CPATH\" >> ~/.bashrc\necho \"export LIBRARY_PATH=$HOME/$MY_LIBS/lib:\\$LIBRARY_PATH\" >> ~/.bashrc\necho \"export LD_LIBRARY_PATH=$HOME/$MY_LIBS/lib:\\$LD_LIBRARY_PATH\" >> ~/.bashrc\n\\endcode\n\nOnce you have created this local root directory, you can use it to install as many new libraries as desired, and your paths will already be set up to find them.\n*/\n\n\n/** \\page makefile Makefile\n \nInstead of giving lengthy compiler commands at the command prompt, you can use a Makefile to do most of the work. How to:\n\\li Copy and paste the following into a file named \\c makefile.\n\\li Change the first line to the name of your program (e.g., if you have written <tt>census.c</tt>, then the first line will read <tt>PROGNAME=census</tt>). \n\\li If your program has multiple <tt>.c</tt> files, add a corresponding <tt>.o</tt> to the currently blank <tt>objects</tt> variable, e.g. <tt>objects=sample2.o sample3.o</tt>\n\\li Once you have a Makefile in the directory, simply type <tt>make</tt> at the command prompt to generate the executable.\n\n \\code\nPROGNAME = your_program_name_here\nobjects =\nCFLAGS = -g -Wall -O3\nLDLIBS = -lapophenia -lgsl -lgslcblas -lsqlite3\n\n$(PROGNAME): $(objects)\n\\endcode\n\n\\li If your system has \\c pkg-config, then you can use it for a slightly more robust and readable makefile. Replace the above C and link flags with:\n\\code\nCFLAGS = -g -Wall `pkg-config --cflags apophenia` -O3\nLDLIBS = `pkg-config --libs apophenia`\n\\endcode\nThe \\c pkg-config program will then fill in the appropriate directories and libraries. 
Pkg-config knows Apophenia depends on the GSL and database libraries, so you need only list the most-dependent library.\n\n\\li The <tt>-O3</tt> flag is optional, asking the compiler to run its highest level of optimization (for speed).\n\n\\li GCC users may need the <tt>--std=gnu99</tt> or <tt>--std=gnu11</tt> flag to use post-1989 C standards.\n\n\\li Order matters in the linking list: the files a package depends on should be listed after the package. E.g., since sample.c depends on Apophenia, <tt>gcc sample.c -lapophenia</tt> will work, while <tt>gcc -lapophenia sample.c</tt> is likely to give you errors. Similarly, list <tt>-lapophenia</tt> before <tt>-lgsl</tt>, which comes before <tt>-lgslcblas</tt>.\n\n*/\n\n/** \\page designated Designated initializers\n\nFunctions so marked in this documentation use standard C designated initializers and compound literals to allow you to omit, call by name, or change the order of inputs. The following examples are all equivalent.\n\nThe standard format:\n\\code\napop_text_to_db(\"infile.txt\", \"intable\", 0, 1, NULL);\n\\endcode\n\nOmitted arguments are left at their default values:\n\\code\napop_text_to_db(\"infile.txt\", \"intable\");\n\\endcode\n\nYou can use the variable's name, if you forget its ordering:\n\\code\napop_text_to_db(\"infile.txt\", \"intable\", .has_col_name=1, .has_row_name=0);\n\\endcode\n\nIf an un-named element follows a named element, then that value is given to the next variable in the standard ordering:\n\\code\napop_text_to_db(\"infile.txt\", \"intable\", .has_col_name=1, NULL);\n\\endcode\n\n\\li There may be cases where you can not use this form (it relies on a macro, which\nmay not be available). 
You can always call the underlying function directly, by adding\n\\c _base to the name and giving all arguments:\n\n\\code\napop_text_to_db_base(\"infile.txt\", \"intable\", 0, 1, NULL);\n\\endcode\n\n\\li If one of the optional elements is an RNG and you do not provide one, I use one\nfrom \\ref apop_rng_get_thread.\n*/\n\n\n/** \\page preliminaries Getting started\n\nIf you are entirely new to Apophenia, \\ref gentle \"have a look at the Gentle Introduction here\".\n\nAs well as the information in this outline, there is a separate page covering the details of \n \\ref setup \"setting up a computing environment\" and another page with \\ref eg \"some sample code\" for your perusal.\n\n\\subpage gentle\n\n\\subpage setup\n\n\\subpage eg\n\n\\subpage refstatusext\n*/\n\n/** \\page refstatusext References and extensions\n\n\\section mwd The book version\n\nApophenia co-evolved with <em>Modeling with Data: Tools and Techniques for Statistical Computing</em>. You can read about the book, or download a free PDF copy of the full text, at <a href=\"http://modelingwithdata.org\">modelingwithdata.org</a>.\n\nAs with many computer programs, the preferred manner of citing Apophenia is to cite its related book.\nHere is a BibTeX-formatted entry giving the relevant information:\n\n\\code \n@book{klemens:modeling,\n    title = \"Modeling with Data: Tools and Techniques for Statistical Computing\",\n    author=\"Ben Klemens\",\n    year=2008,\n    publisher=\"Princeton University Press\"\n}\n\\endcode\n\nThe rationale for the \\ref apop_model struct, based on an algebraic system of models, is detailed in a <a href=\"http://www.census.gov/srd/papers/pdf/rrs2014-06.pdf\">U.S. 
Census Bureau research report</a>:\n\n\\code \n@techreport{klemens:algebra,\n    title = \"A Useful Algebraic System of Statistical Models\",\n    author=\"Ben Klemens\",\n    month=jul,\n    year=2014,\n    institution=\"U.S.\\ Census Bureau\",\n    number=\"06\"\n}\n\\endcode\n\n\\section ext How do I write extensions?\n\nThe system is written to not require a registration or initialization step to add a new\nmodel or other such parts.  Just write your code and <tt>include</tt> it like any other\nC code.  A new \\ref apop_model has to conform to some rules if it is to play well with\n\\ref apop_estimate, \\ref apop_draw, and so forth. See the notes at \\ref modeldetails.\nOnce your new model or function is working, please post the code or a link to the code\non the <a href=\"https://github.com/b-k/apophenia/wiki\">Apophenia wiki</a>.\n\n\\subpage c\n\n\\section links Further references\n\nFor your convenience, here are links to some other libraries you are probably using.\n\n\\li <a href=\"http://www.gnu.org/software/libc/manual/html_node/index.html\">The standard C library</a>\n\\li <a href=\"http://www.gnu.org/software/gsl/manual/html_node/index.html\">The\nGSL documentation</a>, and <a href=\"http://www.gnu.org/software/gsl/manual/html_node/Function-Index.html\">its index</a>\n\\li <a href=\"http://sqlite.org/lang.html\">SQL understood by SQLite</a>\n\n\n*/\n\n/** \\page outline An outline of the library\n\nThe narrative in this section goes into greater detail on how to use the \ncomponents of Apophenia. You are encouraged to read \\ref gentle first.\n\nThis overview begins with the \\ref apop_data set, which is the central data structure\nused throughout the system. 
Section \\ref dbs covers the use of the database interface,\nbecause there are a lot of things that a database will do better than a matrix\nstructure like the \\ref apop_data struct.\n\nSection \\ref modelsec covers statistical models, in the form of the \\ref apop_model structure.\nThis part of the system is built upon the \\ref apop_data set to hold parameters, statistics, data sets, and so on.\n\n\\ref histosec covers probability mass functions, which are statistical models\nbuilt directly around a data set, where the chance of drawing a given observation is\nproportional to how often that observation appears in the source data. There are many\nsituations where one would want to treat a data set as a probability distribution,\nsuch as using \\ref apop_kl_divergence to find the information loss from an observed\ndata set to a theoretical model fit to that data.\n\nSection \\ref testpage covers traditional hypothesis testing, beginning with common\nstatistics that take an \\ref apop_data set or two as input, and continuing on to\ngeneralized hypothesis testing for any \\ref apop_model.\n\nBecause estimation in the \\ref apop_model relies heavily on maximum likelihood\nestimation, Apophenia's optimizer subsystem is extensive.  \\ref maxipage offers\nsome additional notes on optimization and how it can be used in non-statistical contexts.\n\n\\subpage dataoverview\n\n\\subpage dbs\n\n\\subpage modelsec\n\n\\subpage histosec\n\n\\subpage testpage\n\n\\subpage maxipage\n\n\\subpage moreasst\n*/\n\n/** \\page c C, SQL and coding utilities\n \n<em>Learning C</em>\n\n<a href=\"http://modelingwithdata.org\">Modeling with Data</a> has a full tutorial for C, oriented at users of standard stats packages. More nuts-and-bolts tutorials are <a href=\"http://www.google.com/search?hl=es&amp;c2coff=1&amp;q=c+tutorial\">in abundance</a>.  
Some people find pointers to be especially difficult; fortunately, there's a <a href=\"http://www.youtube.com/watch?v=6pmWojisM_E\">claymation cartoon</a> which clarifies everything.\n\n<em>Header aggregation</em>\n\nThere is only one header. Put\n\\code\n#include <apop.h>\n\\endcode\nat the top of your file, and you're done. Everything declared in that file starts with \\c apop_ or \\c Apop_. It also includes \n\\c assert.h, \\c math.h, \\c signal.h, and \\c string.h.\n\n<em>Linking</em>\n\nYou will need to link to the Apophenia library, which involves adding the <tt>-lapophenia</tt> flag to your compiler. Apophenia depends on SQLite3 and the GNU Scientific Library (which depends on a BLAS), so you will probably need something like:\n\n\\code\ngcc sample.c -lapophenia -lsqlite3 -lgsl -lgslcblas -o run_me -g -Wall -O3\n\\endcode\n\nYour best bet is to encapsulate this mess in a \\ref makefile \"Makefile\". Even if you are using an IDE and its command-line management tools, see the Makefile page for notes on useful flags.\n\n<em>Standards compliance</em>\n\nTo the best of our abilities, Apophenia complies to the C standard (ISO/IEC\n9899:2011). As well as relying on the GSL and SQLite, it uses some POSIX function calls,\nsuch as \\c strcasecmp and \\c popen.\n\n\\subpage designated\n\n\\section debugging  Errors, logging, debugging and stopping\n\n<em>The \\c error element</em> \n\nThe \\ref apop_data set and the \\ref apop_model both include an element named \\c error. It is normally \\c 0, indicating no (known) error. \n\nFor example, \\ref apop_data_copy detects allocation errors and some circular links\n(when <tt>Data->more == Data</tt>) and fails in those cases. 
You could thus use the\nfunction with a form like\n\n\\code\napop_data *d = apop_text_to_data(\"indata\");\napop_data *cp = apop_data_copy(d);\nif (cp->error) {printf(\"Couldn't copy the input data; failing.\\n\"); return 1;}\n\\endcode\n\nThere is sometimes (but not always) benefit to handling specific error codes, which are listed in the documentation of those functions that set the \\c error element. E.g.,\n\n\\code\napop_data *d = apop_text_to_data(\"indata\");\napop_data *cp = apop_data_copy(d);\nif (cp->error == 'a') {printf(\"Couldn't allocate space for the copy; failing.\\n\"); return 1;}\nif (cp->error == 'c') {printf(\"Circular link in the data set; failing.\\n\"); return 2;}\n\\endcode\n\nThe end of <a href=\"http://modelingwithdata.org/appendix_o.html\">Appendix O</a>\nof <em>Modeling with Data</em> offers some GDB macros which can make dealing with\nApophenia from the GDB command line much more pleasant. As discussed below, it\nalso helps to set <tt>apop_opts.stop_on_warning='v'</tt> or <tt>'w'</tt> when running\nunder the debugger.\n\n\n\\section verbsec Verbosity level and logging\n\nThe global variable <tt>apop_opts.verbose</tt> determines how many notifications and warnings get printed by Apophenia's warning mechanism:\n\n-1: turn off logging, print nothing (ill-advised) <br>\n0: notify only of failures and clear danger <br>\n1: warn of technically correct but odd situations that might indicate, e.g., numeric instability <br>\n2: debugging-type information; print queries  <br>\n3: give me everything, such as the state of the data at each iteration of a loop.\n\nThese levels are of course subjective, but should give you some idea of where to place the\nverbosity level. The default is 1.\n\nThe messages are printed to the \\c FILE* handle at <tt>apop_opts.log_file</tt>. If\nthis is blank (which happens at startup), then this is set to \\c stderr. This is the\ntypical behavior for a console program. 
Use\n\n\\code\napop_opts.log_file = fopen(\"mylog\", \"w\");\n\\endcode\n\nto write to the \\c mylog file instead of \\c stderr.\n\nAs well as the error and warning messages, some functions can also print diagnostics,\nusing the \\ref Apop_notify macro.  For example, \\ref apop_query and friends will print the\nquery sent to the database engine iff <tt>apop_opts.verbose >=2</tt> (which is useful\nwhen building complex queries). The diagnostics attempt to follow\nthe same verbosity scale as the warning messages.\n\n\\section Stopping\n\nBy default, warnings and errors never halt processing. It is up to the calling function\nto decide whether to stop.\n\nWhen running the program under a debugger, this is an annoyance: we want to stop as\nsoon as a problem turns up.\n\nThe global variable <tt>apop_opts.stop_on_warning</tt> changes when the system halts:\n\n\\c 'n': never halt. If you were using Apophenia to support a user-friendly GUI, for example, you would use this mode.<br>\nThe default: if the variable is <tt>'\\0'</tt> (the default), halt on severe errors, continue on all warnings.<br>\n\\c 'v': If the verbosity level of the warning is such that the warning would print to screen, then halt;\nif the warning message would be filtered out by your verbosity level, continue.<br>\n\\c 'w': Halt on all errors or warnings, including those below your verbosity threshold.\n\nSee the documentation for individual functions for details on how each reports errors to the caller and the level at which warnings are posted.\n\n\\section Legi Legible output\n\nThe output routines handle four sinks for your output. 
There is a global variable that\nyou can use for small projects where all data will go to the same place.\n\n\\code \napop_opts.output_type = 'f'; //named file\napop_opts.output_type = 'p'; //a pipe or already-opened file\napop_opts.output_type = 'd'; //the database\n\\endcode\n\nYou can also set the output type, the name of the output file or table, and other options\nvia arguments to individual calls to output functions. See \\ref apop_prep_output for the list of options.\n\nC makes minimal distinction between pipes and files, so you can set a\npipe or file as output and send all output there until further notice:\n\n\\code\napop_opts.output_type = 'p';\napop_opts.output_pipe = popen(\"gnuplot\", \"w\");\napop_plot_lattice(...); //see https://github.com/b-k/Apophenia/wiki/gnuplot_snippets\npclose(apop_opts.output_pipe); //a stream opened via popen is closed via pclose\napop_opts.output_pipe = fopen(\"newfile\", \"w\");\napop_data_print(set1);\nfprintf(apop_opts.output_pipe, \"\\nNow set 2:\\n\");\napop_data_print(set2);\n\\endcode\n\nContinuing the example, you can always override the global setting with\na specific request:\n\\code\napop_vector_print(v, \"vectorfile\"); //put vectors in a separate file\napop_matrix_print(m, \"matrix_table\", .output_type = 'd'); //write to the db\napop_matrix_print(m, .output_pipe = stdout);  //now show the same matrix on screen\n\\endcode\n\nI will first look to the input file name, then the input pipe, then the\nglobal \\c output_pipe, in that order, to determine where I should\nwrite.  Some combinations (like output type = \\c 'd' and only a pipe) don't\nmake sense, and I'll try to warn you about those. \n\nWhat if you have too much output and would like to use a pager, like \\c less or \\c more?\nIn C and POSIX terminology, you're asking to pipe your output to a paging program. 
Here is\nthe form:\n\\code\nFILE *lesspipe = popen(\"less\", \"w\");\nassert(lesspipe);\napop_data_print(your_data_set, .output_pipe=lesspipe);\npclose(lesspipe);\n\\endcode\n\\c popen will search your usual program path for \\c less, so you don't have to give a full path.\n\n\\li\\ref apop_data_print  \n\\li\\ref apop_matrix_print\n\\li\\ref apop_vector_print\n\n\\section sqlsec About SQL, the syntax for querying databases\n\nFor a reference, your best bet is the <a href=\"http://www.sqlite.org/lang.html\">Structured Query Language reference</a> for SQLite.  For a tutorial, there is an abundance of <a href=\"http://www.google.com/search?q=sql+tutorial\">tutorials online</a>.  Here is a nice blog <a href=\"http://fluff.info/blog/arch/00000118.htm\">entry</a> about complementarities between SQL and matrix manipulation packages.\n\nApophenia currently supports two database engines: SQLite and mySQL/mariaDB. SQLite is the default, because it is simpler and generally more easygoing than mySQL, and supports in-memory databases.\n\nThe global <tt>apop_opts.db_engine</tt> is initially \\c NUL, indicating no preference\nfor a database engine. You can explicitly set it:\n\n\\code\napop_opts.db_engine='s'; //use SQLite\napop_opts.db_engine='m'; //use mySQL/mariaDB\n\\endcode\n\nIf \\c apop_opts.db_engine is still \\c NUL on your first database operation, then I will check\nfor an environment variable <tt>APOP_DB_ENGINE</tt>, and set \n<tt>apop_opts.db_engine='m'</tt> if it is found and matches (case insensitive) \\c mariadb or \\c mysql.\n\n\\code\nexport APOP_DB_ENGINE=mariadb\napop_text_to_db indata mtab db_for_maria\n\nunset APOP_DB_ENGINE\napop_text_to_db indata stab db_for_sqlite.db\n\\endcode\n\n\nWrite \\ref apop_data sets to the database using \\ref apop_data_print, with <tt>.output_type='d'</tt>.\n\n\\li Column names are inserted if there are any. If there are, all dots are converted\n    to underscores.  
Otherwise, the columns will be named \\c c1, \\c c2, \\c c3, &c.\n\\li If \\ref apop_opts_type \"apop_opts.db_name_column\" is not blank (the default is\n    <tt>\"row_name\"</tt>), then a so-named column is created, and the row names are placed there.\n\\li If there are weights, they will be the last column of the table, and the column will be named \\c weights.\n\\li  If the table does not exist, create it. Use \\ref apop_data_print <tt>(data, \"tabname\", .output_type='d', .output_append='w')</tt>\n    to overwrite an existing table or with <tt>.output_append='a'</tt> to append.\n    Appending is the default. Or,\n    call \\ref apop_table_exists <tt>(\"tabname\", 'd')</tt> to ensure that the table is\n    removed ahead of time.\n\n\\li If your data set has zero data (i.e., is just a list of column names or is entirely\n    blank), \\ref apop_data_print returns without creating anything in the database.\n\\li Especially if you are using a pre-2007 version of SQLite, there may be a speed\n    gain to wrapping the call to this function in a begin/commit pair:\n\n\\code\napop_query(\"begin;\");\napop_data_print(dataset, .output_name=\"dbtab\", .output_type='d');\napop_query(\"commit;\");\n\\endcode\n\n\nFinally, Apophenia provides a few nonstandard SQL functions to facilitate math via database; see \\ref db_moments.\n\n\n\\section threads Threading\n\nApophenia uses OpenMP for threading. You generally do not need to know how OpenMP works\nto use Apophenia, and many points of work will thread without your doing anything.\n\n\\li All functions strive to be thread-safe. Part of how this is achieved is that static\nvariables are marked as thread-local or atomic, as per the C standard. There still\nexist compilers that can't implement thread-local or atomic variables, in which case\nyour safest bet is to set OMP's thread count to one as below (or get a new compiler).\n\n\\li Some functions modify their inputs. 
It is up to you to use those functions in\na thread-safe manner. The \\ref apop_matrix_realloc function handles states and global variables\ncorrectly in a threaded environment, but if you have two threads resizing the same \\c\ngsl_matrix at the same time, you're going to have problems.\n\n\\li There are few compilers that don't support OpenMP. When compiling on such a system all work will be single-threaded.\n\n\\li Set the maximum number of threads to \\c N with the environment variable\n\n\\code\nexport OMP_NUM_THREADS=N\n\\endcode\n\nor the C function\n\n\\code\n#include <omp.h>\nomp_set_num_threads(N);\n\\endcode\n\nUse one of these methods with <tt>N=1</tt> if you want a single-threaded program.\nYou can return later to using all available threads via <tt>omp_set_num_threads(omp_get_num_procs())</tt>.\n\n\\li \\ref apop_map and friends distribute their \\c for loop over the input \\ref apop_data\nset across multiple threads. Therefore, be careful to send thread-unsafe functions to\nit only after calling \\c omp_set_num_threads(1).\n\n\\li There are a few functions, like \\ref apop_model_draws, that rely on \\ref apop_map, and\ntherefore also thread by default.\n\n\\li The function \\ref apop_rng_get_thread retrieves a statically-stored RNG specific\nto a given thread. Therefore, if you use that function in the place of a \\c gsl_rng,\nyou can parallelize functions that make random draws.\n\n\\li \\ref apop_rng_get_thread allocates its store of RNGs using <tt>apop_opts.rng_seed</tt>,\nincrementing that seed by one after each. You thus probably have threads with seeds 479901,\n479902, 479903, .... 
[If you have a better way to do it, please feel free to modify the\ncode to implement your improvement and submit a pull request on Github.]\n\nSee <a href=\"http://modelingwithdata.org/arch/00000175.htm\">this tutorial on C\nthreading</a> if you would like to know more, or are unsure about whether your functions\nare thread-safe or not.\n*/\n\n\n/** \\page dataoverview Data sets\n\nThe \\ref apop_data structure represents a data set.  It joins together a \\c gsl_vector,\na \\c gsl_matrix, an \\ref apop_name, and a table of strings. It tries to be lightweight,\nso you can use it everywhere you would use a \\c gsl_matrix or a \\c gsl_vector.\n\nHere is a diagram showing a sample data set with all of the elements in place. Together,\nthey represent a data set where each row is an observation, which includes both numeric\nand text values, and where each row/column may be named.\n\n\\htmlinclude apop_data_fig.html\n\\latexinclude apop_data_fig.tex\n\nIn a regression, the vector would be the dependent variable, and the other columns\n(after factor-izing the text) the independent variables. Or think of the \\ref apop_data\nset as a partitioned matrix, where the vector is column -1, and the first column of\nthe matrix is column zero. Here is some sample code to print the vector and matrix,\nstarting at column -1 (but you can use \\ref apop_data_print to do this).\n\n\\code\nfor (int j = 0; j< data->matrix->size1; j++){\n    printf(\"%s\\t\", apop_name_get(data->names, j, 'r'));\n    for (int i = -1; i< data->matrix->size2; i++)\n        printf(\"%g\\t\", apop_data_get(data, j, i));\n    printf(\"\\n\");\n}\n\\endcode\n\nMost functions assume that each row represents one observation, so the data vector,\ndata matrix, and text have the same row count: \\c data->vector->size==data->matrix->size1\nand \\c data->vector->size==*data->textsize. 
This means that the \\ref apop_name structure\ndoesn't have separate \\c vector_names, \\c row_names, or \\c text_row_names elements:\nthe \\c rownames are assumed to apply for all.\n\nSee below for notes on managing the \\c text element and the row/column names.\n\n\\section pps Pages\n\nThe \\ref apop_data set includes a \\c more pointer, which will typically be \\c NULL,\nbut may point to another \\ref apop_data set. This is intended for a main data set\nand a second or third page with auxiliary information, such as estimated parameters\non the front page and their covariance matrix on page two, or predicted data on the\nfront page and a set of prediction intervals on page two.\n\nThe \\c more pointer is not intended for a linked list for millions of data points. In\nsuch situations, you can often improve efficiency by restructuring your data to use\na single table (perhaps via \\ref apop_data_pack and \\ref apop_data_unpack).\n\nMost functions, such as \\ref apop_data_copy and \\ref apop_data_free, will handle all\nthe pages of information. For example, an optimization search over multi-page parameter\nsets would search the space given by all pages.\n\nPages may also be appended as output or auxiliary information, such as\ncovariances, and an MLE would not search over these elements. Any page with a name in\nXML-ish brackets, such as <tt>\\<Covariance\\></tt>, is considered information about the\ndata, not data itself, and therefore ignored by search routines, missing data routines,\net cetera. 
This is achieved by a rule in \\ref apop_data_pack and \\ref apop_data_unpack.\n\nHere is a toy example that establishes a baseline data set, adds a page,\nmodifies it, and then later retrieves it.\n\\code\napop_data *d = apop_data_alloc(10, 10, 10); //the base data set, a 10-item vector + 10x10 matrix\napop_data *a_new_page = apop_data_add_page(d, apop_data_alloc(2,2), \"new 2 x 2 page\");\ngsl_matrix_set_all(a_new_page->matrix, 3); //the page holds a gsl_matrix, so use the matrix setter\n\n//later:\napop_data *retrieved = apop_data_get_page(d, \"new\", 'r'); //'r'=search via regex, not literal match.\napop_data_print(retrieved); //print a 2x2 grid of 3s.\n\\endcode\n\n\\section datafns Functions for using apop_data sets\n\nThere are a great many functions to collate, copy, merge, sort, prune, and otherwise\nmanipulate the \\ref apop_data structure and its components.\n\n\\li\\ref apop_data_add_named_elmt\n\\li\\ref apop_data_copy\n\\li\\ref apop_data_fill\n\\li\\ref apop_data_memcpy\n\\li\\ref apop_data_pack\n\\li\\ref apop_data_rm_columns\n\\li\\ref apop_data_sort\n\\li\\ref apop_data_split\n\\li\\ref apop_data_stack\n\\li\\ref apop_data_transpose : transpose matrices (square or not) and text grids\n\\li\\ref apop_data_unpack\n\\li\\ref apop_matrix_copy\n\\li\\ref apop_matrix_realloc\n\\li\\ref apop_matrix_stack\n\\li\\ref apop_text_set\n\\li\\ref apop_text_paste\n\\li\\ref apop_text_to_data\n\\li\\ref apop_vector_copy\n\\li\\ref apop_vector_fill\n\\li\\ref apop_vector_stack\n\\li\\ref apop_vector_realloc\n\\li\\ref apop_vector_unique_elements\n\nApophenia builds upon the GSL, but it would be inappropriate to redundantly replicate\nthe <a href=\"http://www.gnu.org/software/gsl/manual/html_node/index.html\">GSL's documentation</a> here.\nMeanwhile, here are prototypes for a few common functions. 
The GSL's\nnaming scheme is very consistent, so a simple reminder of the function name may be\nsufficient to indicate how they are used.\n\n\\li <tt>gsl_matrix_swap_rows (gsl_matrix * m, size_t i, size_t j)</tt>\n\\li <tt>gsl_matrix_swap_columns (gsl_matrix * m, size_t i, size_t j)</tt>\n\\li <tt>gsl_matrix_swap_rowcol (gsl_matrix * m, size_t i, size_t j)</tt>\n\\li <tt>gsl_matrix_transpose_memcpy (gsl_matrix * dest, const gsl_matrix * src)</tt>\n\\li <tt>gsl_matrix_transpose (gsl_matrix * m)</tt> : square matrices only\n\\li <tt>gsl_matrix_set_all (gsl_matrix * m, double x)</tt>\n\\li <tt>gsl_matrix_set_zero (gsl_matrix * m)</tt>\n\\li <tt>gsl_matrix_set_identity (gsl_matrix * m)</tt>\n\\li <tt>gsl_matrix_memcpy (gsl_matrix * dest, const gsl_matrix * src)</tt>\n\\li <tt>void gsl_vector_set_all (gsl_vector * v, double x)</tt>\n\\li <tt>void gsl_vector_set_zero (gsl_vector * v)</tt>\n\\li <tt>int gsl_vector_set_basis (gsl_vector * v, size_t i)</tt>: set all elements to zero, but set item \\f$i\\f$ to one.\n\\li <tt>gsl_vector_reverse (gsl_vector * v)</tt>: reverse the order of your vector's elements\n\\li <tt>gsl_vector_ptr</tt> and <tt>gsl_matrix_ptr</tt>. To increment an element in a vector use, e.g., <tt>*gsl_vector_ptr(v, 7) += 3;</tt> or <tt>(*gsl_vector_ptr(v, 7))++</tt>.\n\\li <tt>gsl_vector_memcpy (gsl_vector * dest, const gsl_vector * src)</tt>\n\n\\subsection readin Reading from text files\n\nThe \\ref apop_text_to_data() function takes in the name of a text file with a grid of data in (comma|tab|pipe|whatever)-delimited format and reads it to a matrix. If there are names in the text file, they are copied in to the data set. 
See \\ref text_format for the full range and details of what can be read in.\n\nIf you have any columns of text, then you will need to read in via the database: use\n\\ref apop_text_to_db() to convert your text file to a database table, \ndo any database-appropriate cleaning of the input data, then use \\ref\napop_query_to_data() or \\ref apop_query_to_mixed_data() to pull the data to an \\ref apop_data set.\n\n\\subpage text_format\n\n\\section datalloc Alloc/free\n\nYou may not need to use these functions often, given that \\ref apop_query_to_data, \\ref apop_text_to_data, and many transformation functions will auto-allocate \\ref apop_data sets for you.\n\nThe \\ref apop_data_alloc function allocates a vector, a matrix, or both. After this call, the structure will have blank names, \\c NULL \\c text element, and \\c NULL \\c weights.  See \\ref names for discussion of filling the names. Use \\ref apop_text_alloc to allocate the \\c text grid. The \\c weights are a simple \\c gsl_vector, so allocate a 100-unit weights vector via <tt>allocated_data_set->weights = gsl_vector_alloc(100)</tt>.\n\nExamples of use can be found throughout the documentation; for example, see \\ref gentle.\n\n\\li\\ref apop_data_alloc\n\\li\\ref apop_data_calloc\n\\li\\ref apop_data_free\n\\li\\ref apop_text_alloc : allocate or resize the text part of an \\ref apop_data set.\n\\li\\ref apop_text_free\n\nSee also:\n\n\\li <tt>gsl_matrix * gsl_matrix_alloc (size_t n1, size_t n2)</tt>\n\\li <tt>gsl_matrix * gsl_matrix_calloc (size_t n1, size_t n2)</tt>\n\\li <tt>void gsl_matrix_free (gsl_matrix * m)</tt>\n\\li <tt>gsl_vector * gsl_vector_alloc (size_t n)</tt>\n\\li <tt>gsl_vector * gsl_vector_calloc (size_t n)</tt>\n\\li <tt>void gsl_vector_free (gsl_vector * v)</tt>\n\n\n\\section gslviews \tUsing views\n\nThere are several macros for the common task of viewing a single row or column of a \\ref\napop_data set.\n\n\\code\napop_data *d = apop_query_to_data(\"select obs1, obs2, obs3 from 
a_table\");\n\n//Get a column as a vector using its name. Note that the generated view, ov, is the\n//last item named in the call to the macro.\nApop_col_tv(d, \"obs1\", ov);\ndouble obs1_sum = apop_vector_sum(ov);\n\n//Get row zero of the data set's matrix as a vector; get its sum\ndouble row_zero_sum = apop_vector_sum(Apop_rv(d, 0));\n\n//Get a row or rows as a standalone one-row apop_data set\napop_data_print(Apop_r(d, 0));\n\n//ten rows starting at row 3:\napop_data *d10 = Apop_rs(d, 3, 10);\napop_data_print(d10);\n\n//Column zero's sum\ngsl_vector *cv = Apop_cv(d, 0);\ndouble col_zero_sum = apop_vector_sum(cv);\n//or on one line:\ndouble col_zero_sum_again = apop_vector_sum(Apop_cv(d, 0));\n\n//Pull a 10x5 submatrix, whose origin element is the (2,3)rd\n//element of the parent data set's matrix\n\ndouble sub_sum = apop_matrix_sum(Apop_subm(d, 2,3, 10,5));\n\\endcode\n\nBecause these macros can be used as arguments to a function, these macros have abbreviated names to save line space.\n\n\\li\\ref Apop_r : get row as one-observation \\ref apop_data set\n\\li\\ref Apop_c : get column as \\ref apop_data set\n\\li\\ref Apop_cv : get column as \\ref gsl_vector\n\\li\\ref Apop_rv : get row as \\ref gsl_vector\n\\li\\ref Apop_cs : get columns as \\ref apop_data set\n\\li\\ref Apop_rs : get rows as \\ref apop_data set\n\\li\\ref Apop_mcv : matrix column as vector\n\\li\\ref Apop_mrv : matrix row as vector\n\\li\\ref Apop_subm : get submatrix of a \\ref gsl_matrix\n\nA second set of macros have a slightly different syntax, taking the name of the object to be declared as the last argument. These cannot be used as expressions such as function arguments.\n\n\\li\\ref Apop_col_t\n\\li\\ref Apop_row_t\n\\li\\ref Apop_col_tv\n\\li\\ref Apop_row_tv\n\nThe view is an automatic variable, not a pointer, and therefore disappears at the end\nof the scope in which it is declared. 
If you want to retain the data after the function\nexits, copy it to another vector:\n\n\\code\nreturn apop_vector_copy(Apop_rv(d, 2)); //return a gsl_vector copy of row 2\n\\endcode\n\nCurly braces always delimit scope, not just at the end of a function. \nWhen program evaluation exits a given block, all variables in that block are\nerased. Here is some sample code that won't work:\n\n\\code\napop_data *outdata;\nif (get_odd){\n    outdata = Apop_r(data, 1);\n} else {\n    outdata = Apop_r(data, 0);\n}\napop_data_print(outdata); //breaks: outdata points to out-of-scope variables.\n\\endcode\n\nFor this if/then statement, there are two sets of local variables\ngenerated: one for the \\c if block, and one for the \\c else block. By the last line,\nneither exists. You can get around the problem here by making sure to not put the macro\ndeclaring new variables in a block. E.g.:\n\n\\code\napop_data *outdata = Apop_r(data, get_odd ? 1 : 0);\napop_data_print(outdata);\n\\endcode\n\n\n\\section data_set_get Set/get\n\nFirst, some examples:\n\n\\code\napop_data *d = apop_data_alloc(10, 10, 10);\napop_name_add(d->names, \"Zeroth row\", 'r');\napop_name_add(d->names, \"Zeroth col\", 'c');\n\n//set cell at row=8 col=0 to value=27\napop_data_set(d, 8, 0, .val=27);\nassert(apop_data_get(d, 8, .colname=\"Zeroth\") == 27);\ndouble *x = apop_data_ptr(d, .col=7, .rowname=\"Zeroth\");\n*x = 270;\nassert(apop_data_get(d, 0, 7) == 270);\n\n// This is invalid---the value doesn't follow the colname. 
Use .val=5.\n// apop_data_set(d, .row = 3, .colname=\"Column 8\", 5);  \n\n// This is OK, to set (3, 8) to 5:\napop_data_set(d, 3, 8, 5);\n\n\n//apop_data set holding a scalar:\napop_data *s = apop_data_alloc(1);\napop_data_set(s, .val=12);\nassert(apop_data_get(s) == 12);\n\n//apop_data set holding a vector:\napop_data *v = apop_data_alloc(12);\nfor (int i=0; i< 12; i++) apop_data_set(v, i, .val=i*10);\nassert(apop_data_get(v,3) == 30);\n\n//This is a common form for pulling from a list of named scalars, \n//produced via apop_data_add_named_elmt\ndouble AIC = apop_data_get(your_model->info, .rowname=\"AIC\");\n\\endcode\n\n\\li The versions that take a column/row name use  \\ref apop_name_find\n    for the search; see notes there on the name matching rules.\n\\li For those that take a column number, column -1 is the vector element. \n\\li For those that take a column name, I will search the vector last---if I don't find the name among the matrix columns, but the name matches the vector name, I return column -1.\n\\li If you give me both a <tt>.row</tt> and a <tt>.rowname</tt>, I go with the name; similarly for <tt>.col</tt> and\n<tt>.colname</tt>.\n\\li You can give the name of a page, e.g.\n\\code\ndouble AIC = apop_data_get(data, .rowname=\"AIC\", .col=-1, .page=\"<Info>\");\n\\endcode\n\n\\li Numeric values default to zero, which is how the examples above that treated the \\ref apop_data set as a vector or scalar could do so relatively gracefully.\nSo <tt>apop_data_get(dataset, 1)</tt> gets item (1, 0) from the matrix\nelement of \\c dataset. But as a do-what-I-mean exception, if there is no matrix element\nbut there is a vector, then this form will get vector element 1. Relying on this DWIM\nexception is useful iff you can guarantee that a data set will have only a vector or\na matrix but not both. 
Otherwise, be explicit and use <tt>apop_data_get(dataset, 1, -1)</tt>.\n\nThe \\ref apop_data_ptr function follows the lead of \\c gsl_vector_ptr and \\c\ngsl_matrix_ptr, and like those functions, returns a pointer to the appropriate \\c double.\nFor example, to increment the (3,7)th element of an \\ref apop_data set:\n\n\\code\n(*apop_data_ptr(dataset, 3, 7))++;\n\\endcode\n\n\\li\\ref apop_data_get\n\\li\\ref apop_data_set\n\\li\\ref apop_data_ptr : returns a pointer to the element.\n\\li\\ref apop_data_get_page : retrieve a named page from a data set. If you only need a few items, you can specify a page name to \\c apop_data_(get|set|ptr).\n\n    See also:\n\n\\li <tt>double gsl_matrix_get (const gsl_matrix * m, size_t i, size_t j)</tt>\n\\li <tt>double gsl_vector_get (const gsl_vector * v, size_t i)</tt>\n\\li <tt>void gsl_matrix_set (gsl_matrix * m, size_t i, size_t j, double x)</tt>\n\\li <tt>void gsl_vector_set (gsl_vector * v, size_t i, double x)</tt>\n\\li <tt>double * gsl_matrix_ptr (gsl_matrix * m, size_t i, size_t j)</tt>\n\\li <tt>double * gsl_vector_ptr (gsl_vector * v, size_t i)</tt>\n\\li <tt>const double * gsl_matrix_const_ptr (const gsl_matrix * m, size_t i, size_t j)</tt>\n\\li <tt>const double * gsl_vector_const_ptr (const gsl_vector * v, size_t i)</tt>\n\\li <tt>gsl_matrix_get_row (gsl_vector * v, const gsl_matrix * m, size_t i)</tt>\n\\li <tt>gsl_matrix_get_col (gsl_vector * v, const gsl_matrix * m, size_t j)</tt>\n\\li <tt>gsl_matrix_set_row (gsl_matrix * m, size_t i, const gsl_vector * v)</tt>\n\\li <tt>gsl_matrix_set_col (gsl_matrix * m, size_t j, const gsl_vector * v)</tt>\n\n\n\\section mapply   Map/apply\n\n\\anchor outline_mapply\nThese functions allow you to send each element of a vector or matrix to a function, either producing a new matrix (map) or transforming the original (apply).  The \\c ..._sum functions return the sum of the mapped output.\n\nThere are two types, which were developed at different times. 
The \\ref apop_map and\n\\ref apop_map_sum functions use variadic function inputs to cover a lot of different\ntypes of process depending on the inputs. Other functions with types in their names,\nlike \\ref apop_matrix_map and \\ref apop_vector_apply, may be easier to use in some\ncases. They use the same routines internally, so use whichever type is convenient.\n\nYou can do many things quickly with these functions.\n\nGet the sum of squares of a vector's elements:\n\n\\code\n  //given apop_data *dataset and gsl_vector *v:\ndouble sum_of_squares = apop_map_sum(dataset, gsl_pow_2);\ndouble sum_of_sqvares = apop_vector_map_sum(v, gsl_pow_2);\n\\endcode\n\nCreate an index vector [\\f$0, 1, 2, ...\\f$].\n\n\\code\ndouble index(double in, int index){return index;}\napop_data *d = apop_map(apop_data_alloc(100), .fn_di=index, .inplace='y');\n\\endcode\n\nGiven your log likelihood function, which acts on a \\ref apop_data set with only one\nrow, and a data set where each row of the matrix is an observation, find the total\nlog likelihood via:\n\n\\code\nstatic double your_log_likelihood_fn(apop_data * in)\n     {[your math goes here]}\n\ndouble total_ll = apop_map_sum(dataset, .fn_r=your_log_likelihood_fn);\n\\endcode\n\nHow many missing elements are there in your data matrix? \n\n\\code\nstatic double nan_check(const double in){ return isnan(in);}\n\nint missing_ct = apop_map_sum(in, nan_check, .part='m');\n\\endcode\n\nGet the mean of the not-NaN elements of a data set:\n\n\\code\nstatic double no_nan_val(const double in){ return isnan(in)? 0 : in;}\nstatic double not_nan_check(const double in){ return !isnan(in);}\n\nstatic double apop_mean_no_nans(apop_data *in){\n    return apop_map_sum(in, no_nan_val)/apop_map_sum(in, not_nan_check);\n}\n\\endcode\n\nThe following program randomly generates a data set where each row is a list of numbers with a different mean. 
It then finds the \\f$t\\f$ statistic for each row, and the confidence with which we reject the claim that the statistic is less than or equal to zero.\n\nNotice how the older \\ref apop_vector_apply uses file-global variables to pass information into the functions, while the \\ref apop_map uses a pointer to send parameters to the functions.\n\n\\include t_test_by_rows.c\n\nOne more toy example demonstrating the use of \\ref apop_map and \\ref apop_map_sum :\n\n\\include apop_map_row.c\n\n\n\\li If the number of threads is greater than one, then the matrix will be broken\ninto chunks and each sent to a different thread. Notice that the GSL is generally\nthreadsafe, and SQLite is threadsafe conditional on several commonsense caveats that\nyou'll find in the SQLite documentation. See \\ref apop_rng_get_thread() to use the GSL's RNGs in a threaded environment.\n\n\\li The \\c ...sum functions are convenience functions that call \\c ...map and then add up the contents. Thus, you will need to have adequate memory for the allocation of the temp matrix/vector.\n\n\\li\\ref apop_map\n\\li\\ref apop_map_sum\n\\li\\ref apop_matrix_apply\n\\li\\ref apop_matrix_map\n\\li\\ref apop_matrix_map_all_sum\n\\li\\ref apop_matrix_map_sum\n\\li\\ref apop_vector_apply\n\\li\\ref apop_vector_map\n\\li\\ref apop_vector_map_sum\n\n\n\n\\section  matrixmathtwo  Basic Math\n\n\\li\\ref apop_vector_exp : exponentiate every element of a vector\n\\li\\ref apop_vector_log : take the natural log of every element of a vector\n\\li\\ref apop_vector_log10 : take the log (base 10) of every element of a vector\n\\li\\ref apop_vector_distance : find the distance between two vectors via various metrics\n\\li\\ref apop_vector_normalize : scale/shift a vector to have mean zero, sum to one, have a range of exactly \\f$[0, 1]\\f$, et cetera\n\\li\\ref apop_vector_entropy : calculate the entropy of a vector of frequencies or probabilities\n\nSee also:\n\n\\li <tt>int gsl_matrix_add (gsl_matrix * a, const 
gsl_matrix * b)</tt>\n\\li <tt>int gsl_matrix_sub (gsl_matrix * a, const gsl_matrix * b)</tt>\n\\li <tt>int gsl_matrix_mul_elements (gsl_matrix * a, const gsl_matrix * b)</tt>\n\\li <tt>int gsl_matrix_div_elements (gsl_matrix * a, const gsl_matrix * b)</tt>\n\\li <tt>int gsl_matrix_scale (gsl_matrix * a, const double x)</tt>\n\\li <tt>int gsl_matrix_add_constant (gsl_matrix * a, const double x)</tt>\n\\li <tt>gsl_vector_add (gsl_vector * a, const gsl_vector * b)</tt>\n\\li <tt>gsl_vector_sub (gsl_vector * a, const gsl_vector * b)</tt>\n\\li <tt>gsl_vector_mul (gsl_vector * a, const gsl_vector * b)</tt>\n\\li <tt>gsl_vector_div (gsl_vector * a, const gsl_vector * b)</tt>\n\\li <tt>gsl_vector_scale (gsl_vector * a, const double x)</tt>\n\\li <tt>gsl_vector_add_constant (gsl_vector * a, const double x)</tt>\n\n            \n\\section  matrixmath  Matrix math\n\n\\li\\ref apop_dot : matrix \\f$\\cdot\\f$ matrix, matrix \\f$\\cdot\\f$ vector, or vector \\f$\\cdot\\f$ matrix\n\\li\\ref apop_matrix_determinant\n\\li\\ref apop_matrix_inverse\n\\li\\ref apop_det_and_inv : find determinant and inverse at the same time\n\nSee the GSL documentation for myriad further options.\n\n\n\\section  sumstats  Summary stats\n\n\\li\\ref apop_data_summarize\n\\li\\ref apop_vector_moving_average\n\\li\\ref apop_vector_percentiles\n\\li\\ref apop_vector_bounded\n\nSee also:\n\n\\li <tt>double gsl_matrix_max (const gsl_matrix * m)</tt>\n\\li <tt>double gsl_matrix_min (const gsl_matrix * m)</tt>\n\\li <tt>void gsl_matrix_minmax (const gsl_matrix * m, double * min_out, double * max_out)</tt>\n\\li <tt>void gsl_matrix_max_index (const gsl_matrix * m, size_t * imax, size_t * jmax)</tt>\n\\li <tt>void gsl_matrix_min_index (const gsl_matrix * m, size_t * imin, size_t * jmin)</tt>\n\\li <tt>void gsl_matrix_minmax_index (const gsl_matrix * m, size_t * imin, size_t * jmin, size_t * imax, size_t * jmax)</tt>\n\\li <tt>gsl_vector_max (const gsl_vector * v)</tt>\n\\li <tt>gsl_vector_min (const 
gsl_vector * v)</tt>\n\\li <tt>gsl_vector_minmax (const gsl_vector * v, double * min_out, double * max_out)</tt>\n\\li <tt>gsl_vector_max_index (const gsl_vector * v)</tt>\n\\li <tt>gsl_vector_min_index (const gsl_vector * v)</tt>\n\\li <tt>gsl_vector_minmax_index (const gsl_vector * v, size_t * imin, size_t * imax)</tt>\n\n\n\\section  moments  Moments\n\nFor most of these, you can add a weights vector for weighted mean/var/cov/..., such as\n<tt>apop_vector_mean(d->vector, .weights=d->weights)</tt>\n\n\\li\\ref apop_mean : the first three with short names operate on a vector.\n\\li\\ref apop_sum\n\\li\\ref apop_var\n\\li\\ref apop_matrix_sum\n\\li\\ref apop_data_correlation\n\\li\\ref apop_data_covariance\n\\li\\ref apop_data_summarize\n\\li\\ref apop_matrix_mean\n\\li\\ref apop_matrix_mean_and_var\n\\li\\ref apop_vector_correlation\n\\li\\ref apop_vector_cov\n\\li\\ref apop_vector_kurtosis\n\\li\\ref apop_vector_kurtosis_pop \n\\li\\ref apop_vector_mean\n\\li\\ref apop_vector_skew\n\\li\\ref apop_vector_skew_pop\n\\li\\ref apop_vector_sum\n\\li\\ref apop_vector_var\n\\li\\ref apop_vector_var_m \n\n\n\\section convsec   Conversion among types\n\nThere are no functions provided to convert from \\ref apop_data to the constituent\nelements, because you don't need a function.\n\nIf you need an individual element, you can use its pointer to retrieve it:\n\n\\code\napop_data *d = apop_query_to_mixed_data(\"vmmw\", \"select result, age, \"\n                                     \"income, replicate_weight from data\");\ndouble avg_result = apop_vector_mean(d->vector, .weights=d->weights);\n\\endcode\n\nIn the other direction, you can use compound literals to wrap an \\ref apop_data struct\naround a loose vector or matrix:\n\n\\code\n//Given:\ngsl_vector *v;\ngsl_matrix *m;\n\n// Then this form wraps the elements into automatically-allocated apop_data structs.\n\napop_data *dv = &(apop_data){.vector=v}; \napop_data *dm = &(apop_data){.matrix=m};\n\napop_data *v_dot_m = 
apop_dot(dv, dm);\n\n//Here is a macro to hide C's ugliness:\n#define As_data(...) (&(apop_data){__VA_ARGS__})\n\napop_data *v_dot_m2 = apop_dot(As_data(.vector=v), As_data(.matrix=m));\n\n//The wrapped object is an automatically-allocated structure pointing to the\n//original data. If it needs to persist or be separate from the original,\n//make a copy:\napop_data *dm_copy = apop_data_copy(As_data(.vector=v, .matrix=m));\n\\endcode\n\n\\li\\ref apop_array_to_vector : <tt>double*</tt>\\f$\\to\\f$ <tt>gsl_vector</tt>\n\\li\\ref apop_data_fill : <tt>double*</tt>\\f$\\to\\f$  \\ref apop_data\n\\li\\ref apop_data_falloc : macro to allocate and fill a \\ref apop_data set\n\\li\\ref apop_text_to_data : delimited text file\\f$\\to\\f$ \\ref apop_data\n\\li\\ref apop_text_to_db : delimited text file\\f$\\to\\f$ database table\n\\li\\ref apop_vector_to_matrix\n\n\n\\section names   Name handling\n\nIf you generate your data set via \\ref apop_text_to_data or from the database via\n\\ref apop_query_to_data (or \\ref apop_query_to_text or \\ref apop_query_to_mixed_data)\nthen column names appear as expected.  
Set <tt>apop_opts.db_name_column</tt> to the\nname of a column in your query result to use that column name for row names.\n\nSample uses, given \\ref apop_data set <tt>d</tt>:\n\n\\code\nint row_name_count = d->names->rowct;\nint col_name_count = d->names->colct;\nint text_name_count = d->names->textct;\n\n//Manually add names in sequence:\napop_name_add(d->names, \"the vector\", 'v');\napop_name_add(d->names, \"row 0\", 'r');\napop_name_add(d->names, \"row 1\", 'r');\napop_name_add(d->names, \"row 2\", 'r');\napop_name_add(d->names, \"numeric column 0\", 'c');\napop_name_add(d->names, \"text column 0\", 't');\napop_name_add(d->names, \"The name of the data set.\", 'h');\n\n//or append several names at once\napop_data_add_names(d, 'c', \"numeric column 1\", \"numeric column 2\", \"numeric column 3\");\n\n//point to element i from the row/col/text names:\n\nchar *rowname_i = d->names->row[i];\nchar *colname_i = d->names->col[i];\nchar *textname_i = d->names->text[i];\n\n//The vector also has a name:\nchar *vname = d->names->vector;\n\\endcode\n\n\\li\\ref apop_name_add : add one name\n\\li\\ref apop_data_add_names : add a sequence of names at once\n\\li\\ref apop_name_stack : copy the contents of one name list to another\n\\li\\ref apop_name_find : find the row/col number for a given name.\n\\li\\ref apop_name_print : print the \\ref apop_name struct, for diagnostic purposes.\n\n\n\\section textsec   Text data\n\nThe \\ref apop_data set includes a grid of strings, named <tt>text</tt>, for holding text data. \n\nText should be encoded in UTF-8. ASCII is a subset of UTF-8, so that's OK too.\n\nThere are a few simple forms for handling the \\c text element of an \\c apop_data set.\n\n\\li Use \\ref apop_text_alloc to allocate the block of text. It is actually a realloc function, which you can use to resize an existing block without leaks. See the example below.\n\\li Use \\ref apop_text_set to write text elements. 
It replaces any existing text in the given slot without memory leaks.\n\\li The number of rows of text data in <tt>tdata</tt> is\n<tt>tdata->textsize[0]</tt>; \nthe number of columns is <tt>tdata->textsize[1]</tt>.\n\\li Refer to individual elements using the usual 2-D array notation, <tt>tdata->text[row][col]</tt>.\n\\li <tt>x[0]</tt> can always be written as <tt>*x</tt>, which may save some typing. The number of rows is <tt>*tdata->textsize</tt>. If you have a single column of text data (i.e., all data is in column zero), then item \\c i is <tt>*tdata->text[i]</tt>. If you know you have exactly one cell of text, then its value is <tt>**tdata->text</tt>.\n\\li After \\ref apop_text_alloc, all elements are the empty string <tt>\"\"</tt>, which\nyou can check via \n\\code\nif (!strlen(dataset->text[i][j])) printf(\"<blank>\");\n//or\nif (!*dataset->text[i][j]) printf(\"<blank>\");\n\\endcode\nFor the sake of efficiency when dealing with large, sparse data sets, all blank cells\npoint to <em>the same</em> static empty string, meaning that freeing cells must be\ndone with care. Your best bet is to rely on \\ref apop_text_set, \\ref apop_text_alloc,\nand \\ref apop_text_free to do the memory management for you.\n\nHere is a sample program that uses these forms, plus a few text-handling functions.\n\n\\include eg/text_demo.c\n\n\\li\\ref apop_data_transpose() : also transposes the text data. Say that you use\n<tt>dataset = apop_query_to_text(\"select onecolumn from data\");</tt> then you have a\nsequence of strings, <tt>dataset->text[0][0], dataset->text[1][0], </tt>.... 
After <tt>apop_data\n*dt = apop_data_transpose(dataset)</tt>, you will have a single list of strings,\n<tt>dt->text[0]</tt>, which is often useful as input to list-of-strings handling\nfunctions.\n\n\\li\\ref apop_query_to_text\n\\li\\ref apop_text_alloc : allocate or resize the text part of an \\ref apop_data set.\n\\li\\ref apop_text_set : replace a single cell of the text grid with new text.\n\\li\\ref apop_text_paste : convert a table of strings into one long string.\n\\li\\ref apop_text_unique_elements : get a sorted list of unique elements for one column of text.\n\\li\\ref apop_text_free : you may never need this, because \\ref apop_data_free calls it.\n\\li\\ref apop_regex : friendlier front-end for POSIX-standard regular expression\n            searching; pulls matches into an \\ref apop_data set.\n\n\\subsection fact   Generating factors\n\n\\em Factor is jargon for a numbered category. Number-crunching programs prefer integers over text, so we need a function to produce a one-to-one mapping from text categories into numeric factors. \n\nA \\em dummy is a variable that is either one or zero, depending on membership in a given\ngroup. Some methods (typically when the variable is an input or independent variable\nin a regression) prefer dummies; some methods (typically for outcome or dependent\nvariables) prefer factors. The functions that generate factors and dummies will add\nan informational page to your \\ref apop_data set with a name like <tt>\\<categories\nfor your_column\\></tt> listing the conversion from the artificial numeric factor to\nthe original data. 
Use \\ref apop_data_get_factor_names to get a pointer to that page.\n\nYou can use the factor table to translate from numeric categories back to text (though\nyou probably have the original text column in your data anyway).\n\nHaving the factor list in an auxiliary table makes it easy to ensure that multiple\n\\ref apop_data sets use the same single categorization scheme. Generate factors in the\nfirst set, then copy the factor list to the second, then run \\ref apop_data_to_factors\non the second:\n\n\\code\napop_data_to_factors(d1);\nd2->more = apop_data_copy(apop_data_get_factor_names(d1));\napop_data_to_factors(d2);\n\\endcode\n\nSee the documentation for \\ref apop_logit for a sample linear model using a factor dependent variable and dummy independent variable.\n\n\\li\\ref apop_data_to_dummies\n\\li\\ref apop_data_to_factors\n\\li\\ref apop_data_get_factor_names\n*/\n\n/** \\page dbs Databases\n\nThese are convenience functions to handle interaction with SQLite or mySQL/mariaDB. They open one and only one database, and handle most of the interaction therewith for you.\n\nYou will probably first use \\ref apop_text_to_db to pull data into the database, then \\ref apop_query to clean the data in the database, and finally \\ref apop_query_to_data to pull some subset of the data out for analysis.\n\n\\li In all cases, your query may be in <tt>printf</tt> form. For example:\n\\code\nchar tabname[] = \"demographics\";\nchar colname[] = \"heights\";\nint min_height = 175;\napop_query(\"select %s from %s where %s > %i\", colname, tabname, colname, min_height);\n\\endcode\n\n\nSee the \\ref db_moments section below for not-SQL-standard math functions that you can\nuse when sending queries from Apophenia, such as \\c pow, \\c stddev, or \\c sqrt.\n\n\\li \\ref apop_text_to_db : Read a text file on disk into the database. 
Data analysis projects often start with a call to this.\n\\li \\ref apop_data_print : If you include the argument <tt>.output_type='d'</tt>, this prints your \\ref apop_data set to the database.\n\\li \\ref apop_query : Manipulate the database, return nothing (e.g., insert rows or create table).\n\\li \\ref apop_db_open : Optional, for when you want to use a database on disk.\n\\li \\ref apop_db_close : A useful (and in some cases, optional) companion to \\ref apop_db_open.\n\\li \\ref apop_table_exists : Check to make sure you aren't reinventing or destroying data. Also, a clean way to drop a table.\n\n\\li Apophenia reserves the right to insert temp tables into the opened database. They\nwill all have names beginning with <tt>apop_</tt>, so the reader is advised to not\ngenerate tables with such names, and is free to ignore or delete any such tables that\nturn up.\n\\li If you need to deal with two databases, use SQL's <a\nhref=\"https://sqlite.org/lang_attach.html\"><tt>attach database</tt></a>. By default\nwith SQLite, Apophenia opens an in-memory database handle. It is a sensible workflow to\nuse the faster in-memory database as the primary database, and then attach an on-disk database\nto read in data and write final output tables.\n\n\\section edftd Extracting data from the database\n\n\\li\\ref apop_db_to_crosstab : take up to three columns in the database (row, column, value) and produce a table of values.\n\\li\\ref apop_query_to_data\n\\li\\ref apop_query_to_float\n\\li\\ref apop_query_to_mixed_data\n\\li\\ref apop_query_to_text\n\\li\\ref apop_query_to_vector\n\n\\section wdttd Writing data to the database\n\nSee the print functions at \\ref Legi. 
E.g.\n\n\\code\napop_data_print(yourdata, .output_type='d', .output_name=\"dbtab\");\n\\endcode\n\n\\section cmdline Command-line utilities\n\nA few functions have proven to be useful enough to be worth breaking out into their own programs, for use in scripts or other data analysis from the command line:\n\n\\li The \\c apop_text_to_db command line utility is a wrapper for the \\ref apop_text_to_db function.\n\\li The \\c apop_db_to_crosstab command line utility is a wrapper for the \\ref apop_db_to_crosstab function.\n\n\\section db_moments Database moments (plus pow()!)\n\nSQLite lets users define new functions for use in queries, and Apophenia uses this facility to define a few common functions.\n\n\\li <tt>select ran() from table</tt> will produce a new random number between zero and one for every row of the input table, using \\c gsl_rng_uniform. \n\n\\li The SQL standard includes the <tt>count(x)</tt> and <tt>avg(x)</tt> aggregators,\nbut statisticians are usually interested in higher moments as well---at least the\nvariance. Therefore, SQL queries using the Apophenia library may include any of these moments:\n\n\\code\nselect count(x), stddev(x), avg(x), var(x), variance(x), skew(x), kurt(x), kurtosis(x),\nstd(x), stddev_samp(x), stddev_pop(x), var_samp(x), var_pop(x)\nfrom table\ngroup by whatever\n\\endcode\n\n<tt>var</tt> and <tt>variance</tt>; <tt>kurt</tt> and <tt>kurtosis</tt> do the same thing; choose the one that sounds better to you.\nKurtosis is the fourth central moment by itself, not adjusted by subtracting three or dividing by variance squared.\n<tt>var</tt>, <tt>var_samp</tt>, <tt>stddev</tt> and <tt>stddev_samp</tt> give sample variance/standard deviation; <tt>variance</tt>, <tt>var_pop</tt>, <tt>std</tt> and <tt>stddev_pop</tt> give population variance/standard deviation. The plethora of variants is for mySQL compatibility.\n\n\\li The var/skew/kurtosis functions calculate sample moments. 
If you want the second\npopulation moment, multiply the variance by \\f$(n-1)/n\\f$; for the third population moment,\nmultiply the skew by    \\f$(n-1)(n-2)/n^2\\f$. The equation for the unbiased sample kurtosis\nas calculated in <a href=\"http://modelingwithdata.org/pdfs/moments.pdf\">Appendix M of\n<em>Modeling with Data</em></a> is not quite as easy to adjust.\n\n\\li Also provided: wrapper functions for standard math library\nfunctions---<tt>sqrt(x)</tt>, <tt>pow(x,y)</tt>, <tt>exp(x)</tt>, <tt>log(x)</tt>,\nand trig functions. They call the standard math library function of the same name\nto calculate \\f$\\sqrt{x}\\f$, \\f$x^y\\f$, \\f$e^x\\f$, \\f$\\ln(x)\\f$, \\f$\\sin(x)\\f$,\n\\f$\\arcsin(x)\\f$, et cetera. For example:\n\n\\code\nselect sqrt(x), pow(x,0.5), exp(x), log(x), log10(x),\n    sin(x), cos(x), tan(x), asin(x), acos(x), atan(x)\nfrom table\n\\endcode\n\n\\li The <tt>ran()</tt> function calls <tt>gsl_rng_uniform</tt> to produce a uniform\ndraw between zero and one. It uses the stock of RNGs from \\ref apop_rng_get_thread.\n\nHere is a test script using many of the above.\n\n\\include db_fns.c\n*/\n\n\n/** \\page modelsec Models\nSee \\ref gentle_model for an overview of the intent and basic use of the \\ref apop_model struct.\n\nThis segment goes into greater detail on the use of existing \\ref apop_model objects.\nIf you need to write a new model, see \\ref modeldetails.\n\nThe \\c estimate function will estimate the parameters of your model. 
Just prep the data, select a model, and produce an estimate:\n\n\\code\n    apop_data *data = apop_query_to_data(\"select outcome, in1, in2, in3 from dataset\");\n    apop_model *the_estimate = apop_estimate(data, apop_probit);\n    apop_model_print(the_estimate);\n\\endcode\n\nAlong the way to estimating the parameters, most models also find covariance estimates for\nthe parameters, calculate statistics like log likelihood, and so on, which the final print statement will show.\n\nThe <tt>apop_probit</tt> model that ships with Apophenia is unparameterized:\n<tt>apop_probit->parameters==NULL</tt>. The output from the estimation,\n<tt>the_estimate</tt>, has the same form as <tt>apop_probit</tt>, but\n<tt>the_estimate->parameters</tt> has a meaningful value.\n\nApophenia ships with many well-known models for your immediate use, including\nprobability distributions, such as the \\ref apop_normal, \\ref apop_poisson, or \\ref\napop_beta models. The data is assumed to have been drawn from a given distribution and\nthe question is only what distributional parameters best fit. For example, given that\nthe data is Normally distributed, find \\f$\\mu\\f$ and \\f$\\sigma\\f$ via\n<tt>apop_estimate(your_data, apop_normal)</tt>.\n\nThere are also linear models like \\ref apop_ols, \\ref apop_probit, and \\ref apop_logit. 
As in the example, they are on equal footing with the distributions, so nothing keeps you from making random draws from an estimated linear model.\n\n  \\li If you send a data set with the \\c weights vector filled, \\ref apop_ols estimates Weighted OLS.\n  \\li If the dependent variable has more than two categories, the \\ref apop_probit and\n\\ref apop_logit models estimate a multinomial probit or logit.\n  \\li There are separate \\ref apop_normal and \\ref apop_multivariate_normal models\nbecause the parameter formats are slightly different: the univariate Normal keeps both\n\\f$\\mu\\f$ and \\f$\\sigma\\f$ in the vector element of the parameters; the multivariate\nversion uses the vector for the vector of means and the matrix for the \\f$\\Sigma\\f$\nmatrix. The univariate version is so heavily used that it merits a special-case model.\n\nSee the \\ref models page for a list of models shipped with Apophenia,\nincluding popular favorites like \\ref apop_beta, \\ref apop_binomial, \\ref apop_iv\n(instrumental variables), \\ref apop_kernel_density, \\ref apop_loess, \\ref apop_lognormal,\n\\ref apop_pmf (see \\ref histosec below), and \\ref apop_poisson.\n\nSimulation models seem to not fit this form, but you will see below that if you can write an objective function for the \\c p method of the model, you can use the above tools. 
Notably, you can estimate parameters via maximum likelihood and then give confidence intervals around those parameters.\n\n<em>More estimation output</em>\n\nIn the \\ref apop_model returned by \\ref apop_estimate, you will find:\n\n  \\li The actual parameter estimates are in an \\ref apop_data set at \\c your_model->parameters.\n  \\li A pointer to the \\ref apop_data set used for estimation, named \\c data.\n  \\li Scalar statistics of the model listed in the output model's \\c info group,\nwhich may include some hypothesis tests, a list of expected values, log likelihood, AIC, AIC_c, BIC, et cetera.\nThese can be retrieved via a form like\n\\code\napop_data_get(your_model->info, .rowname=\"log likelihood\");\n//or\napop_data_get(your_model->info, .rowname=\"AIC\");\n\\endcode\nIf those are not necessary, adding to your model an \\ref apop_parts_wanted_settings\ngroup with its default values (see below on settings groups) signals to the model\nthat you want only the parameters and to not waste possibly significant CPU time\non covariances, expected values, et cetera. See the \\ref apop_parts_wanted_settings\ndocumentation for examples and further refinements.\n  \\li In many cases, covariances of the parameters as a page appended to the parameters; retrieve via\n\\code\napop_data *cov = apop_data_get_page(your_model->parameters, \"<Covariance>\");\n\\endcode\n  \\li Typically for regression-type models, the table of expected values (typically including\nexpected value, actual value, and residual) is a page stapled to the main info\npage. Retrieve via:\n\\code\napop_data *predict = apop_data_get_page(your_model->info, \"<Predicted>\");\n\\endcode\n\nSee individual model documentation for what is provided by any given model.\n\n<em>Post-estimation uses</em>\n\nBut we expect much more from a model than estimating parameters from data.  
\n\nContinuing the above example where we got an estimated Probit model named \\c the_estimate, we can interrogate the estimate in various familiar ways:\n\n\\code\napop_data *expected_value = apop_predict(NULL, the_estimate);\n\ndouble density_under =  apop_cdf(expected_value, the_estimate);\n\napop_data *draws = apop_model_draws(the_estimate, .count=1000);\n\\endcode\n\n\\subpage dataones\n\n\\section modelparameterization  Parameterizing or initializing a model\n\nThe models that ship with Apophenia have the requisite procedures for estimation,\nmaking draws, and so on, but have <tt>parameters==NULL</tt> and <tt>settings==NULL</tt>. The\nmodel is thus, for many purposes, incomplete, and you will need to take some action to\ncomplete the model. As per the examples to follow, there are several possibilities:\n\n  \\li Estimate it! Almost all models can be sent with a data set as an argument to the\n<tt>apop_estimate</tt> function. The input model is unchanged, but the output model\nhas parameters and settings in place.\n  \\li If your model has a fixed number of numeric parameters, then you can set them with\n\\ref apop_model_set_parameters.\n  \\li If your model has a variable number of parameters, you can directly set the \n\\c parameters element via \\ref apop_data_falloc. For most purposes, you will also need to\nset the \\c msize1, \\c msize2, \\c vsize, and \\c dsize elements to the size you want. See\nthe example below.\n  \\li Some models have disparate, non-numeric settings rather than a simple matrix of\nparameters. For example, a kernel density estimate needs a model as a kernel and a\nbase data set, which can be set via \\ref apop_model_set_settings.\n\nHere is an example that shows the options for parameterizing a model. 
After each\nparameterization, 20 draws are made and written to a file named draws-[modelname].\n\n\\include ../eg/parameterization.c\n\n\n\\section transformsec Filtering & updating\n\nThe model structure makes it easy to generate new models that are variants of prior\nmodels. Bayesian updating, for example, takes in one \\ref apop_model that we call the\nprior, one \\ref apop_model that we call a likelihood, and outputs an \\ref apop_model\nthat we call the posterior. One can produce complex models using simpler transformations\nas well. For example, \\ref apop_model_fix_params will set the free parameters of\nan input model to a fixed value, thus producing a model with fewer parameters. To\ntransform a Normal(\\f$\\mu\\f$, \\f$\\sigma\\f$) into a one-parameter Normal(\\f$\\mu\\f$, 1):\n\n\\code\napop_model *N_sigma1 = apop_model_fix_params(apop_model_set_parameters(apop_normal, NAN, 1));\n\\endcode\n\nThis can be used anywhere the original Normal distribution can be. To give another\nexample, if we need to truncate the distribution in the data space:\n\n\\code\n//The constraint function.\ndouble over_zero(apop_data *in, apop_model *m){\n    return apop_data_get(in) > 0;\n}\n\napop_model *trunc = apop_model_dconstrain(.base_model=N_sigma1,\n                                          .constraint=over_zero);\n\\endcode\n\nChaining together simpler transformations is an easy method to produce \nmodels of arbitrary detail.  In the following example:\n\n\\li Nature generated data using a mixture of three Poisson distributions,\nwith \\f$\\lambda=2.8\\f$, \\f$2.0\\f$, and \\f$1.3\\f$.\nThe resulting model is generated using \\ref apop_model_mixture. 
\n\\li Not knowing the true distribution, the analyst \nmodels the data with a single Poisson\\f$(\\lambda)\\f$ distribution with a prior on \\f$\\lambda\\f$.\nThe prior selected is a\ntruncated Normal(2, 1), generated by sending the stock\n\\ref apop_normal model to the data-space constraint function \\ref apop_dconstrain.\n\\li The \\ref apop_update function takes three arguments: the data set, which comes from\ndraws from the mixture, the prior, and the likelihood. It produces an output model\nwhich, in this case, is a PMF describing a distribution over \\f$\\lambda\\f$, because\na truncated Normal and a Poisson are not conjugate distributions. Knowing that it is\na PMF, the <tt>->data</tt> element holds a set of draws from the posterior.\n\\li The analyst would like to present an approximation to the posterior in a simpler form,\nand so finds the parameters \\f$\\mu\\f$ and \\f$\\sigma\\f$ of the Normal distribution that\nis closest to that posterior.\n\nHere is a program---almost a single line of code---that builds the final approximation to the posterior\nmodel from the subcomponents, including draws from Nature and the analyst's prior\nand likelihood:\n\n\\include ../eg/transform.c\n\n\\section mathmethods Model methods\n\n\\li\\ref apop_estimate : estimate the parameters of the model with data.\n\\li\\ref apop_predict : the expected value function.\n\\li\\ref apop_draw : random draws from an estimated model.\n\\li\\ref apop_p : the probability of a given data set given the model.\n\\li\\ref apop_log_likelihood : the log of \\ref apop_p\n\\li\\ref apop_score : the derivative of \\ref apop_log_likelihood\n\\li\\ref apop_model_print : write model components to the screen or a file\n\\li\\ref apop_model_copy : duplicate a model\n\\li\\ref apop_model_set_parameters :  Use this to convert a Normal(\\f$\\mu\\f$, \\f$\\sigma\\f$) with unknown \\f$\\mu\\f$ and \\f$\\sigma\\f$ into a Normal(0, 1), for example.\n\\li\\ref apop_model_free\n\\li\\ref apop_model_clear , 
\\ref apop_prep : remove the parameters from a parameterized model. Used infrequently.\n\\li\\ref apop_model_draws : many random draws from an estimated model.\n\n\n\n\\li\\ref apop_update : Bayesian updating\n\\li\\ref apop_model_coordinate_transform : apply an invertible transformation to the data space\n\\li\\ref apop_model_dconstrain : constrain the data space of a model to a subspace. E.g., truncate a Normal distribution so \\f$x>0\\f$.\n\\li\\ref apop_model_fix_params : hold some parameters constant\n\\li\\ref apop_model_mixture : a linear combination of models\n\\li\\ref apop_model_cross : If \\f$(d_1)\\f$ has a Normal\\f$(\\mu, \\sigma)\\f$ distribution\nand \\f$d_2\\f$ has an independent Poisson\\f$(\\lambda)\\f$ distribution, then \\f$(d_1,\nd_2)\\f$ has an <tt>apop_model_cross(apop_normal, apop_poisson)</tt> distribution with\nparameters \\f$(\\mu, \\sigma, \\lambda)\\f$.\n\n\n\\section modelsettings Settings groups\n\nDescribing a statistical, agent-based, social, or physical model in a standardized\nform is difficult because every model has significantly different settings. An\nMLE requires a method of search (conjugate gradient, simplex, simulated annealing),\nand a histogram needs the number of bins to be filled with data.\n\nSo, the \\ref apop_model includes a single list which can hold an arbitrary number of settings groups, like the search specifications for finding the maximum likelihood, a histogram for making random draws, and options about the model type.\n\nSettings groups are automatically initialized with default values when\nneeded. If the defaults do no harm, then you don't need to think about\nthese settings groups at all.\n\nHere is an example where a settings group is worth tweaking: the \\ref apop_parts_wanted_settings group indicates which parts\nof the auxiliary data you want. 
\n\n\\code\n1 apop_model *m = apop_model_copy(apop_ols);\n2 Apop_settings_add_group(m, apop_parts_wanted, .covariance='y');\n3 apop_model *est = apop_estimate(data, m);\n\\endcode\n\n\nLine one establishes the baseline form of the model. Line two adds a settings group\nof type \\ref apop_parts_wanted_settings to the model. By default other auxiliary items, like the expected values, are set to \\c 'n' when using this group, so this specifies that we want covariance and only covariance. Having stated our preferences, line three does the estimation we want.\n\nNotice that the \\c _settings ending to the settings group's name isn't written---macros\nmake it happen.  The remaining arguments to \\c Apop_settings_add_group (if any) follow\nthe \\ref designated syntax of the form <tt>.setting=value</tt>.\n\nThere is an \\ref apop_model_copy_set macro that adds a settings group when it is first copied, joining up lines one and two above:\n\n\\code\napop_model *m = apop_model_copy_set(apop_ols, apop_parts_wanted, .covariance='y');\n\\endcode\n\nSettings groups are copied with the model, which facilitates chaining\nestimations. Continuing the above example, you could re-estimate to get the predicted\nvalues and covariance via:\n\n\\code\nApop_settings_set(est, apop_parts_wanted, predicted, 'y');\napop_model *est2 = apop_estimate(data, est);\n\\endcode\n\nMaximum likelihood search has many settings that could be modified, and so provides\nanother common example of using settings groups:\n\n\\code\napop_model *the_estimate = apop_estimate(data, apop_probit);\n\n//Redo the Probit's MLE search using Newton's Method:\nApop_settings_add_group(the_estimate, apop_mle, .verbose='y', \n                        .tolerance=1e-4, .method=\"Newton\");\napop_model *re_est = apop_estimate(data, the_estimate);\n\\endcode\n\nTo clarify the distinction between parameters and settings, note that parameters are\nestimated from the data, often via a maximum likelihood search. 
In an ML search,\nthe method of search, the number of bins in a histogram, or the number of steps in a\nsimulation would be held fixed as the search iterates over possible parameters (and\nif these settings do change, then that is a meta-model that could be encapsulated into another\n\\ref apop_model). As a consequence, parameters are always numeric, while settings may\nbe any type.\n\n\\li \\ref Apop_settings_set, for modifying a single setting, doesn't use the designated initializers format.\n\\li Because the settings groups are buried within the model, debugging them can be a\npain. Here is a documented macro for \\c gdb that will help you pull a settings group out of a \nmodel for your inspection, to cut and paste into your <tt>.gdbinit</tt>. It shouldn't be too difficult to modify this macro for other debuggers.\n\n\\code\ndefine get_group\n    set $group = ($arg1_settings *) apop_settings_get_grp( $arg0, \"$arg1\", 0 )\n    p *$group\nend\ndocument get_group \nGets a settings group from a model.\nGive the model name and the name of the group, like\nget_group my_model apop_mle \nand I will set a gdb variable named $group that points to that model, \nwhich you can use like any other pointer. For example, print the contents with\np *$group\nThe contents of $group are printed to the screen as visible output to this macro.\nend \n\\endcode\n\nFor using a model, that's all of what you need to know. For details on writing a new settings group, see \\ref settingswriting .\n\n\\li\\ref Apop_settings_add_group\n\\li\\ref Apop_settings_set\n\\li\\ref Apop_settings_get : get a single element from a settings group.\n\\li\\ref Apop_settings_get_group : get the whole settings group.\n*/\n\n/** \\page dataones Data format for regression-type models\n\nRegression-type estimations typically require a constant column. 
That is, the 0th\ncolumn of the data is a constant (one), so the parameter \\f$\\beta_0\\f$ is\nslightly special in corresponding to a constant rather than a variable.\n\nSome stats packages implicitly assume a constant column, which the user never\nsees. This violates the principle of transparency upon which Apophenia is based.\nGiven a data matrix \\f$X\\f$ with the estimated parameters\n\\f$\\beta\\f$, if the model asserts that the product \\f$X\\beta\\f$ has meaning, then you\nshould be able to easily calculate that product. With a ones column, a dot product is one line:\n<tt>apop_dot(x, your_est->parameters)</tt>; without a ones column, one would basically\nhave to construct one (using \\c gsl_matrix_set_all and \\c apop_data_stack).\n\nEach regression-type estimation has one dependent variable and several independent. In\nthe end, we want the dependent variable to be in the vector element. Removing\na column from a <tt>gsl_matrix</tt> and adjusting all subsequent columns is relatively\ndifficult, because (like most structs built with the aim of very efficient processing) the\nstruct depends on an equal spacing in memory between each element.\n\n<em> The automatic case</em>\n\nWe can resolve both the need for a ones column and for having the dependent column in\nthe vector at the same time. Given a data set with no vector element and the dependent\nvariable in the first column of the matrix, we can copy the dependent variable into\nthe vector and then replace the first column of the matrix with ones. The result fits\nall of the above expectations.\n\nYou as a user merely have to send in a \\c apop_data set with \\c NULL vector and a dependent\ncolumn in the first column. 
If the data is coming from the database, then the query\nis natural:\n\n\\code\napop_data *regression_data = apop_query_to_data(\"select depvar, indyvar1, indyvar2, indyvar3 from dataset\");\napop_model_print(apop_estimate(regression_data, apop_ols));\n\\endcode\n\n<em> The already-prepped case</em>\n\nIf your data has a vector element, then the prep routines won't change anything.\nIf you don't want to use a constant column, or your data has already been prepped by\nan estimation, then this is what you want.\n\n\\code\napop_data *regression_data = apop_query_to_mixed_data(\"vmmm\", \"select depvar, indyvar1, indyvar2, indyvar3 from dataset\");\napop_model_print(apop_estimate(regression_data, apop_logit));\n\\endcode\n*/\n\n\n/** \\page testpage Tests & diagnostics\n\nHere is the model for all hypothesis testing within Apophenia:\n\n\\li Calculate a statistic.\n\\li Describe the distribution of that statistic.\n\\li Work out how much of the distribution is (above|below|closer to zero than) the statistic.\n\nThere are a handful of named tests that produce a known statistic and then compare to a\nknown distribution, like \\ref apop_test_kolmogorov or \\ref apop_test_fisher_exact. 
For\ntraditional distributions (Normal, \\f$t\\f$, \\f$\\chi^2\\f$), use the \\ref apop_test convenience\nfunction.\n\nIn especially common cases, like the parameters from an OLS regression,\nthe commonly-associated \\f$t\\f$ test is included as part of the estimation\noutput, typically as a row in the \\c info element of the output \\ref apop_model.\n\n\n\\li\\ref apop_test\n\\li\\ref apop_paired_t_test\n\\li\\ref apop_f_test\n\\li\\ref apop_t_test\n\\li\\ref apop_test_anova_independence\n\\li\\ref apop_test_fisher_exact\n\\li\\ref apop_test_kolmogorov\n\\li\\ref apop_estimate_coefficient_of_determination\n\\li\\ref apop_estimate_r_squared\n\nSee also these Monte Carlo methods:\n\n\\li\\ref apop_bootstrap_cov\n\\li\\ref apop_jackknife_cov\n\nTo give another example of testing, here is a function that was briefly a part of\nApophenia, but seemed a bit out of place. Here it is as a sample:\n\n\\code\n// Input: any vector, which will be normalized in place. Output: 1 - the p-value\n// for a chi-squared test to answer the question, \"with what confidence can I\n// reject the hypothesis that the variance of my data is zero?\"\n\ndouble apop_test_chi_squared_var_not_zero(gsl_vector *in){\n    Apop_stopif(!in, return NAN, 0, \"input vector is NULL. Doing nothing.\");\n    apop_vector_normalize(in, .normalization_type='s');\n    double sum=apop_vector_map_sum(in, gsl_pow_2);\n    return gsl_cdf_chisq_P(sum, in->size);\n}\n\\endcode\n\nOr, consider the Rao statistic, \n\\f${\\partial\\over \\partial\\beta}\\log L(\\beta)'I^{-1}(\\beta){\\partial\\over \\partial\\beta}\\log L(\\beta)\\f$\nwhere \\f$L\\f$ is a model's likelihood function and \\f$I\\f$ its information matrix. 
In code:\n\n\\code\napop_data * infoinv = apop_model_numerical_covariance(data, your_model);\napop_data * score = &(apop_data){.vector=apop_numerical_gradient(data, your_model)};\napop_data * stat = apop_dot(apop_dot(score, infoinv), score);\n\\endcode\n\nGiven the correct assumptions, this is \\f$\\sim \\chi^2_m\\f$, where \\f$m\\f$ is the dimension of \\f$\\beta\\f$, so the odds of a Type I error given the model are:\n\n\\code\ndouble p_value = apop_test(stat, \"chi squared\", beta->size);\n\\endcode\n\n<em>Generalized parameter tests</em>\n\nBut if your model is not from the textbook, then you have the tools to apply the\nabove three-step process to the parameters of any \\ref apop_model.\n\n\\li Model parameters are a statistic, and you know that <tt>apop_estimate(your_data,\n        your_model)</tt> will output a model with a <tt>parameters</tt> element.\n\\li \\ref apop_parameter_model will return an \\ref apop_model describing \n    the distribution of these parameters.\n\\li We now have the two ingredients to send to \\ref apop_cdf, which takes in a model\nand a data point and returns the area under the data point.\n\nDefaults for the parameter models are filled in via bootstrapping or resampling, meaning\nthat if your model's parameters are decidedly off the Normal path, you can still test\nclaims about the parameters.\n\nThe introductory example in \\ref gentle ran a standard OLS regression, whose output includes some\nstandard hypothesis tests; to conclude, let us go the long way and replicate those results\nvia the general \\ref apop_parameter_model mechanism. The results here will of course be\nidentical, but the more general mechanism can be used in situations where the standard\nmodels don't apply.\n\nThe first part of this program is identical to the introductory program, using \\c\nss08pdc.csv if you have downloaded it as per the instructions in \\ref gentle, or a\nsimple sample data set if not. 
The second half executes the three steps and uses many\nof the above features: one of the inputs to \\ref apop_parameter_model (which row of\nthe parameter set to use) is sent by adding a settings group, we pull that row into\na separate data set using \\ref Apop_r, and we set its vector value by referring to it\nas the -1st element.\n\n\\include ols2.c\n\nNote that the procedure did not assume the model parameters had a certain form. It\nqueried the model for the distribution of parameter \\c agep, and if the model didn't have\na closed-form answer then a distribution via bootstrap would be provided. Then that model\nwas queried for its CDF. [The procedure does assume a symmetric distribution. Fixing this\nis left as an exercise for the reader.] For a model like OLS, this is entirely overkill, \nwhich is why OLS provides the basic hypothesis tests automatically. But for models\nwhere the distribution of parameters is unknown or has no closed-form solution, this\nmay be the only recourse.\n*/\n\n/** \\page histosec Empirical distributions and PMFs (probability mass functions)\n\nThe \\ref apop_pmf model wraps an \\ref apop_data set so it can be read as an empirical\nmodel, with a likelihood function (equal to the associated weight for observed\nvalues and zero for unobserved values), a random number generator (which \nsimply makes weighted random draws from the data), and so on.  Setting it up is a\nmodel estimation from data like any other, done via \\ref apop_estimate(\\c your_data,\n\\ref apop_pmf).\n\nYou have the option of cleaning up the data before turning it into a PMF. 
For example...\n\n\\code\napop_data_pmf_compress(your_data);          //remove duplicates\napop_data_sort(your_data);\napop_vector_normalize(your_data->weights);  //weights sum to one\napop_model *a_pmf = apop_estimate(your_data, apop_pmf);\n\\endcode\n\nThese steps are largely optional.\n\n\\li The CDF is calculated based on the percent of the weights between the zeroth row of the PMF\nand the row specified. This generally makes more sense after \\ref apop_data_sort.\n\\li Compression produces a corresponding improvement in efficiency when first calculating\nCDFs, but is otherwise not necessary.\n\\li Sorting or normalizing is not necessary for making draws or getting a likelihood or log likelihood.\n\nIt is the \\c weights vector that holds the density represented by each row; the rest of the row represents the coordinates of that density. If the input data set has no \\c weights segment, then I assume that all rows have equal weight.\n\nFor a PMF model, the \\c parameters are \\c NULL, and the \\c data itself\nis used for calculation. Therefore, modifying the data post-estimation can break some\ninternal settings set during estimation. If you modify the data, throw away any existing\nPMFs (via \\ref apop_model_free) and re-estimate a new one.\n\n\\section histocompare Comparing histograms\n\nUsing \\ref apop_data_pmf_compress puts the data into one bin for each unique value in\nthe data set.  You may instead want bins of fixed width, in the style of a histogram,\nwhich you can get via \\ref apop_data_to_bins. It requires a bin specification. If you\nsend a \\c NULL binspec, then the offset is zero and the bin size is big enough to ensure\nthat there are \\f$\\sqrt{N}\\f$ bins from minimum to maximum. The binspec will be added\nas a page to the data set, named <tt>\"<binspec>\"</tt>. 
See the \\ref apop_data_to_bins\ndocumentation on how to write a custom bin spec.\n\n\nThere are a few ways of testing the claim that one distribution equals another, typically an empirical PMF versus a smooth theoretical distribution. In both cases, you will need two distributions based on the same binspec. \n\nFor example, if you do not have a prior binspec in mind, then you can use the one generated by the first call to the histogram binning function to make sure that the second data set is in sync:\n\n\\code\napop_data_to_bins(first_set, NULL);\napop_data_to_bins(second_set, apop_data_get_page(first_set, \"<binspec>\"));\n\\endcode\n\nYou can use \\ref apop_test_kolmogorov or \\ref apop_histograms_test_goodness_of_fit to generate the appropriate statistics from the pairs of bins.\n\nKernel density estimation will produce a smoothed PDF. See \\ref apop_kernel_density for details.\nOr, use \\ref apop_vector_moving_average for a simpler smoothing method.\n\n\n\\li\\ref apop_data_pmf_compress() : merge together redundant rows in a data set before calling \n                \\ref apop_estimate(\\c your_data, \\ref apop_pmf); optional.\n\\li\\ref apop_vector_moving_average() : smooth a vector (e.g., <tt>your_pmf->data->weights</tt>) via moving average.\n\\li\\ref apop_histograms_test_goodness_of_fit() : goodness-of-fit via \\f$\\chi^2\\f$ statistic\n\\li\\ref apop_test_kolmogorov() : goodness-of-fit via Kolmogorov-Smirnov statistic\n\\li\\ref apop_kl_divergence() : measure the information loss from one (typically empirical) distribution to another distribution.\n*/\n\n/** \\page maxipage Optimization\n\nThis section includes some notes on the maximum likelihood routine. 
As in the section\non writing models above, if a model has a \\c p or \\c log_likelihood method but no \\c\nestimate method, then calling \\c apop_estimate(your_data, your_model) executes the\ndefault estimation routine of maximum likelihood.\n\nIf you are not a statistician, then there are a few things you will need to keep in\nmind:\n\n\\li Physicists, pure mathematicians, and the GSL minimize; economists, statisticians, and\nApophenia maximize. If you are doing a minimization, be sure that your function returns minus the objective\nfunction's value.\n\n\\li The overall setup is about estimating the parameters of a model with data. The user\nprovides a data set and an unparameterized model, and the system tries parameterized\nmodels until one of them is found to be optimal. The data is fixed.  The optimization\ntries a series of parameterized models, searching for the one that is most likely. In\na non-stats setting, you may have \\c NULL data.\n\n\\li Because the unit of analysis is a parameterized model,\nnot just parameters, you need to have an \\ref apop_model\nwrapping your objective function.\n\nThis example, to be discussed in detail below, optimizes \nRosenbrock's banana function, \\f$(1-x)^2+ s(y - x^2)^2\\f$, where the\nscaling factor \\f$s\\f$ is fixed ahead of time, say at 100.\n\n\\include ../eg/banana.c\n\nThe \\c banana function returns a single number to be minimized.  You will need to write an\n\\ref apop_model to send to the optimizer, which is a two-step process: write a log\nlikelihood function wrapping the real objective function (\\c ll), and a model that uses that\nlog likelihood (\\c b).\n\n  \\li The <tt>.vsize=2</tt> part of the declaration of \\c b on the second\nline of <tt>main()</tt> specifies that the model's parameters are a vector of\nsize two.  That is, the list of <tt>double</tt>s to send to \\c banana is set in\n<tt>in->parameters->vector->data</tt>.  
\n  \\li The \\c more element of the \\ref apop_model\nstructure is designed to hold any arbitrary structure of size \\c more_size, which\nis useful for models that require additional constants or other settings, like the\n<tt>coeff_struct</tt> here. See \\ref settingswriting for more on handling model settings.\n  \\li Statisticians want the covariance and basic tests about the parameters. \nThis line shuts off all auxiliary calculations:\n\\code\nApop_settings_add_group(your_model, apop_parts_wanted);\n\\endcode\nSee the documentation for \\ref apop_parts_wanted_settings for details about how this\nworks.  It can also offer quite the speedup: especially for high-dimensional problems,\nfinding the covariance matrix without any additional information can take dozens of evaluations\nof the objective function for each evaluation that is part of the search itself.\n  \\li MLEs have an especially large number of parameter tweaks that could be made;\nsee the \\ref apop_mle_settings page.\n  \\li As a useful diagnostic, you can add a \\c NULL \\ref apop_data set to the MLE\nsettings in the <tt>.path</tt> slot, and it will be allocated and filled with the\nsequence of points tried by the optimizer.\n  \\li The program has some extras above and beyond the necessary: it uses two methods\n(notice how easy it is to re-run an estimation with an alternate method, but the syntax\nfor modifying a setting differs from the initialization syntax) and checks that the\nresults are accurate.\n\n\n\\section constr Setting Constraints\n\nThe problem is that the parameters of a function must not take on certain values,\neither because the function is undefined for those values or because parameters with\ncertain values would not fit the real-world problem.\n\nIf you give the optimizer an unconstrained likelihood function plus a separate constraint\nfunction, \\ref apop_maximum_likelihood will combine them to a function that is continuous\nat the constraint boundary, but which is 
guaranteed to never have an optimum outside of the constraint.\n\nA constraint function must do three things:\n\\li If the constraint does not bind (i.e. the parameter values are OK), then it must return zero.\n\\li If the constraint does bind, it must return a penalty that indicates how far off the parameter is from meeting the constraint.\n\\li If the constraint does bind, it must set a return vector that the likelihood function can take as a valid input. The penalty at this returned value must be zero.\n\nThe idea is that if the constraint returns zero, the log likelihood function will\nreturn the log likelihood as usual, and if not, it will return the log likelihood at\nthe constraint's return vector minus the penalty. To give a concrete example, here\nis a constraint function that will ensure that both parameters of a two-dimensional\ninput are greater than zero, and that their sum is greater than two. As with the\nconstraints for many of the models that ship with Apophenia, it is a wrapper for \\ref\napop_linear_constraint.\n\n\\code\nstatic long double greater_than_zero_constraint(apop_data *data, apop_model *v){\n    static apop_data *constraint = NULL;\n    if (!constraint) constraint= apop_data_falloc((3,3,2), 0,  1, 0,   //0 < 1x + 0y\n                                                           0,  0, 1,   //0 < 0x + 1y\n                                                           2,  1, 1);  //2 < 1x + 1y\n    return apop_linear_constraint(v->parameters->vector, constraint, 1e-3);\n}\n\\endcode\n\n\\li\\ref apop_linear_constraint()\n\n\\section simanneal Notes on simulated annealing\n\nFor convex optimizations, methods like conjugate gradient search work well, and\nfor relatively smooth optimizations, the Nelder-Mead simplex algorithm is a good\nchoice. For situations where the surface being searched may have several local optima\nand be otherwise badly behaved, there is simulated annealing.\n\nSimulated annealing is a controlled random walk.  
As with the other methods, the\nsystem tries a new point, and if it is better, switches. Initially, the system is\nallowed to make large jumps, and then with each iteration, the jumps get smaller,\neventually converging. Also, there is some decreasing probability that if the new\npoint is less likely, it will still be chosen. Simulated annealing is best for\nsituations where there may be multiple local optima. Early in the random walk, the\nsystem can readily jump from one to another; later it will fine-tune its way toward the\noptimum. The number of points tested is determined by the parameters of the simulated\ncooling program, not the values returned by the likelihood function.  If you know your\nfunction is globally convex (as are most standard probability functions), then this\nmethod is overkill.\n\n\n\\section mlfns Useful functions\n\n\\li\\ref apop_estimate_restart : Restarting an MLE with different settings can improve results.\n\\li\\ref apop_maximum_likelihood : Rarely called directly. If a model has no \\c estimate element, call \\ref apop_estimate to prep the model and run an MLE.\n\\li\\ref apop_model_numerical_covariance\n\\li\\ref apop_numerical_gradient\n*/\n\n\n/** \\page moreasst Assorted\n\nSome functions for missing data:\n\n\\li\\ref apop_data_listwise_delete\n\\li\\ref apop_ml_impute\n\nA few more descriptive methods:\n\n\\li\\ref apop_matrix_pca : Principal component analysis\n\\li\\ref apop_anova : One-way or two-way ANOVA tables\n\\li\\ref apop_rake : Iterative proportional fitting on large, sparse tables\n\n\nGeneral utilities:\n\n\\li\\ref Apop_stopif : Apophenia's error-handling and warning-printing macro. 
\n\\li\\ref apop_opts : the global options\n\\li\\ref apop_system : a printf-style wrapper around the standard \\c system function.\n\nA few more math utilities:\n\n\\li\\ref apop_matrix_is_positive_semidefinite\n\\li\\ref apop_matrix_to_positive_semidefinite\n\\li\\ref apop_generalized_harmonic\n\\li\\ref apop_multivariate_gamma\n\\li\\ref apop_multivariate_lngamma\n\\li\\ref apop_rng_alloc\n\n*/\n\n/** \\page mingw MinGW\n\nMinimalist GNU for Windows is indeed minimalist: it is not a full POSIX subsystem, and provides no package manager. Therefore, you will have to make some adjustments and install the dependencies yourself.\n\nMatt P. Dziubinski successfully used Apophenia via MinGW; here are his instructions (with edits by BK):\n\n\\li get libregex (the ZIP file) from:\nhttp://sourceforge.net/project/showfiles.php?group_id=204414&package_id=306189\n\\li get libintl (three ZIP files) from:\n http://gnuwin32.sourceforge.net/packages/libintl.htm .\n download \"Binaries\", \"Dependencies\", \"Developer files\"\n\\li follow \"libintl\" steps from:\nhttp://kayalang.org/download/compiling/windows\n\n\\li Modify \\c Makefile, adding -lpthread to AM_CFLAGS (removing -pthread) and -lregex to AM_CFLAGS and LIBS\n\n\\li Now compile the main library:\n\\code\nmake\n\\endcode\n\n\\li Finally, put one more expected directory in place and install:\n\\code\nmkdir -p -- \"/usr/local/Lib/site-packages\"\nmake install\n\\endcode\n\n\\li You will get the usual warning about library paths, and may have to take the specified action:\n\\code\n----------------------------------------------------------------------\nLibraries have been installed in:\n  /usr/local/lib\n\nIf you ever happen to want to link against installed libraries\nin a given directory, LIBDIR, you must either use libtool, and\nspecify the full pathname of the library, or use the `-LLIBDIR'\nflag during linking and do at least one of the following:\n  - add LIBDIR to the `PATH' environment variable\n    during 
execution\n  - add LIBDIR to the `LD_RUN_PATH' environment variable\n    during linking\n  - use the `-LLIBDIR' linker flag\n\nSee any operating system documentation about shared libraries for\nmore information, such as the ld(1) and ld.so(8) manual pages.\n----------------------------------------------------------------------\n\\endcode\n*/\n\n\n/* optionaldetails Implementation of optional arguments  [this section ignored by doxygen]\nOptional and named arguments are among the most commonly commented-on features of Apophenia, so this page goes into full detail about the implementation. \n\nTo use these features, see the all-you-really-need summary at the \\ref designated\npage. For a background and rationale, see the blog entry at http://modelingwithdata.org/arch/00000022.htm . \n\nI'll assume you've read both links before continuing.\n\nOK, now that you've read the how-to-use and the discussion of how optional and named arguments can be constructed in C, this page will show how they are done in Apophenia. The level of details should be sufficient to implement them in your own code if you so desire.\n\nThere are three components to the process of generating optional arguments as implemented here:\n\\li Produce a \\c struct whose elements match the arguments to the function.\n\\li Write a wrapper function that takes in the struct, unpacks it, and calls the original function.\n\\li Write a macro that makes the user think the wrapper function is the real thing.\n\nNone of these steps are really rocket science, but there is a huge amount of redundancy. \nApophenia includes some macros that reduce the boilerplate redundancy significantly. 
There are two layers: the C-standard code, and the script that produces the C-standard code.\n\nWe'll begin with the C-standard header file:\n\\code \n#ifdef APOP_NO_VARIADIC\n void apop_vector_increment(gsl_vector * v, int i, double amt);\n#else\n void apop_vector_increment_base(gsl_vector * v, int i, double amt);\n apop_varad_declare(void, apop_vector_increment, gsl_vector * v; int i; double amt);\n#define apop_vector_increment(...) apop_varad_link(apop_vector_increment, __VA_ARGS__)\n#endif\n\\endcode\n\nFirst, there is an if/else that allows the system to degrade gracefully\nif you are sending C code to a parser like swig, whose goals differ\ntoo much from straight C compilation for this to work. Set \\c\nAPOP_NO_VARIADIC to produce a plain function with no variadic support.\n\nOtherwise, we begin the above steps. The \\c apop_varad_declare line expands to the following:\n\n\\code\ntypedef struct { \n    gsl_vector * v; int i; double amt ; \n} variadic_type_apop_vector_increment; \n\nvoid variadic_apop_vector_increment(variadic_type_apop_vector_increment varad_in);\n  \\endcode\n\nSo there's the ad-hoc struct and the declaration for the wrapper\nfunction. Notice how the arguments to the macro had semicolons, like a\nstruct declaration, rather than commas, because the macro does indeed\nwrap the arguments into a struct.\n\n  Here is what the \\c apop_varad_link would expand to:\n  \\code\n#define apop_vector_increment(...) variadic_apop_vector_increment((variadic_type_apop_vector_increment) {__VA_ARGS__})\n  \\endcode\nThat gives us part three: a macro that lets the user think that they are\nmaking a typical function call with a set of arguments, but wraps what\nthey type into a struct.\n\nNow for the code file where the function is defined. Again, there is an \\c APOP_NO_VARIADIC wrapper. 
Inside the interesting part, we find the wrapper function to unpack the struct that comes in.\n\n\\code\n\\#ifdef APOP_NO_VARIADIC \n void apop_vector_increment(gsl_vector * v, int i, double amt){\n\\#else\napop_varad_head( void , apop_vector_increment){\n    gsl_vector * apop_varad_var(v, NULL);\n    Apop_stopif(!v, return, 0, \"You sent me a NULL vector.\");\n    int apop_varad_var(i, 0);\n    double apop_varad_var(amt, 1);\n    apop_vector_increment_base(v, i, amt);\n}\n\n void apop_vector_increment_base(gsl_vector * v, int i, double amt){\n#endif\n\tv->data[i * v->stride]\t+= amt;\n}\n\\endcode\n\nThe \n\\c apop_varad_head macro reduces redundancy, and will expand to\n\\code\nvoid variadic_apop_vector_increment (variadic_type_apop_vector_increment varad_in)\n\\endcode\n\nThe function with this header thus takes in a single struct, and for every variable, there is a line like\n\\code\n    double apop_varad_var(amt, 1);\n\\endcode\nwhich simply expands to:\n\\code\n    double amt = varad_in.amt ? varad_in.amt : 1;\n\\endcode\nThus, the macro declares each not-in-struct variable, and so there will need to be\none such declaration line for each argument. Apart from requiring declarations, you\ncan be creative: include sanity checks, post-vary the variables of the inputs, unpack\nwithout the macro, and so on. That is, this parent function does all of the bookkeeping,\nchecking, and introductory shunting, so the base function can do the math. Finally,\nthe introductory section will call the base function.\n\nThe setup goes out of its way to leave the \\c _base function in the public namespace,\nso that those who would prefer speed to bounds-checking can simply call that function\ndirectly, using standard notation. You could eliminate this feature by merging\nthe two functions.\n\n\n<b>The m4 script</b>\n\nThe above is all you need to make this work: the varad.h file, and the above structures. 
But there is still a lot of redundancy, which can't be eliminated by the plain C preprocessor.\n\nThus, in Apophenia's code base (the one you'll get from checking out the git repository, not the gzipped distribution that has already been post-processed) you will find a pre-preprocessing script that converts a few markers to the above form. Here is the code that will expand to the above C-standard code:\n\n\\code\n//header file\nAPOP_VAR_DECLARE void apop_vector_increment(gsl_vector * v, int i, double amt);\n\n//code file\nAPOP_VAR_HEAD void apop_vector_increment(gsl_vector * v, int i, double amt){\n    gsl_vector * apop_varad_var(v, NULL);\n    Apop_stopif(!v, return, 0, \"You sent me a NULL vector.\");\n    int apop_varad_var(i, 0);\n    double apop_varad_var(amt, 1);\nAPOP_VAR_END_HEAD\n\tv->data[i * v->stride]\t+= amt;\n}\n\\endcode\n\nIt is obviously much shorter. The declaration line is actually a C-standard declaration with the \\c APOP_VAR_DECLARE preface, so you don't have to remember when to use semicolons. The function itself looks like a single function, but there is again a marker before the declaration line, and the introductory material is separated from the main matter by the \\c APOP_VAR_END_HEAD line. Done right, drawing a line between the introductory checks or initializations and the main function can improve readability.\n\nThe m4 script inserts a <tt>return function_base(...)</tt> at the end of the header\nfunction, so you don't have to. If you want to call the function before the last line, you\ncan do so explicitly, as in the expansion above, and add a bare <tt>return;</tt> to\nguarantee that the call to the base function that the m4 script will insert won't ever be\nreached.\n\nOne final detail: it is valid to have types with commas in them---function arguments. Because commas get turned to semicolons, and m4 isn't a real parser, there is an exception built in: you will have to replace commas with exclamation marks in the header file (only). 
E.g.,\n\n\\code\nAPOP_VAR_DECLARE apop_data * f_of_f(apop_data *in, void *param, int n, double (*fn_d)(double ! void * !int));\n\\endcode\n\nm4 is POSIX standard, so even if you can't read the script, you have the program needed to run it. For example, if you name it \\c prep_variadics.m4, then run\n\\code\nm4 prep_variadics.m4 myfile.m4.c > myfile.c\n\\endcode\n*/\n\n\n/**\n\\page gentle A quick overview\n\nThis is a \"gentle introduction\" to the Apophenia library. It is intended \nto give you some initial bearings on the typical workflow and the concepts and tricks that\nthe manual pages assume you are familiar with.\n\nIf you want to install Apophenia now so you can try the samples on this page, see the \\ref setup page.\n\nAn outline of this overview:\n\n\\li Apophenia fills a space between traditional C libraries and stats packages. \n\\li The \\ref apop_data structure represents a data set (of course). Data sets are inherently complex,\nbut there are many functions that act on \\ref apop_data sets to make life easier.\n\\li The \\ref apop_model encapsulates the sort of actions one would take with a model, like estimating model parameters or predicting values based on new inputs.\n\\li Databases are great, and a perfect fit for the sort of paradigm here. 
Apophenia\nprovides functions to make it easy to jump between database tables and \\ref apop_data sets.\n\n<em> The opening example</em>\n\nSetting aside the more advanced applications and model-building tasks, let us begin with\nthe workflow of a typical fitting-a-model project using Apophenia's tools:\n\n\\li Read the raw data into the database using \\ref apop_text_to_db.\n\\li Use SQL queries handled by \\ref apop_query to massage the data as needed.\n\\li Use \\ref apop_query_to_data to pull some of the data into an in-memory \\ref apop_data set.\n\\li Call a model estimation such as \\code apop_estimate (data_set, apop_ols)\\endcode  or \\code apop_estimate (data_set, apop_probit)\\endcode to fit parameters to the data. This will return an \\ref apop_model with parameter estimates.\n\\li Interrogate the returned estimate, by dumping it to the screen with \\ref apop_model_print, sending its parameters and variance-covariance matrices to additional tests (the \\c estimate step runs a few for you), or send the model's output to be input to another model.\n\nHere is an example of most of the above steps which you can compile and run, to be discussed in detail below.\n\nThe program relies on the U.S. 
Census's American Community Survey public use microdata for DC 2008, which you can get from the command line via:\n\n\\code\nwget https://raw.github.com/rodri363/tea/master/demo/ss08pdc.csv\n\\endcode\nor by pointing your browser to that address and saving the file.\n\nThe program:\n\\code\n#include <apop.h>\n\nint main(){\n    apop_text_to_db(.text_file=\"ss08pdc.csv\", .tabname=\"dc\");\n    apop_data *data = apop_query_to_data(\"select log(pincp+10) as log_income, agep, sex \"\n                    \"from dc where agep+ pincp+sex is not null and pincp>=0\");\n    apop_model *est = apop_estimate(data, apop_ols);\n    apop_model_print(est);\n}\n\\endcode\n\nIf you saved the code to <tt>census.c</tt> and don't have a \\ref makefile or other\nbuild system, then you can compile it with\n\n\\code\ngcc census.c -std=gnu99 -lapophenia -lgsl -lgslcblas -lsqlite3 -o census\n\\endcode\n\nor \n\n\\code\nclang census.c -lapophenia -lgsl -lgslcblas -lsqlite3 -o census\n\\endcode\n\nand then run it with <tt>./census</tt>. This compile line will work on any system with all the requisite tools,\nbut for full-time work with this or any other C library, you will probably want to write a \\ref makefile.\n\nThe results are unremarkable---age has a positive effect on income, and sex\n(1=male, 2=female) has a negative effect---but the program does give us some lines of\ncode to dissect.\n\nThe first two lines in \\c main() make use of a database.  \nI'll discuss the value of the database step more at the end of this page, but for now,\nnote that there are several functions, \\ref apop_query and \\ref apop_query_to_data\nbeing the ones you will most frequently be using, that will allow you to talk to and\npull data from either an SQLite or mySQL/mariaDB database. 
The database is a natural\nplace to do data processing like renaming variables, selecting subsets, and transforming values.\n\n<em> Designated initializers</em>\n\nLike this line,\n\n\\code\napop_text_to_db(.text_file=\"data\", .tabname=\"d\");\n\\endcode\n\nmany Apophenia functions accept named, optional arguments.  To give another example,\nthe \\ref apop_data set has the usual row and column numbers, but also row and column\nnames. So you should be able to refer to a cell by any combination of name or number;\nfor the data set you read in above, which has column names, all of the following work:\n\n\\code\nx = apop_data_get(data, 2, 2); //observation 2, column 2 (columns count from zero)\nx = apop_data_get(data, .row=2, .colname=\"sex\"); // same\napop_data_set(data, 2, 2, 1);\napop_data_set(data, .colname=\"sex\", .row=2, .val=1);\n\\endcode\n\nDefault values mean that the \\ref apop_data_get, \\ref apop_data_set, and \\ref apop_data_ptr functions handle matrices, vectors, and scalars sensibly:\n\\code\n//Let v be a hundred-element vector:\napop_data *v = apop_data_alloc(100);\n//[fill with data here]\ndouble x1 = apop_data_get(v, 10);\napop_data_set(v, 2, .val=x1);\n\n//A 100x1 matrix behaves like a vector\napop_data *m = apop_data_alloc(100, 1);\n//[fill with data here]\ndouble m1 = apop_data_get(m, 1);\n\n//let s be a scalar stored in a one-element apop_data set:\napop_data *s = apop_data_alloc(1);\ndouble *scalar = apop_data_ptr(s);\n\\endcode\n\nThese conveniences may be new to users of less user-friendly C libraries, but they fully\nconform to the C standard (ISO/IEC 9899:2011). See the \\ref designated page for details.\n\n\n\\section apop_data\n\nA lot of real-world data processing is about quotidian annoyances like text versus\nnumeric data or dealing with missing values, and the \\ref apop_data set and its\nmany support functions are intended to make data processing in C easy. Some users of\nApophenia use the library only for its \\ref apop_data set and associated functions. 
See\n\\ref dataoverview for extensive notes on using the structure.\n\nThe structure includes seven parts:\n\n\\li a vector,\n\\li a matrix,\n\\li a grid of text elements,\n\\li a vector of weights,\n\\li names for everything: row names, a vector name, matrix column names, text names,\n\\li a link to a second page of data, and\n\\li an error marker\n\nThis is not a generic and abstract ideal, but is the sort of mess that real-world data sets look like. For\nexample, here is some data for a weighted OLS regression. It includes an outcome\nvariable in the vector, independent variables in the matrix and text grid,\nreplicate weights, and column names in bold labeling the variables:\n\n\\htmlinclude apop_data_fig.html\n\\latexinclude apop_data_fig.tex\n\nApophenia's functions generally assume that one row across all of these elements\ndescribes a single observation or data point.\n\nSee above for some examples of getting and setting individual elements.\n\nAlso, \\ref apop_data_get, \\ref apop_data_set, and \\ref apop_data_ptr consider the vector to be the -1st column,\nso using the data set in the figure, <tt>apop_data_get(sample_set, .row=0, .col=-1) == 1</tt>.\n\n<em> Reading in data</em>\n\nAs per the example above, use \\ref apop_text_to_data or \\ref apop_text_to_db and then \\ref apop_query_to_data.\n\n<em> Subsets</em>\n\nThere are many macros to get views of subsets of the data. Each generates a disposable\nwrapper around the base data: once the variable goes out of scope, the wrapper\ndisappears, but modifications made to the data in the view are modifications to the\nbase data itself.\n\n\\include simple_subsets.c\n\nAll of these slicing routines are macros, because they generate several\nbackground variables in the current scope (something a function can't do). Traditional\ncustom is to put macro names in all caps, like \\c APOP_DATA_ROWS, which to modern\nsensibilities looks like yelling. 
The custom has a logic: there are ways to hang\nyourself with macros, so it is worth distinguishing them typographically. \nApophenia tones it down by capitalizing only the first letter.\n\n<em> Basic manipulations</em>\n\nSee \\ref dataoverview for a list of many other manipulations of data sets, such as \n\\ref apop_data_listwise_delete for quick-and-dirty removal of observations with <tt>NaN</tt>s,\n\\ref apop_data_split / \\ref apop_data_stack,\nor \\ref apop_data_sort to sort all elements by a single column.\n\n<em> Apply and map</em>\n\nIf you have an operation of the form <em>for each element of my data set, call this\nfunction</em>, then you can use \\ref apop_map to do it. You could basically do everything you\ncan do with an apply/map function via a \\c for loop, but the apply/map approach is clearer\nand more fun. Also, if you set OpenMP's <tt>omp_set_num_threads(N)</tt> for any \\c N\ngreater than 1 (the default on most systems is the number of CPU cores), then the work\nof mapping will be split across multiple CPU threads.  See \\ref mapply for a number\nof examples.\n\n<em> Text</em>\n\nString handling in C usually requires some tedious pointer and memory handling, but the functions \nto put strings into the text grid in the \\ref apop_data structure and get them out\nagain will do the pointer shunting for you.  The \\ref apop_text_alloc function is\nreally a realloc function: you can use it to resize the text grid as necessary. The\n\\ref apop_text_set function will write a single string to the grid, though you may be\nusing \\ref apop_query_to_text or \\ref apop_query_to_mixed_data to read in an entire\ndata set at once. Functions that act on entire data sets, like \\ref apop_data_rm_rows,\nhandle the text part as well.\n\nThe text grid for \\c your_data has <tt>your_data->textsize[0]</tt> rows and <tt>your_data->textsize[1]</tt> columns. 
If you are using only the functions to this point, then empty elements are a blank string (<tt>\"\"</tt>), not \\c NULL.\nFor reading individual elements, refer to the \\f$(i,j)\\f$th text element via <tt>your_data->text[i][j]</tt>.\n\n<em> Errors</em>\n\nMany functions will set the <tt>error</tt> element of the \\ref apop_data structure being operated on if anything goes wrong. You can use this to halt the program or take corrective action:\n\n\\code \napop_data *the_data = apop_query_to_data(\"select * from d\");\nApop_stopif(!the_data || the_data->error, exit(1), 0, \"Trouble querying the data\");\n\\endcode \n\n<em> The whole structure</em>\n\nHere is a diagram of all of Apophenia's structures and how they\nrelate. It is taken from this\n<a href=\"http://modelingwithdata.org/pdfs/cheatsheet.pdf\">cheat sheet</a> on general C and SQL use (2 page PDF).\n\n\\image html http://apophenia.info/structs.png width=\"100%\"\n\\image latex ../structs.png width=18cm\n\nAll of the elements of the \\ref apop_data structure are laid out at middle-left. You have\nalready met the vector, matrix, weights, and text grid.\n\nThe diagram shows the \\ref apop_name structure, which has received little mention so far because names\nbasically take care of themselves. A query will bring in column names (and row names if you set <tt>apop_opts.db_name_column</tt>), or use \\ref apop_data_add_names to add names to your data\nset and \\ref apop_name_stack to copy from one data set to another.\n\nThe \\ref apop_data structure has a \\c more element, for when your data is best expressed\nin more than one page of data. Use \\ref apop_data_add_page, \\ref apop_data_rm_page,\nand \\ref apop_data_get_page. Output routines will sometimes append an extra page of\nauxiliary information to a data set, such as pages named <tt>\\<Covariance\\></tt> or\n<tt>\\<Factors\\></tt>. 
The angle-brackets indicate a page that describes the data set\nbut is not a part of it (so an MLE search would ignore that page, for example).\n\n\nNow let us move up the structure diagram to the \\ref apop_model structure. \n\n\\section gentle_model apop_model\n\nEven restricting ourselves to the most basic operations, there are a lot of things that \nwe want to do with our models: use a data set to estimate the parameters of a model (like the mean and\nvariance of a Normal distribution), or draw random numbers, or show the\nexpected value, or show the expected value of one part of the data given fixed values\nfor the rest of it. The \\ref apop_model is intended to encapsulate most of these desires\ninto one object, so that models can easily be swapped around, modified to create new models, \ncompared, and so on.\n\nFrom the figure above, you can see that the \\ref apop_model structure \nincludes a number of informational items, key being the \\c parameters, \\c data, and\n\\c info elements; a list of settings to be discussed below; and a set of procedures\nfor many operations.  Its contents are not (entirely) arbitrary: the overall intent\nand the theoretical basis for what is and is not included in an \\ref apop_model are\ndescribed in this <a href=\"http://www.census.gov/srd/papers/pdf/rrs2014-06.pdf\">U.S.\nCensus Bureau research report</a>.\n\nThere are helper functions that will allow you to avoid dealing with the model\ninternals. 
For example, the \\ref apop_estimate helper function means you never have\nto look at the model's \\c estimate method (if it even has one), and you will simply\npass the model to a function, as with the above form:\n\n\\code\n    apop_model *est = apop_estimate(data, apop_ols);\n\\endcode\n\n\\li Apophenia ships with a broad set of models, like \\ref apop_ols, \\ref apop_dirichlet,\n    \\ref apop_loess, and \\ref apop_pmf (probability mass function); see the full list on <a href=\"http://apophenia.info/group__models.html\">the models documentation page</a>. You would fit\nany of them using an \\ref apop_estimate call, with the appropriate model as the second input.\n\\li The models that ship with Apophenia, like \\ref apop_ols, include the procedures and some metadata, but are of course not yet estimated using a data set (i.e., <tt>data == NULL</tt>, <tt>parameters == NULL</tt>). The line above generated a new\nmodel, \\c est, which is identical to the base OLS model but has estimated parameters\n(and covariances, and basic hypothesis tests, a log likelihood, \\f$AIC_c\\f$, \\f$BIC\\f$, et cetera), and a \\c data pointer to the \\ref apop_data set used for estimation. \n\\li You will mostly use the models by passing them as inputs to \nfunctions like \\ref apop_estimate, \\ref apop_draw, or \\ref apop_predict; more examples below.\nOther than \\ref apop_estimate, most require a parameterized model like \\c est. After all, it doesn't make sense to\ndraw from a Normal distribution until its mean and standard deviation are specified.\n\\li If you know what the parameters should be, for most models use \\ref apop_model_set_parameters. 
E.g.\n\n\\code\napop_model *std_normal = apop_model_set_parameters(apop_normal, 0, 1);\napop_data *a_thousand_normals = apop_model_draws(std_normal, 1000);\n\napop_model *poisson = apop_model_set_parameters(apop_poisson, 1.5);\napop_data *a_thousand_waits = apop_model_draws(poisson, 1000);\n\\endcode\n\n\\li You can use \\ref apop_model_print to print the various elements to screen.\n\\li You can combine and transform models with functions such as \\ref\napop_model_fix_params, \\ref apop_model_coordinate_transform, or \\ref\napop_model_mixture. Each of these functions produces a new model, which can be estimated,\nre-combined, or otherwise used like any other model.\n\n\\code\n//A helper function to check whether a data point is positive\ndouble over_zero(apop_data *in, apop_model *m){ return apop_data_get(in) > 0; }\n\n//Generate a truncated Normal distribution by adding a data constraint:\napop_model *truncated_normal= apop_model_dconstrain(.base_model=apop_normal,\n                                                    .constraint=over_zero);\n\n//Get the cross product of that and a free Normal.\napop_model *cross = apop_model_cross(apop_normal, truncated_normal);\n\n//Given assumed data, estimate the parameters of the cross product.\napop_model *xest = apop_estimate(assumed_data, cross);\n\n//Assuming more data, use the fitted cross product as the prior for a Normal distribution.\napop_model *posterior = apop_update(moredata, .prior=xest, .likelihood=apop_normal);\n\n//Assuming more data, use the posterior as the prior for another updating round.\napop_model *post2 = apop_update(moredata2, .prior=posterior, .likelihood=apop_normal);\n\\endcode\n\n\\li Writing your own models won't be covered in this introduction, but it can be easy to\ncopy and modify the procedures of an existing model to fit your needs. 
When in doubt, delete a procedure, because any procedures that are missing will have\ndefaults filled when used by functions like \\ref apop_estimate (which uses \\ref\napop_maximum_likelihood) or \\ref apop_cdf (which uses integration via random draws). See \\ref modeldetails for details.\n\\li There's a simple rule of thumb for remembering the order of the arguments to most of\nApophenia's functions, including \\ref apop_estimate : the data always comes first.\n\n<em> Settings</em>\n\nHow many bins are in a histogram? At what tolerance does the maximum likelihood\nsearch end? What are the models being combined in an \\ref apop_mixture distribution?\n\nApophenia organizes settings in <em>settings groups</em>, which are then attached\nto models.  In the following snippet demonstrating Bayesian updating, we specify a Beta distribution prior.  If the\nlikelihood function were a Binomial distribution, \\ref apop_update knows the closed-form\nposterior for a Beta-Binomial pair, but in this case, with a PMF as a likelihood,\nit will have to run Markov chain Monte Carlo. 
The \\ref apop_mcmc_settings group attached\nto the prior specifies details of how the run should work.\n\nFor a likelihood, we generate an empirical distribution---a PMF---from an input \ndata set, via <tt>apop_estimate(your_data, apop_pmf)</tt>.\nWhen we call \\ref apop_update on the last line, it already has all of the above info\non hand.\n\n\\code\napop_model *beta = apop_model_set_parameters(apop_beta, 0.5, 0.25);\nApop_settings_add_group(beta, apop_mcmc, .burnin = 0.2, .periods =1e5);\napop_model *my_pmf = apop_estimate(your_data, apop_pmf);\napop_model *posterior = apop_update(.prior= beta, .likelihood = my_pmf);\n\\endcode\n\n<em> Databases and models</em>\n\nReturning to the introductory example, you saw that (1) the\nlibrary expects you to keep your data in a database, pulling out the\ndata as needed, and (2) that the workflow is built around\n\\ref apop_model structures.\n\nStarting with (2), \nif a stats package has something called a <em>model</em>, then it is\nprobably of the form Y = [an additive function of <b>X</b>], such as \\f$y = x_1 +\n\\log(x_2) + x_3^2\\f$. Trying new models means trying different\nfunctional forms for the right-hand side, such as including \\f$x_1\\f$ in\nsome cases and excluding it in others. Conversely, Apophenia is designed \nto facilitate trying new models in the broader sense of switching out a \nlinear model for a hierarchical, or a Bayesian model for a simulation. \nA formula syntax makes little sense over such a broad range of models.\n\nAs a result, the right-hand side is not part of \nthe \\ref apop_model. Instead, the data is assumed to be correctly formatted, scaled, or logged\nbefore being passed to the model. 
This is where part (1), the database,\ncomes in, because it provides a proxy for the sort of formula specification language above:\n \\code\napop_data *testme= apop_query_to_data(\"select y, x1, log(x2), pow(x3, 2) from data\");\napop_model *est = apop_estimate(testme, apop_ols);\n\\endcode\n\nGenerating factors and dummies is also considered data prep, not model\ninternals. See \\ref apop_data_to_dummies and \\ref apop_data_to_factors.\n\nNow that you have \\c est, an estimated model, you can interrogate it. This is where Apophenia and its encapsulated\nmodel objects shine, because you can do more than just admire the parameter estimates on\nthe screen: you can take your estimated model and fill in or generate new data, use it\nas an input to the parent distribution of a hierarchical model, et cetera. Some simple\nexamples:\n\n \\code\n //If you have a new data set with missing elements (represented by NaN), you can fill in predicted values:\napop_predict(new_data_set, est);\napop_data_print(new_data_set);\n\n //Fill a matrix with random draws.\napop_data *d = apop_model_draws(est, .count=1000);\n\n //How does the AIC_c for this model compare to that of est2?\nprintf(\"ΔAIC_c=%g\\n\", apop_data_get(est->info, .rowname=\"AIC_c\") \n                       - apop_data_get(est2->info, .rowname=\"AIC_c\"));\n\\endcode\n\n\\section gentle_end Conclusion\n\nThis introduction has shown you the \\ref apop_data set and some of its associated\nfunctions, which might be useful even if you aren't formally doing statistical\nwork but do have to deal with data with real-world elements like column names and\nmixed numeric/text values. You've seen how Apophenia encapsulates many of a model's\ncharacteristics into a single \\ref apop_model object, which you can send with data to\nfunctions like \\ref apop_estimate, \\ref apop_predict, or \\ref apop_draw. 
Once you've\ngot your data in the right form, you can use this to simply estimate model parameters,\nor as an input to later analysis.\n\nWhat's next?  \n  \\li Check out the system for hypothesis testing, both with traditional known\ndistributions (using \\ref apop_test for dealing with Normal-, \\f$t\\f$-,\n\\f$\\chi^2\\f$-distributed statistics); and for the parameters of any model; in \\ref testpage.\n  \\li Try your own hand at putting new models into the \\ref apop_model framework,\nas discussed in \\ref modeldetails.\n  \\li For example, have a look at <a href=\"http://modelingwithdata.org/arch/00000154.htm\">this blog</a>\nand its subsequent posts, which wrap a microsimulation into an \\ref apop_model, so\nthat its parameters can be estimated and confidence intervals set around them.\n  \\li See the \\ref maxipage page for discussion of the many features the optimization\nsystem has. It allows you to use a diverse set of search types on constrained or\nunconstrained models.\n  \\li Skim through <a href=\"http://apophenia.info/group__all__public.html\">the full list\nof macros and functions</a>---there are hundreds---to get a sense of what else\nApophenia offers.\n*/\n\n/** \\page modeldetails Writing new models\n\nThe \\ref apop_model is intended to provide a consistent expression of <em>any</em>\nmodel that (implicitly or explicitly) expresses a likelihood of data given parameters,\nincluding traditional linear models, textbook distributions, Bayesian hierarchies,\nmicrosimulations, and any combination of the above.  The unifying feature is that\nall of the models act over some data space and some parameter space (in some cases\none or both is the empty set), and can assign a likelihood for a fixed pair of\nparameters and data given the model. This is a very broad requirement, often used\nin the statistical literature.  
For discussion of the theoretical structures, see <a\nhref=\"http://www.census.gov/srd/papers/pdf/rrs2014-06.pdf\"><em>A Useful Algebraic System\nof Statistical Models</em></a> (PDF).\n\nThis page is about writing new models from scratch, beginning with basic models and on\nup to models with arbitrary internal settings, specific methods of Bayesian updating\nusing your model as a prior or likelihood, and so on. I assume you have already read\n\\ref modelsec on using models and have tried a few things with the\ncanned models that come with Apophenia, so you already know how a user handles basic\nestimation, adding a settings group, and so on.\n\nThis page includes:\n\n\\li \\ref write_likelihoods of writing a new model from scratch.\n\\li \\ref settingswriting, covering the writing of <em>ad hoc</em> structures to hold model- or method-specific details, like the number of periods for burning in an MCMC run or the number of bins in a histogram.\n\\li \\ref vtables, covering the means of writing special-case routines for functions that are not part of the \\ref apop_model itself, including the score or conjugate prior/likelihood pairs for \\ref apop_update.\n\\li \\ref modeldataparts, a detailed list of the requirements for the non-function elements of an \\ref apop_model.\n\\li \\ref methodsection, a detailed list of requirements for the function elements of an \\ref apop_model.\n\n\\section write_likelihoods A walkthrough\n\nUsers are encouraged to always use models via the helper functions, like\n\\ref apop_estimate or \\ref apop_cdf.  The helper functions do some boilerplate error\nchecking, and call defaults as needed. For example, if your model has a \\c log_likelihood\nmethod but no \\c p method, then \\ref apop_p will use exp(\\c log_likelihood). 
If you don't\ngive an \\c estimate method, then \\c apop_estimate will call \\ref apop_maximum_likelihood.\n\nSo the game in writing a new model is to write just enough internal methods to give the helper functions what they need.\nIn the not-uncommon best case, all you need to do is write a log likelihood function or an RNG.\n\nHere is how one would set up a model that could be estimated using maximum likelihood:\n\n\\li Write a likelihood function. Its header will look like this:\n\\code\nlong double new_log_likelihood(apop_data *data, apop_model *m);\n\\endcode\nwhere \\c data is the input data, and \\c m is the parametrized model (i.e. your model\nwith a \\c parameters element already filled in by the caller).  This function will\nreturn the value of the log likelihood function at the given parameters.\n\n\\li Write the object:\n\n\\code\napop_model *your_new_model = &(apop_model){\"The Me distribution\", \n            .vsize=n0, .msize1=n1, .msize2=n2, .dsize=nd,\n            .log_likelihood = new_log_likelihood };\n\\endcode\n\n \\li The first element is the <tt>.name</tt>, a human-language name for your model.\n \\li The \\c vsize, \\c msize1, and \\c msize2 elements specify the shape of the parameter\nset. For example, if there are three numbers in the vector, then set <tt>.vsize=3</tt>\nand omit the matrix sizes. The default model prep routine will call\n<tt>new_est->parameters = apop_data_alloc(vsize, msize1, msize2)</tt>. \n \\li The \\c dsize element is the size of one random draw from your model.\n \\li It's common to have [the number of columns in your data set] parameters; this\ncount will be filled in if you specify \\c -1 for \\c vsize, <tt>msize(1|2)</tt>, or\n<tt>dsize</tt>. If the allocation is exceptional in a different way, then you will\nneed to allocate parameters by writing a custom \\c prep method for the model.\n \\li Is this a constrained optimization?  Add a <tt>.constraint</tt> element for those too.  
See \\ref constr for more.\n\nYou already have more than enough that something like this will work (the \\c dsize is used for random draws):\n\\code\napop_model *estimated = apop_estimate(your_data, your_new_model);\n\\endcode\n\nOnce that baseline works, you can fill in other elements of the \\ref apop_model as needed.\n\nFor example, if you are using a maximum likelihood method to estimate parameters, you can get much faster estimates and better covariance estimates by specifying the dlog likelihood function (aka the score):\n\n\\code\nvoid apop_new_dlog_likelihood(apop_data *d, gsl_vector *gradient, apop_model *m){\n    //do algebra here to find df/dp0, df/dp1, df/dp2....\n    gsl_vector_set(gradient, 0, d_0);\n    gsl_vector_set(gradient, 1, d_1);\n}\n\\endcode\nThe score is not part of the model object, but is registered (see below) using \n\\code\napop_score_insert(apop_new_dlog_likelihood, your_new_model);\n\\endcode\n\n\\subsection On Threading\nMany procedures in Apophenia use OpenMP to thread operations, so assume your functions are running in a threaded environment. If a method can not be threaded, wrap it in an OpenMP critical region. 
E.g.,\n\n\n\\code\n\nvoid apop_new_dlog_likelihood(apop_data *d, gsl_vector *gradient, apop_model *m){\n    #pragma omp critical (newdlog)\n    {\n        //un-threadable algebra here\n    }\n    gsl_vector_set(gradient, 0, d_0);\n    gsl_vector_set(gradient, 1, d_1);\n}\n\\endcode\n\n\\section settingswriting  Writing new settings groups\n\nYour model may need additional settings or auxiliary information to function, which\nwould require associating a model-specific struct with the model.\nA method associated with a model that uses such a struct usually begins with calls like\n\\code\nlong double ysg_ll(apop_data *d, apop_model *m){\n    ysg_settings *sets = apop_settings_get(m, ysg);\n\n    ...\n}\n\\endcode\n\nThese model-specific structs are handled as expected by \\ref apop_model_copy and \\ref\napop_model_free, and many functions that modify or transform an \\ref apop_model try to\nhandle settings groups as expected. This section describes how to build a settings\ngroup so all these automatic steps happen as expected, and your methods can reliably retrieve settings as needed.\n\nBut before getting into the detail of how to make model-specific groups of settings\nwork, note that there's a lightweight method of storing sundry settings, so in many\ncases you can bypass all of the following.\nThe \\ref apop_model structure has a \\c void pointer named \\c more which you can use to\npoint to a model-specific struct. If \\c more_size is larger than zero (i.e. you set\nit to <tt>your_model.more_size=sizeof(your_struct)</tt>), then it will be copied via \\c\nmemcpy by \\ref apop_model_copy, and freed by \\ref apop_model_free. 
Apophenia's\nroutines will never impinge on this item, so do what you wish with it.\n\nThe remainder of this section describes the information you'll have to provide to make\nuse of the conveniences described to this point: initialization of defaults, smarter\ncopying and freeing, and adding to an arbitrarily long list of settings groups attached\nto a model.  You will need four items: a typedef for the structure itself, plus init, copy, and\nfree functions.  This is the sort of boilerplate that will be familiar to users of\nobject-oriented languages in the style of C++ or Java, but it's really a list of\narbitrarily-typed elements, which makes this feel more like LISP. [And being a\nreimplementation of an existing feature of LISP, this section will be macro-heavy.]\n\nThe settings struct will likely go into a header file, so here is a sample header\nfor a new settings group named \\c ysg_settings, with a dataset, two sizes, and\na reference counter. <tt>ysg</tt> stands for Your Settings Group; replace that\nsubstring with your preferred name in every instance to follow.\n\n\\code\ntypedef struct {\n    int size1, size2;\n    int *refs;          //reference counter, shared across copies\n    apop_data *dataset;\n} ysg_settings;\n\nApop_settings_declarations(ysg)\n\\endcode\n\nThe first item is a familiar structure definition. The last line is a macro that declares the\ninit, copy, and free functions discussed below. This is everything you would\nneed in a header file, should you need one. These are just declarations; we'll write\nthe actual init/copy/free functions below.\n\nThe structure itself gets the full name, \\c ysg_settings. Everything else is a macro\nkeyed on \\c ysg, without the \\c _settings part. 
Because of these macros, your struct\nname must end in \\c _settings.\n\nIf you have an especially simple structure, then you can generate the three functions with these three macros in your <tt>.c</tt> file:\n\n\\code\nApop_settings_init(ysg, )\nApop_settings_copy(ysg, )\nApop_settings_free(ysg, )\n\\endcode\n\nThese macros generate appropriate functions to do what you'd expect: allocating the\nmain structure, copying one struct to another, freeing the main structure.  \nThe spaces after the commas indicate that in these cases no special code gets added to\nthe functions that these macros expand into.\n\nYou'll never call the generated functions directly; they are called by \\ref Apop_settings_add_group,\n\\ref apop_model_free, and other model or settings-group handling functions.\n\nNow that initializing/copying/freeing of\nthe structure itself is handled, the remainder of this section will be about how to\nadd instructions for the structure internals, like data that is pointed to by the structure elements.\n\n\\li For the allocate function, use the above form if everything in your code defaults to zero/\\c NULL.  \nOtherwise, you will need a new line declaring a default for every element in your structure. There is a macro to help with this too. 
\nThese macros will define for your use a structure named \\c in, and an output pointer-to-struct named \\c out.\nContinuing the above example:\n\\code\nApop_settings_init (ysg, \n    Apop_stopif(!in.size1, return NULL, 0, \"I need you to give me a value for size1.\");\n    Apop_varad_set(size2, 10);\n    Apop_varad_set(dataset, apop_data_alloc(out->size1, out->size2));\n    Apop_varad_set(refs, malloc(sizeof(int)));\n    *out->refs = 1;\n)\n\\endcode\nNow, <tt>Apop_settings_add_group(a_model, ysg, .size1=100)</tt> would set up a group with\na 100-by-10 data set, with the reference counter allocated and set to one.\n\n\\li Some functions do extensive internal copying, so you will need a copy function even\nif your code has no explicit calls to \\ref apop_model_copy. The default above simply\ncopies every element in the structure. Pointers are copied, giving you two pointers\npointing to the same data. We have to be careful to prevent double-freeing later.\n\\code\n//The elements of the set to copy are all copied by the function's boilerplate,\n//and then we make one additional modification, via the copy's pointer, out:\nApop_settings_copy (ysg,\n    #pragma omp critical (ysg_refs)\n        (*out->refs)++;\n)\n\\endcode\n\n\\li The struct itself is freed by boilerplate code, but add code in the free function\nto free data pointed to by pointers in the main structure. The macro defines a\npointer-to-struct named \\c in for your use. 
Continuing the example:\n\\code\nApop_settings_free (ysg,\n    #pragma omp critical (ysg_refs)\n        if (!(--(*in->refs))) { //decrement the shared count, not the pointer\n            apop_data_free(in->dataset);\n            free(in->refs);\n        }\n)\n\\endcode\n\nWith those three macros in place and the header as above, Apophenia will treat your\nsettings group like any other, and users can use \\ref Apop_settings_add_group to\npopulate it and attach it to any model.\n\n\\section vtables Registering new methods in vtables\n\nThe settings groups are for adding arbitrary model-specific nouns; vtables are for\nadding arbitrary model-specific verbs.\n\nMany functions (e.g., entropy, the dlog likelihood, Bayesian updating) have\nspecial cases for well-known models like the Normal distribution. \nAny function may maintain a registry of models and associated special-case procedures, aka a vtable.\n\nLookups happen based on a hash that takes into account the elements of the model\nthat will be used in the calculation. For example, the \\c apop_update_hash takes in two\nmodels and calculates the hash based on the address of the prior's \\c draw method and\nthe likelihood's \\c log_likelihood or \\c p method. Thus, a vtable lookup for new models\nthat re-use the same methods (at the same addresses in memory) will still find the\nsame special-case function.\n\nIf you need to deregister the function, use the associated deregister function,\ne.g. <tt>apop_update_vtable_drop(apop_beta, apop_binomial)</tt>. 
You can guarantee\nthat a method will not be re-added by following up the <tt>_drop</tt> with, e.g.,\n<tt>apop_update_vtable_add(NULL, apop_beta, apop_binomial)</tt>.\n\nThe steps for adding a function to an existing vtable:\n\n\\li See \\ref apop_update, \\ref apop_score, \\ref apop_predict, \\ref apop_model_print, and \\ref\napop_parameter_model for examples and procedure-specific details.\n\\li Write a function following the given type definition, as listed in the function's documentation.\n\\li Use the associated <tt>_vtable_add</tt> function to add the function and associate it\nwith the given model. For example, to add a Beta-binomial routine named \\c betabinom\nto the registry of Bayesian updating routines, use <tt>apop_update_vtable_add(betabinom,\napop_beta, apop_binomial)</tt>.\n\\li Place a call to <tt>..._vtable_add</tt> in the \\c prep method of the given model, thus ensuring that the auxiliary functions are registered after the first time the model is sent to \\ref apop_estimate.\n\nThe easiest way to set up a new vtable is to copy/paste/modify an existing one. Briefly:\n\n\\li See the existing setups in the vtables portion of <tt>apop.h</tt>. \n\\li Copy/paste one and do a search and replace to change the name to match your desired use.\n\\li Set the typedef to describe the functions that get added to the vtable.\n\\li Rewrite the hash function to check the part of the inputs that interests you. For\nexample, the update vtable associates functions with the \\c draw, \\c log_likelihood,\nand \\c p methods of the model. A model where these elements are identical \nwill still match even if other elements are different.\n\n\\section modeldataparts The data elements\n\nThe remainder of this section covers the detailed expectations regarding the elements\nof the \\ref apop_model structure. I begin with the data (non-function) elements,\nand then cover the method (function) elements. 
Some of the following will be\nrequirements for all models and some will be advice to authors; I use the accepted\ndefinitions of <a href=\"http://tools.ietf.org/html/rfc2119\">\"must\", \"shall\", \"may\"</a>\nand related words.\n\n\\subsection datasubsec Data\n\n\\li Each row of the \\c data element is treated as a single observation by many functions. \nFor example, \\ref apop_bootstrap_cov depends on each row being an iid observation to function correctly.\nCalculating the Bayesian Information Criterion (BIC) requires knowing\nthe number of observations in the data, and assumes that row count==observation count.\nFor complex data, the \\ref apop_data_pack and \\ref  apop_data_unpack functions can help with this. \n\n\\li Some functions (bootstrap again, or many uses of \\ref apop_kl_divergence) use \\ref\napop_draw to use your model's RNG (or a default) to draw a\n  value, write it to the matrix element of the data set, and then move on to an\n  estimation or other step. In this case, the data sent in will be entirely in the \\c\n  ->matrix element of the \\ref apop_data set sent to model methods. Your \\c likelihood, \\c p, \\c cdf, and \\c estimate routines\n  must accept data as a single row of the matrix of the \\ref apop_data set for such functions to work.\n  They may accept other formats. Tip: you can use \\ref apop_data_pack and \\ref apop_data_unpack to convert a structured set to a single row and back again.\n\n\\li Your routines may accept other data formats, as per contract with the user.\n    For example, regression-type functions use a function named \\c ols_shuffle\n    to convert a matrix where the first column is the dependent variable to a data\n    set with dependent variable in the vector and a column of ones in the first\n    matrix column.\n\n\\subsection paramsubsec Parameters, vsize, msize1,  msize2\n\n\\li The sizes will be used by the \\c prep method of the model; see below. 
Given the model \\c m and its elements \\c m.vsize, \\c m.msize1, \\c m.msize2,\n    functions that need to allocate a parameter set will do so via <tt>apop_data_alloc(m.vsize, m.msize1, m.msize2)</tt>. \n\n\n\\subsection infosubsec Info\n\n\\li The first page, which should be named \\c &lt;Info&gt;, is typically a list of scalars. Nothing is guaranteed, but the elements may include:\n\n\\li AIC: <a href=\"https://en.wikipedia.org/wiki/Akaike's_Information_Criterion\">Akaike Information Criterion</a>\n\\li AIC_c: AIC with a finite sample correction. ``<em>Generally, we advocate the use of AIC_c when the ratio \\f$n/K\\f$ is small (say \\f$<\\f$ 40)</em>'' [Kenneth P. Burnham, David R. Anderson: <em>Model Selection and Multi-Model Inference</em>, p 66, emphasis in original.]\n\\li BIC: <a href=\"https://en.wikipedia.org/wiki/Bayesian_information_criterion\">Bayesian Information Criterion</a>\n\\li R squared\n\\li R squared adj\n\\li log likelihood\n\\li status [0=OK, nonzero=other].\n\nFor those elements that require a count of input data, the calculations assume each row in the input \\ref apop_data set is a single datum.\n\nGet these via, e.g., <tt>apop_data_get(your_model->info, .rowname=\"log likelihood\")</tt>.\nWhen writing for any arbitrary function, be prepared to handle \\c NaN, indicating that the element is not calculated or saved in the info page by the given model.\n\nFor OLS-type estimations, each row corresponds to the row in the original data. For\nfilling in of missing data, the elements may appear anywhere, so the row/col indices are\nessential.\n\n\\subsection settingsgroupmention settings, more\n\nIn object-oriented jargon, settings groups are the private elements of the data set,\nto be pulled out in certain contexts, and ignored in all others. Therefore, there are\nno rules about internal use. The \\c more element of the \\ref apop_model provides a lightweight\nmeans of attaching an arbitrary struct to a model. 
See \\ref settingswriting above for details.\n\n\\li As many settings groups of different types as desired can be added to a single \\ref apop_model.\n\\li One \\ref apop_model can not hold two settings groups of the same type. Re-additions cause the removal of the previous version of the group.\n\\li If the \\c more pointer points to a structure or value (let it be \\c ss), then \\c more_size must be set to <tt>sizeof(ss)</tt>.\n\n\n\\section methodsection Methods\n\n\\subsection psubsection p, log_likelihood\n\n\\li Function headers look like  <tt>long double your_p_or_ll(apop_data *d, apop_model *params)</tt>.\n\\li The inputs are an \\ref apop_data set and an \\ref apop_model, which should include the elements needed to fully estimate the probability/likelihood (probably a filled <tt>->parameters</tt> element, possibly a settings group added by the user).\n\\li Assume that the parameters have been set, by users via \\ref apop_estimate or \\ref apop_model_set_parameters, or by \\ref apop_maximum_likelihood by its search algorithms. If the parameters are necessary, the function shall check that the parameters are not \\c NULL and set the model's \\c error element to \\c 'p' if they are missing.\n\\li Return \\c NaN on errors. If an error in the input model is found, the function may set the input model's \\c error element to an appropriate \\c char value.\n\\li If your model includes both \\c log_likelihood and \\c p methods, it must be the case that <tt>log(p(d, m))</tt> equals <tt>log_likelihood(d, m)</tt> for all \\c d and \\c m. This implies that \\c p must return a value \\f$\\geq 0\\f$. 
Note that \\ref apop_maximum_likelihood will accept functions where \\c p returns a negative value, but diagnostics that depend on log likelihood like AIC will return NaN.\n\\li If observations are assumed to be iid, you may be able to use \\ref apop_map_sum to write the core of the log likelihood function.\n\n\\subsection prepsubsection prep\n\n\\li Function header looks like <tt>void your_prep(apop_data *data, apop_model *params)</tt>.\n\\li Re-prepping a model after it has already been prepped shall have no effect. Where there is ambiguity with the other requirements, this takes precedence.\n\\li The model's <tt>data</tt> pointer shall be set to point to the input data.\n\\li The \\c info element shall be allocated and its title set to <tt>\\<Info\\></tt>.\n\\li If \\c vsize, \\c msize1, or \\c msize2 are -1, then the prep function shall set them to the width of the input data.\n\\li If \\c dsize is -1, then the prep function shall set it to the width of the input data.\n\\li If the \\c parameters element is not allocated, the function shall allocate it via <tt>apop_data_alloc(vsize, msize1, msize2)</tt> (or equivalent).\n\\li The default is \\ref apop_model_clear. It does all of the above.\n\\li The input data may be modified by the prep routine. For example, the \\ref apop_ols prep routine shuffles a single input matrix as described above under \\c data, and the \\ref apop_pmf prep routine calls \\ref apop_data_pmf_compress on the input data.\n\\li The prep routine may initialize any desired settings groups. Unless otherwise\nstated, these should not be removed if they are already there, so that users can override defaults by adding a settings group before starting an estimation.\n\\li If any functions associated with the model need to be added to \na vtable (see above), the registration shall happen here. 
Registration may also happen elsewhere.\n\n\\subsection estimatesubsection estimate\n\n\\li Function header looks like <tt>void your_estimate(apop_data *data, apop_model *params)</tt>.\nIt modifies the input model, and returns nothing. Note that this is different from the wrapper function, \\ref apop_estimate, which makes a copy of its input model, preps it, and then calls the \\c estimate function with the prepped copy.\n\\li Assume that the prep routine has already been run. Notably, this means that parameters have been allocated.\n\\li Assume that the \\c parameters hold garbage (as in a \\c malloc without a subsequent assignment to the <tt>malloc</tt>-ed space).\n\\li The function shall set the \\c parameters of the input model. For consistency with other models, the estimate should be the maximum likelihood estimate, unless otherwise documented.\n\\li Additional settings may be set.\n\\li The model's \\c &lt;Info&gt; page may be filled with statistics, as discussed at \\ref infosubsec. For scalars like log likelihood and AIC, use \\ref apop_data_add_named_elmt.\n\\li Data should not be modified by the \\c estimate routine; any changes to the data made by \\c estimate must be documented.\n\\li The default called by \\ref apop_estimate is \\ref apop_maximum_likelihood.\n\\li If errors occur during processing, set the model's \\c error element to a single character. Documentation should include the list of error characters and their meanings.\n\n\\subsection drawsubsection draw\n\n\\li Function header looks like <tt>void your_draw(double *out, gsl_rng* r, apop_model *params)</tt>.\n\\li Assume that model \\c parameters are set, via \\ref apop_estimate or \\ref apop_model_set_parameters. The author of the draw method should check that \\c parameters are not \\c NULL if needed and fill the output with NaNs if necessary parameters are not set.\n\\li Caller inputs a pointer-to-<tt>double</tt> of length \\c dsize; user is expected to make sure that there is adequate space. 
Caller also inputs a \\c gsl_rng, already allocated (probably via \\ref apop_rng_alloc, possibly from \\ref apop_rng_get_thread).\n\\li The function shall fill the space pointed to by the input pointer with a random draw from the data space, where the likelihood of any given observation is proportional to its likelihood as given by the \\c p method. Data shall be reduced to a single vector via \\ref apop_data_pack if it is not already a single vector.\n\n\\subsection cdfsubsection cdf\n\n\\li Function header looks like <tt>long double your_cdf(apop_data *d, apop_model *params)</tt>.\n\\li Assume that \\c parameters are set, via \\ref apop_estimate or \\ref apop_model_set_parameters. The author of the CDF method should check that \\c parameters are not \\c NULL and return NaN if necessary parameters are not set.\n\\li The CDF method must accept data as a single row of data in the \\c matrix of the input \\ref apop_data set (as per a draw produced using the \\c draw method). May accept other formats.\n\\li Returns the percentage of the likelihood function \\f$\\leq\\f$ the first row of the input data. The definition of \\f$\\leq\\f$ is chosen by the model author.\n\\li If one is not already present, an \\c apop_cdf_settings group may be added to the model to store temp data. See the \\ref apop_cdf function for details.\n\n\\subsection constraintsubsection constraint\n\n\\li Function header looks like <tt>long double your_constraint(apop_data *data, apop_model *params)</tt>.\n\\li Assume that \\c parameters are set, via \\ref apop_estimate, \\ref apop_model_set_parameters, or the internals of an MLE search. The author of the constraint method should check that \\c parameters are not \\c NULL and return NaN if necessary parameters are not set.\n\\li See \\ref apop_linear_constraint for a useful basis and/or example. 
Many constraints can be written as wrappers for this function.\n\\li If the constraint is met, then return zero.\n\\li If the constraint fails, then (1) move the \\c parameters in the input model to a\nconstraint-satisfying value, and (2) return the distance between the input parameters and\nwhat you've moved the parameters to. The choice of within-bounds parameters and distance function is left to the author of the constraint function.\n*/\n\n\n/**\\defgroup models \n\nThis section is a detailed description of the stock models that ship with Apophenia.\nIt is a reference. For an explanation of what to do with an \\ref apop_model, see \\ref modelsec.\n\nThe primary questions one has about a model in practice are what format the input data should take and what to expect of an estimated output.\n\nGenerally, the input data consists of an \\ref apop_data set where each row is a single observation. Details beyond that are listed below.\n\nThe output after running \\ref apop_estimate to produce a fitted model are generally\nfound in three places: the vector of the output parameter set, its matrix, or a new\nsettings group. The basic intuition is that \nif the parameters are always a short list of scalars, they are in the vector;\nif there exists a situation where they could take matrix form, the parameters will be in the matrix;\nif they require more structure than that, they will be a settings group.\n\nIf the basic structure of the \\ref apop_data set is unfamiliar to you, see \\ref\ndataoverview, which will discuss the basic means of getting data out of a struct. 
For\nexample, the estimated \\ref apop_normal distribution has the mean in position zero of\nthe vector and the standard deviation in position one, so they could be extracted as follows:\n\n\\code\napop_data *d = apop_text_to_data(\"sample data from before\");\napop_model *out = apop_estimate(d, apop_normal);\ndouble mu = apop_data_get(out->parameters, 0);\ndouble sigma = apop_data_get(out->parameters, 1);\n\n//What is the p-value of a test whose null hypothesis is that μ=3.3?\nprintf (\"pval=%g\\n\", apop_test(3.3, \"normal\", mu, sigma));\n\\endcode\n\nSee \\ref modelsec for discussion of how to pull settings groups using \\ref\nApop_settings_get (for one item) or \\ref apop_settings_get_group (for a full settings\ngroup).\n*/\n"
  },
  {
    "path": "docs/doxygen.conf.in",
    "content": "# Doxyfile 1.8.9.1\n\n# This file describes the settings to be used by the documentation system\n# doxygen (www.doxygen.org) for a project.\n#\n# All text after a double hash (##) is considered a comment and is placed in\n# front of the TAG it is preceding.\n#\n# All text after a single hash (#) is considered a comment and will be ignored.\n# The format is:\n# TAG = value [value, ...]\n# For lists, items can also be appended using:\n# TAG += value [value, ...]\n# Values that contain spaces should be placed between quotes (\\\" \\\").\n\n#---------------------------------------------------------------------------\n# Project related configuration options\n#---------------------------------------------------------------------------\n\n# This tag specifies the encoding used for all characters in the config file\n# that follow. The default is UTF-8 which is also the encoding used for all text\n# before the first occurrence of this tag. Doxygen uses libiconv (or the iconv\n# built into libc) for the transcoding. See http://www.gnu.org/software/libiconv\n# for the list of possible encodings.\n# The default value is: UTF-8.\n\nDOXYFILE_ENCODING      = UTF-8\n\n# The PROJECT_NAME tag is a single word (or a sequence of words surrounded by\n# double-quotes, unless you are using Doxywizard) that should identify the\n# project for which the documentation is generated. This name is used in the\n# title of most generated pages and in a few other places.\n# The default value is: My Project.\n\nPROJECT_NAME           = Apophenia\n\n# The PROJECT_NUMBER tag can be used to enter a project or revision number. This\n# could be handy for archiving the generated documentation or if some version\n# control system is used.\n\nPROJECT_NUMBER         =\n\n# Using the PROJECT_BRIEF tag one can provide an optional one line description\n# for a project that appears at the top of each page and should give viewer a\n# quick idea about the purpose of the project. 
Keep the description short.\n\nPROJECT_BRIEF          =\n\n# With the PROJECT_LOGO tag one can specify a logo or an icon that is included\n# in the documentation. The maximum height of the logo should not exceed 55\n# pixels and the maximum width should not exceed 200 pixels. Doxygen will copy\n# the logo to the output directory.\n\nPROJECT_LOGO           =\n\n# The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute) path\n# into which the generated documentation will be written. If a relative path is\n# entered, it will be relative to the location where doxygen was started. If\n# left blank the current directory will be used.\n\nOUTPUT_DIRECTORY       =\n\n# If the CREATE_SUBDIRS tag is set to YES then doxygen will create 4096 sub-\n# directories (in 2 levels) under the output directory of each output format and\n# will distribute the generated files over these directories. Enabling this\n# option can be useful when feeding doxygen a huge amount of source files, where\n# putting all generated files in the same directory would otherwise causes\n# performance problems for the file system.\n# The default value is: NO.\n\nCREATE_SUBDIRS         = NO\n\n# If the ALLOW_UNICODE_NAMES tag is set to YES, doxygen will allow non-ASCII\n# characters to appear in the names of generated files. If set to NO, non-ASCII\n# characters will be escaped, for example _xE3_x81_x84 will be used for Unicode\n# U+3044.\n# The default value is: NO.\n\nALLOW_UNICODE_NAMES    = YES\n\n# The OUTPUT_LANGUAGE tag is used to specify the language in which all\n# documentation generated by doxygen is written. 
Doxygen will use this\n# information to generate all constant output in the proper language.\n# Possible values are: Afrikaans, Arabic, Armenian, Brazilian, Catalan, Chinese,\n# Chinese-Traditional, Croatian, Czech, Danish, Dutch, English (United States),\n# Esperanto, Farsi (Persian), Finnish, French, German, Greek, Hungarian,\n# Indonesian, Italian, Japanese, Japanese-en (Japanese with English messages),\n# Korean, Korean-en (Korean with English messages), Latvian, Lithuanian,\n# Macedonian, Norwegian, Persian (Farsi), Polish, Portuguese, Romanian, Russian,\n# Serbian, Serbian-Cyrillic, Slovak, Slovene, Spanish, Swedish, Turkish,\n# Ukrainian and Vietnamese.\n# The default value is: English.\n\nOUTPUT_LANGUAGE        = English\n\n# If the BRIEF_MEMBER_DESC tag is set to YES, doxygen will include brief member\n# descriptions after the members that are listed in the file and class\n# documentation (similar to Javadoc). Set to NO to disable this.\n# The default value is: YES.\n\nBRIEF_MEMBER_DESC      = YES\n\n# If the REPEAT_BRIEF tag is set to YES, doxygen will prepend the brief\n# description of a member or function before the detailed description\n#\n# Note: If both HIDE_UNDOC_MEMBERS and BRIEF_MEMBER_DESC are set to NO, the\n# brief descriptions will be completely suppressed.\n# The default value is: YES.\n\nREPEAT_BRIEF           = YES\n\n# This tag implements a quasi-intelligent brief description abbreviator that is\n# used to form the text in various listings. Each string in this list, if found\n# as the leading text of the brief description, will be stripped from the text\n# and the result, after processing the whole list, is used as the annotated\n# text. Otherwise, the brief description is used as-is. 
If left blank, the\n# following values are used ($name is automatically replaced with the name of\n# the entity):The $name class, The $name widget, The $name file, is, provides,\n# specifies, contains, represents, a, an and the.\n\nABBREVIATE_BRIEF       =\n\n# If the ALWAYS_DETAILED_SEC and REPEAT_BRIEF tags are both set to YES then\n# doxygen will generate a detailed section even if there is only a brief\n# description.\n# The default value is: NO.\n\nALWAYS_DETAILED_SEC    = NO\n\n# If the INLINE_INHERITED_MEMB tag is set to YES, doxygen will show all\n# inherited members of a class in the documentation of that class as if those\n# members were ordinary class members. Constructors, destructors and assignment\n# operators of the base classes will not be shown.\n# The default value is: NO.\n\nINLINE_INHERITED_MEMB  = NO\n\n# If the FULL_PATH_NAMES tag is set to YES, doxygen will prepend the full path\n# before files name in the file list and in the header files. If set to NO the\n# shortest path that makes the file name unique will be used\n# The default value is: YES.\n\nFULL_PATH_NAMES        = NO\n\n# The STRIP_FROM_PATH tag can be used to strip a user-defined part of the path.\n# Stripping is only done if one of the specified strings matches the left-hand\n# part of the path. The tag can be used to show relative paths in the file list.\n# If left blank the directory from which doxygen is run is used as the path to\n# strip.\n#\n# Note that you can specify absolute paths here, but also relative paths, which\n# will be relative from the directory where doxygen is started.\n# This tag requires that the tag FULL_PATH_NAMES is set to YES.\n\nSTRIP_FROM_PATH        =\n\n# The STRIP_FROM_INC_PATH tag can be used to strip a user-defined part of the\n# path mentioned in the documentation of a class, which tells the reader which\n# header file to include in order to use a class. If left blank only the name of\n# the header file containing the class definition is used. 
Otherwise one should\n# specify the list of include paths that are normally passed to the compiler\n# using the -I flag.\n\nSTRIP_FROM_INC_PATH    =\n\n# If the SHORT_NAMES tag is set to YES, doxygen will generate much shorter (but\n# less readable) file names. This can be useful is your file systems doesn't\n# support long names like on DOS, Mac, or CD-ROM.\n# The default value is: NO.\n\nSHORT_NAMES            = NO\n\n# If the JAVADOC_AUTOBRIEF tag is set to YES then doxygen will interpret the\n# first line (until the first dot) of a Javadoc-style comment as the brief\n# description. If set to NO, the Javadoc-style will behave just like regular Qt-\n# style comments (thus requiring an explicit @brief command for a brief\n# description.)\n# The default value is: NO.\n\nJAVADOC_AUTOBRIEF      = NO\n\n# If the QT_AUTOBRIEF tag is set to YES then doxygen will interpret the first\n# line (until the first dot) of a Qt-style comment as the brief description. If\n# set to NO, the Qt-style will behave just like regular Qt-style comments (thus\n# requiring an explicit \\brief command for a brief description.)\n# The default value is: NO.\n\nQT_AUTOBRIEF           = NO\n\n# The MULTILINE_CPP_IS_BRIEF tag can be set to YES to make doxygen treat a\n# multi-line C++ special comment block (i.e. a block of //! or /// comments) as\n# a brief description. This used to be the default behavior. The new default is\n# to treat a multi-line C++ comment block as a detailed description. 
Set this\n# tag to YES if you prefer the old behavior instead.\n#\n# Note that setting this tag to YES also means that rational rose comments are\n# not recognized any more.\n# The default value is: NO.\n\nMULTILINE_CPP_IS_BRIEF = NO\n\n# If the INHERIT_DOCS tag is set to YES then an undocumented member inherits the\n# documentation from any documented member that it re-implements.\n# The default value is: YES.\n\nINHERIT_DOCS           = YES\n\n# If the SEPARATE_MEMBER_PAGES tag is set to YES then doxygen will produce a new\n# page for each member. If set to NO, the documentation of a member will be part\n# of the file/class/namespace that contains it.\n# The default value is: NO.\n\nSEPARATE_MEMBER_PAGES  = NO\n\n# The TAB_SIZE tag can be used to set the number of spaces in a tab. Doxygen\n# uses this value to replace tabs by spaces in code fragments.\n# Minimum value: 1, maximum value: 16, default value: 4.\n\nTAB_SIZE               = 5\n\n# This tag can be used to specify a number of aliases that act as commands in\n# the documentation. An alias has the form:\n# name=value\n# For example adding\n# \"sideeffect=@par Side Effects:\\n\"\n# will allow you to put the command \\sideeffect (or @sideeffect) in the\n# documentation, which will result in a user-defined paragraph with heading\n# \"Side Effects:\". You can put \\n's in the value part of an alias to insert\n# newlines.\n\nALIASES                =\n\n# This tag can be used to specify a number of word-keyword mappings (TCL only).\n# A mapping has the form \"name=value\". For example adding \"class=itcl::class\"\n# will allow you to use the command class in the itcl::class meaning.\n\nTCL_SUBST              =\n\n# Set the OPTIMIZE_OUTPUT_FOR_C tag to YES if your project consists of C sources\n# only. Doxygen will then generate output that is more tailored for C. For\n# instance, some of the names that are used will be different. 
The list of all\n# members will be omitted, etc.\n# The default value is: NO.\n\nOPTIMIZE_OUTPUT_FOR_C  = YES\n\n# Set the OPTIMIZE_OUTPUT_JAVA tag to YES if your project consists of Java or\n# Python sources only. Doxygen will then generate output that is more tailored\n# for that language. For instance, namespaces will be presented as packages,\n# qualified scopes will look different, etc.\n# The default value is: NO.\n\nOPTIMIZE_OUTPUT_JAVA   = NO\n\n# Set the OPTIMIZE_FOR_FORTRAN tag to YES if your project consists of Fortran\n# sources. Doxygen will then generate output that is tailored for Fortran.\n# The default value is: NO.\n\nOPTIMIZE_FOR_FORTRAN   = NO\n\n# Set the OPTIMIZE_OUTPUT_VHDL tag to YES if your project consists of VHDL\n# sources. Doxygen will then generate output that is tailored for VHDL.\n# The default value is: NO.\n\nOPTIMIZE_OUTPUT_VHDL   = NO\n\n# Doxygen selects the parser to use depending on the extension of the files it\n# parses. With this tag you can assign which parser to use for a given\n# extension. Doxygen has a built-in mapping, but you can override or extend it\n# using this tag. The format is ext=language, where ext is a file extension, and\n# language is one of the parsers supported by doxygen: IDL, Java, Javascript,\n# C#, C, C++, D, PHP, Objective-C, Python, Fortran (fixed format Fortran:\n# FortranFixed, free formatted Fortran: FortranFree, unknown formatted Fortran:\n# Fortran. In the later case the parser tries to guess whether the code is fixed\n# or free formatted code, this is the default for Fortran type files), VHDL. 
For\n# instance to make doxygen treat .inc files as Fortran files (default is PHP),\n# and .f files as C (default is Fortran), use: inc=Fortran f=C.\n#\n# Note: For files without extension you can use no_extension as a placeholder.\n#\n# Note that for custom extensions you also need to set FILE_PATTERNS otherwise\n# the files are not read by doxygen.\n\nEXTENSION_MAPPING      =\n\n# If the MARKDOWN_SUPPORT tag is enabled then doxygen pre-processes all comments\n# according to the Markdown format, which allows for more readable\n# documentation. See http://daringfireball.net/projects/markdown/ for details.\n# The output of markdown processing is further processed by doxygen, so you can\n# mix doxygen, HTML, and XML commands with Markdown formatting. Disable only in\n# case of backward compatibilities issues.\n# The default value is: YES.\n\nMARKDOWN_SUPPORT       = YES\n\n# When enabled doxygen tries to link words that correspond to documented\n# classes, or namespaces to their corresponding documentation. Such a link can\n# be prevented in individual cases by putting a % sign in front of the word or\n# globally by setting AUTOLINK_SUPPORT to NO.\n# The default value is: YES.\n\nAUTOLINK_SUPPORT       = YES\n\n# If you use STL classes (i.e. std::string, std::vector, etc.) but do not want\n# to include (a tag file for) the STL sources as input, then you should set this\n# tag to YES in order to let doxygen match functions declarations and\n# definitions whose arguments contain STL classes (e.g. func(std::string);\n# versus func(std::string) {}). 
This also make the inheritance and collaboration\n# diagrams that involve STL classes more complete and accurate.\n# The default value is: NO.\n\nBUILTIN_STL_SUPPORT    = NO\n\n# If you use Microsoft's C++/CLI language, you should set this option to YES to\n# enable parsing support.\n# The default value is: NO.\n\nCPP_CLI_SUPPORT        = NO\n\n# Set the SIP_SUPPORT tag to YES if your project consists of sip (see:\n# http://www.riverbankcomputing.co.uk/software/sip/intro) sources only. Doxygen\n# will parse them like normal C++ but will assume all classes use public instead\n# of private inheritance when no explicit protection keyword is present.\n# The default value is: NO.\n\nSIP_SUPPORT            = NO\n\n# For Microsoft's IDL there are propget and propput attributes to indicate\n# getter and setter methods for a property. Setting this option to YES will make\n# doxygen to replace the get and set methods by a property in the documentation.\n# This will only work if the methods are indeed getting or setting a simple\n# type. If this is not the case, or you want to show the methods anyway, you\n# should set this option to NO.\n# The default value is: YES.\n\nIDL_PROPERTY_SUPPORT   = YES\n\n# If member grouping is used in the documentation and the DISTRIBUTE_GROUP_DOC\n# tag is set to YES then doxygen will reuse the documentation of the first\n# member in the group (if any) for the other members of the group. By default\n# all members of a group must be documented explicitly.\n# The default value is: NO.\n\nDISTRIBUTE_GROUP_DOC   = NO\n\n# Set the SUBGROUPING tag to YES to allow class member groups of the same type\n# (for instance a group of public functions) to be put as a subgroup of that\n# type (e.g. under the Public Functions section). Set it to NO to prevent\n# subgrouping. 
Alternatively, this can be done per class using the\n# \\nosubgrouping command.\n# The default value is: YES.\n\nSUBGROUPING            = YES\n\n# When the INLINE_GROUPED_CLASSES tag is set to YES, classes, structs and unions\n# are shown inside the group in which they are included (e.g. using \\ingroup)\n# instead of on a separate page (for HTML and Man pages) or section (for LaTeX\n# and RTF).\n#\n# Note that this feature does not work in combination with\n# SEPARATE_MEMBER_PAGES.\n# The default value is: NO.\n\nINLINE_GROUPED_CLASSES = NO\n\n# When the INLINE_SIMPLE_STRUCTS tag is set to YES, structs, classes, and unions\n# with only public data fields or simple typedef fields will be shown inline in\n# the documentation of the scope in which they are defined (i.e. file,\n# namespace, or group documentation), provided this scope is documented. If set\n# to NO, structs, classes, and unions are shown on a separate page (for HTML and\n# Man pages) or section (for LaTeX and RTF).\n# The default value is: NO.\n\nINLINE_SIMPLE_STRUCTS  = NO\n\n# When TYPEDEF_HIDES_STRUCT tag is enabled, a typedef of a struct, union, or\n# enum is documented as struct, union, or enum with the name of the typedef. So\n# typedef struct TypeS {} TypeT, will appear in the documentation as a struct\n# with name TypeT. When disabled the typedef will appear as a member of a file,\n# namespace, or class. And the struct will be named TypeS. This can typically be\n# useful for C code in case the coding convention dictates that all compound\n# types are typedef'ed and only the typedef is referenced, never the tag name.\n# The default value is: NO.\n\nTYPEDEF_HIDES_STRUCT   = YES\n\n# The size of the symbol lookup cache can be set using LOOKUP_CACHE_SIZE. This\n# cache is used to resolve symbols given their name and scope. Since this can be\n# an expensive process and often the same symbol appears multiple times in the\n# code, doxygen keeps a cache of pre-resolved symbols. 
If the cache is too small\n# doxygen will become slower. If the cache is too large, memory is wasted. The\n# cache size is given by this formula: 2^(16+LOOKUP_CACHE_SIZE). The valid range\n# is 0..9, the default is 0, corresponding to a cache size of 2^16=65536\n# symbols. At the end of a run doxygen will report the cache usage and suggest\n# the optimal cache size from a speed point of view.\n# Minimum value: 0, maximum value: 9, default value: 0.\n\nLOOKUP_CACHE_SIZE      = 0\n\n#---------------------------------------------------------------------------\n# Build related configuration options\n#---------------------------------------------------------------------------\n\n# If the EXTRACT_ALL tag is set to YES, doxygen will assume all entities in\n# documentation are documented, even if no documentation was available. Private\n# class members and static file members will be hidden unless the\n# EXTRACT_PRIVATE respectively EXTRACT_STATIC tags are set to YES.\n# Note: This will also disable the warnings about undocumented members that are\n# normally produced when WARNINGS is set to YES.\n# The default value is: NO.\n\nEXTRACT_ALL            = NO\n\n# If the EXTRACT_PRIVATE tag is set to YES, all private members of a class will\n# be included in the documentation.\n# The default value is: NO.\n\nEXTRACT_PRIVATE        = NO\n\n# If the EXTRACT_PACKAGE tag is set to YES, all members with package or internal\n# scope will be included in the documentation.\n# The default value is: NO.\n\nEXTRACT_PACKAGE        = NO\n\n# If the EXTRACT_STATIC tag is set to YES, all static members of a file will be\n# included in the documentation.\n# The default value is: NO.\n\nEXTRACT_STATIC         = NO\n\n# If the EXTRACT_LOCAL_CLASSES tag is set to YES, classes (and structs) defined\n# locally in source files will be included in the documentation. If set to NO,\n# only classes defined in header files are included. 
Does not have any effect\n# for Java sources.\n# The default value is: YES.\n\nEXTRACT_LOCAL_CLASSES  = YES\n\n# This flag is only useful for Objective-C code. If set to YES, local methods,\n# which are defined in the implementation section but not in the interface are\n# included in the documentation. If set to NO, only methods in the interface are\n# included.\n# The default value is: NO.\n\nEXTRACT_LOCAL_METHODS  = NO\n\n# If this flag is set to YES, the members of anonymous namespaces will be\n# extracted and appear in the documentation as a namespace called\n# 'anonymous_namespace{file}', where file will be replaced with the base name of\n# the file that contains the anonymous namespace. By default anonymous namespace\n# are hidden.\n# The default value is: NO.\n\nEXTRACT_ANON_NSPACES   = NO\n\n# If the HIDE_UNDOC_MEMBERS tag is set to YES, doxygen will hide all\n# undocumented members inside documented classes or files. If set to NO these\n# members will be included in the various overviews, but no documentation\n# section is generated. This option has no effect if EXTRACT_ALL is enabled.\n# The default value is: NO.\n\nHIDE_UNDOC_MEMBERS     = NO\n\n# If the HIDE_UNDOC_CLASSES tag is set to YES, doxygen will hide all\n# undocumented classes that are normally visible in the class hierarchy. If set\n# to NO, these classes will be included in the various overviews. This option\n# has no effect if EXTRACT_ALL is enabled.\n# The default value is: NO.\n\nHIDE_UNDOC_CLASSES     = NO\n\n# If the HIDE_FRIEND_COMPOUNDS tag is set to YES, doxygen will hide all friend\n# (class|struct|union) declarations. If set to NO, these declarations will be\n# included in the documentation.\n# The default value is: NO.\n\nHIDE_FRIEND_COMPOUNDS  = NO\n\n# If the HIDE_IN_BODY_DOCS tag is set to YES, doxygen will hide any\n# documentation blocks found inside the body of a function. 
If set to NO, these\n# blocks will be appended to the function's detailed documentation block.\n# The default value is: NO.\n\nHIDE_IN_BODY_DOCS      = NO\n\n# The INTERNAL_DOCS tag determines if documentation that is typed after a\n# \\internal command is included. If the tag is set to NO then the documentation\n# will be excluded. Set it to YES to include the internal documentation.\n# The default value is: NO.\n\nINTERNAL_DOCS          = NO\n\n# If the CASE_SENSE_NAMES tag is set to NO then doxygen will only generate file\n# names in lower-case letters. If set to YES, upper-case letters are also\n# allowed. This is useful if you have classes or files whose names only differ\n# in case and if your file system supports case sensitive file names. Windows\n# and Mac users are advised to set this option to NO.\n# The default value is: system dependent.\n\nCASE_SENSE_NAMES       = YES\n\n# If the HIDE_SCOPE_NAMES tag is set to NO then doxygen will show members with\n# their full class and namespace scopes in the documentation. If set to YES, the\n# scope will be hidden.\n# The default value is: NO.\n\nHIDE_SCOPE_NAMES       = NO\n\n# If the HIDE_COMPOUND_REFERENCE tag is set to NO (default) then doxygen will\n# append additional text to a page's title, such as Class Reference. 
If set to\n# YES the compound reference will be hidden.\n# The default value is: NO.\n\nHIDE_COMPOUND_REFERENCE= NO\n\n# If the SHOW_INCLUDE_FILES tag is set to YES then doxygen will put a list of\n# the files that are included by a file in the documentation of that file.\n# The default value is: YES.\n\nSHOW_INCLUDE_FILES     = NO\n\n# If the SHOW_GROUPED_MEMB_INC tag is set to YES then Doxygen will add for each\n# grouped member an include statement to the documentation, telling the reader\n# which file to include in order to use the member.\n# The default value is: NO.\n\nSHOW_GROUPED_MEMB_INC  = NO\n\n# If the FORCE_LOCAL_INCLUDES tag is set to YES then doxygen will list include\n# files with double quotes in the documentation rather than with sharp brackets.\n# The default value is: NO.\n\nFORCE_LOCAL_INCLUDES   = NO\n\n# If the INLINE_INFO tag is set to YES then a tag [inline] is inserted in the\n# documentation for inline members.\n# The default value is: YES.\n\nINLINE_INFO            = YES\n\n# If the SORT_MEMBER_DOCS tag is set to YES then doxygen will sort the\n# (detailed) documentation of file and class members alphabetically by member\n# name. If set to NO, the members will appear in declaration order.\n# The default value is: YES.\n\nSORT_MEMBER_DOCS       = YES\n\n# If the SORT_BRIEF_DOCS tag is set to YES then doxygen will sort the brief\n# descriptions of file, namespace and class members alphabetically by member\n# name. If set to NO, the members will appear in declaration order. Note that\n# this will also influence the order of the classes in the class list.\n# The default value is: NO.\n\nSORT_BRIEF_DOCS        = YES\n\n# If the SORT_MEMBERS_CTORS_1ST tag is set to YES then doxygen will sort the\n# (brief and detailed) documentation of class members so that constructors and\n# destructors are listed first. 
If set to NO the constructors will appear in the\n# respective orders defined by SORT_BRIEF_DOCS and SORT_MEMBER_DOCS.\n# Note: If SORT_BRIEF_DOCS is set to NO this option is ignored for sorting brief\n# member documentation.\n# Note: If SORT_MEMBER_DOCS is set to NO this option is ignored for sorting\n# detailed member documentation.\n# The default value is: NO.\n\nSORT_MEMBERS_CTORS_1ST = NO\n\n# If the SORT_GROUP_NAMES tag is set to YES then doxygen will sort the hierarchy\n# of group names into alphabetical order. If set to NO the group names will\n# appear in their defined order.\n# The default value is: NO.\n\nSORT_GROUP_NAMES       = NO\n\n# If the SORT_BY_SCOPE_NAME tag is set to YES, the class list will be sorted by\n# fully-qualified names, including namespaces. If set to NO, the class list will\n# be sorted only by class name, not including the namespace part.\n# Note: This option is not very useful if HIDE_SCOPE_NAMES is set to YES.\n# Note: This option applies only to the class list, not to the alphabetical\n# list.\n# The default value is: NO.\n\nSORT_BY_SCOPE_NAME     = NO\n\n# If the STRICT_PROTO_MATCHING option is enabled and doxygen fails to do proper\n# type resolution of all parameters of a function it will reject a match between\n# the prototype and the implementation of a member function even if there is\n# only one candidate or it is obvious which candidate to choose by doing a\n# simple string match. By disabling STRICT_PROTO_MATCHING doxygen will still\n# accept a match between prototype and implementation in such cases.\n# The default value is: NO.\n\nSTRICT_PROTO_MATCHING  = NO\n\n# The GENERATE_TODOLIST tag can be used to enable (YES) or disable (NO) the todo\n# list. This list is created by putting \\todo commands in the documentation.\n# The default value is: YES.\n\nGENERATE_TODOLIST      = NO\n\n# The GENERATE_TESTLIST tag can be used to enable (YES) or disable (NO) the test\n# list. 
This list is created by putting \\test commands in the documentation.\n# The default value is: YES.\n\nGENERATE_TESTLIST      = YES\n\n# The GENERATE_BUGLIST tag can be used to enable (YES) or disable (NO) the bug\n# list. This list is created by putting \\bug commands in the documentation.\n# The default value is: YES.\n\nGENERATE_BUGLIST       = YES\n\n# The GENERATE_DEPRECATEDLIST tag can be used to enable (YES) or disable (NO)\n# the deprecated list. This list is created by putting \\deprecated commands in\n# the documentation.\n# The default value is: YES.\n\nGENERATE_DEPRECATEDLIST= NO\n\n# The ENABLED_SECTIONS tag can be used to enable conditional documentation\n# sections, marked by \\if <section_label> ... \\endif and \\cond <section_label>\n# ... \\endcond blocks.\n\nENABLED_SECTIONS       =\n\n# The MAX_INITIALIZER_LINES tag determines the maximum number of lines that the\n# initial value of a variable or macro / define can have for it to appear in the\n# documentation. If the initializer consists of more lines than specified here\n# it will be hidden. Use a value of 0 to hide initializers completely. The\n# appearance of the value of individual variables and macros / defines can be\n# controlled using \\showinitializer or \\hideinitializer command in the\n# documentation regardless of this setting.\n# Minimum value: 0, maximum value: 10000, default value: 30.\n\nMAX_INITIALIZER_LINES  = 0\n\n# Set the SHOW_USED_FILES tag to NO to disable the list of files generated at\n# the bottom of the documentation of classes and structs. If set to YES, the\n# list will mention the files that were used to generate the documentation.\n# The default value is: YES.\n\nSHOW_USED_FILES        = NO\n\n# Set the SHOW_FILES tag to NO to disable the generation of the Files page. 
This\n# will remove the Files entry from the Quick Index and from the Folder Tree View\n# (if specified).\n# The default value is: YES.\n\nSHOW_FILES             = NO\n\n# Set the SHOW_NAMESPACES tag to NO to disable the generation of the Namespaces\n# page. This will remove the Namespaces entry from the Quick Index and from the\n# Folder Tree View (if specified).\n# The default value is: YES.\n\nSHOW_NAMESPACES        = NO\n\n# The FILE_VERSION_FILTER tag can be used to specify a program or script that\n# doxygen should invoke to get the current version for each file (typically from\n# the version control system). Doxygen will invoke the program by executing (via\n# popen()) the command command input-file, where command is the value of the\n# FILE_VERSION_FILTER tag, and input-file is the name of an input file provided\n# by doxygen. Whatever the program writes to standard output is used as the file\n# version. For an example see the documentation.\n\nFILE_VERSION_FILTER    =\n\n# The LAYOUT_FILE tag can be used to specify a layout file which will be parsed\n# by doxygen. The layout file controls the global structure of the generated\n# output files in an output format independent way. To create the layout file\n# that represents doxygen's defaults, run doxygen with the -l option. You can\n# optionally specify a file name after the option, if omitted DoxygenLayout.xml\n# will be used as the name of the layout file.\n#\n# Note that if you run doxygen from a directory containing a file called\n# DoxygenLayout.xml, doxygen will parse it automatically even if the LAYOUT_FILE\n# tag is left empty.\n\nLAYOUT_FILE            =\n\n# The CITE_BIB_FILES tag can be used to specify one or more bib files containing\n# the reference definitions. This must be a list of .bib files. The .bib\n# extension is automatically appended if omitted. This requires the bibtex tool\n# to be installed. 
See also http://en.wikipedia.org/wiki/BibTeX for more info.\n# For LaTeX the style of the bibliography can be controlled using\n# LATEX_BIB_STYLE. To use this feature you need bibtex and perl available in the\n# search path. See also \\cite for info how to create references.\n\nCITE_BIB_FILES         =\n\n#---------------------------------------------------------------------------\n# Configuration options related to warning and progress messages\n#---------------------------------------------------------------------------\n\n# The QUIET tag can be used to turn on/off the messages that are generated to\n# standard output by doxygen. If QUIET is set to YES this implies that the\n# messages are off.\n# The default value is: NO.\n\nQUIET                  = YES\n\n# The WARNINGS tag can be used to turn on/off the warning messages that are\n# generated to standard error (stderr) by doxygen. If WARNINGS is set to YES\n# this implies that the warnings are on.\n#\n# Tip: Turn warnings on while writing the documentation.\n# The default value is: YES.\n\nWARNINGS               = YES\n\n# If the WARN_IF_UNDOCUMENTED tag is set to YES then doxygen will generate\n# warnings for undocumented members. If EXTRACT_ALL is set to YES then this flag\n# will automatically be disabled.\n# The default value is: YES.\n\nWARN_IF_UNDOCUMENTED   = NO\n\n# If the WARN_IF_DOC_ERROR tag is set to YES, doxygen will generate warnings for\n# potential errors in the documentation, such as not documenting some parameters\n# in a documented function, or documenting parameters that don't exist or using\n# markup commands wrongly.\n# The default value is: YES.\n\nWARN_IF_DOC_ERROR      = YES\n\n# This WARN_NO_PARAMDOC option can be enabled to get warnings for functions that\n# are documented, but have no documentation for their parameters or return\n# value. 
If set to NO, doxygen will only warn about wrong or incomplete\n# parameter documentation, but not about the absence of documentation.\n# The default value is: NO.\n\nWARN_NO_PARAMDOC       = NO\n\n# The WARN_FORMAT tag determines the format of the warning messages that doxygen\n# can produce. The string should contain the $file, $line, and $text tags, which\n# will be replaced by the file and line number from which the warning originated\n# and the warning text. Optionally the format may contain $version, which will\n# be replaced by the version of the file (if it could be obtained via\n# FILE_VERSION_FILTER)\n# The default value is: $file:$line: $text.\n\nWARN_FORMAT            = \"$file:$line: $text\"\n\n# The WARN_LOGFILE tag can be used to specify a file to which warning and error\n# messages should be written. If left blank the output is written to standard\n# error (stderr).\n\nWARN_LOGFILE           = doxygen.log\n\n#---------------------------------------------------------------------------\n# Configuration options related to the input files\n#---------------------------------------------------------------------------\n\n# The INPUT tag is used to specify the files and/or directories that contain\n# documented source files. You may enter file names like myfile.cpp or\n# directories like /usr/src/myproject. Separate the files or directories with\n# spaces.\n# Note: If this tag is empty the current directory is searched.\n\nINPUT                  = @abs_top_builddir@/docs/include @abs_top_srcdir@\n\n# This tag can be used to specify the character encoding of the source files\n# that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses\n# libiconv (or the iconv built into libc) for the transcoding. 
See the libiconv\n# documentation (see: http://www.gnu.org/software/libiconv) for the list of\n# possible encodings.\n# The default value is: UTF-8.\n\nINPUT_ENCODING         = UTF-8\n\n# If the value of the INPUT tag contains directories, you can use the\n# FILE_PATTERNS tag to specify one or more wildcard patterns (like *.cpp and\n# *.h) to filter out the source-files in the directories. If left blank the\n# following patterns are tested:*.c, *.cc, *.cxx, *.cpp, *.c++, *.java, *.ii,\n# *.ixx, *.ipp, *.i++, *.inl, *.idl, *.ddl, *.odl, *.h, *.hh, *.hxx, *.hpp,\n# *.h++, *.cs, *.d, *.php, *.php4, *.php5, *.phtml, *.inc, *.m, *.markdown,\n# *.md, *.mm, *.dox, *.py, *.f90, *.f, *.for, *.tcl, *.vhd, *.vhdl, *.ucf,\n# *.qsf, *.as and *.js.\n\nFILE_PATTERNS          =\n\n# The RECURSIVE tag can be used to specify whether or not subdirectories should\n# be searched for input files as well.\n# The default value is: NO.\n\nRECURSIVE              = YES\n\n# The EXCLUDE tag can be used to specify files and/or directories that should be\n# excluded from the INPUT source files. 
This way you can easily exclude a\n# subdirectory from a directory tree whose root is specified with the INPUT tag.\n#\n# Note that relative paths are relative to the directory from which doxygen is\n# run.\n\nEXCLUDE                =\n\n# The EXCLUDE_SYMLINKS tag can be used to select whether or not files or\n# directories that are symbolic links (a Unix file system feature) are excluded\n# from the input.\n# The default value is: NO.\n\nEXCLUDE_SYMLINKS       = NO\n\n# If the value of the INPUT tag contains directories, you can use the\n# EXCLUDE_PATTERNS tag to specify one or more wildcard patterns to exclude\n# certain files from those directories.\n#\n# Note that the wildcards are matched against the file with absolute path, so to\n# exclude all test directories for example use the pattern */test/*\n\nEXCLUDE_PATTERNS       =\n\n# The EXCLUDE_SYMBOLS tag can be used to specify one or more symbol names\n# (namespaces, classes, functions, etc.) that should be excluded from the\n# output. The symbol name can be a fully qualified name, a word, or if the\n# wildcard * is used, a substring. Examples: ANamespace, AClass,\n# AClass::ANamespace, ANamespace::*Test\n#\n# Note that the wildcards are matched against the file with absolute path, so to\n# exclude all test directories use the pattern */test/*\n\nEXCLUDE_SYMBOLS        =\n\n# The EXAMPLE_PATH tag can be used to specify one or more files or directories\n# that contain example code fragments that are included (see the \\include\n# command).\n\nEXAMPLE_PATH           = @abs_top_srcdir@\n\n# If the value of the EXAMPLE_PATH tag contains directories, you can use the\n# EXAMPLE_PATTERNS tag to specify one or more wildcard pattern (like *.cpp and\n# *.h) to filter out the source-files in the directories. 
If left blank all\n# files are included.\n\nEXAMPLE_PATTERNS       =\n\n# If the EXAMPLE_RECURSIVE tag is set to YES then subdirectories will be\n# searched for input files to be used with the \\include or \\dontinclude commands\n# irrespective of the value of the RECURSIVE tag.\n# The default value is: NO.\n\nEXAMPLE_RECURSIVE      = YES\n\n# The IMAGE_PATH tag can be used to specify one or more files or directories\n# that contain images that are to be included in the documentation (see the\n# \\image command).\n\nIMAGE_PATH             = .\n\n# The INPUT_FILTER tag can be used to specify a program that doxygen should\n# invoke to filter for each input file. Doxygen will invoke the filter program\n# by executing (via popen()) the command:\n#\n# <filter> <input-file>\n#\n# where <filter> is the value of the INPUT_FILTER tag, and <input-file> is the\n# name of an input file. Doxygen will then use the output that the filter\n# program writes to standard output. If FILTER_PATTERNS is specified, this tag\n# will be ignored.\n#\n# Note that the filter must not add or remove lines; it is applied before the\n# code is scanned, but not when the output code is generated. If lines are added\n# or removed, the anchors will not be placed correctly.\n\nINPUT_FILTER           =\n\n# The FILTER_PATTERNS tag can be used to specify filters on a per file pattern\n# basis. Doxygen will compare the file name with each pattern and apply the\n# filter if there is a match. The filters are a list of the form: pattern=filter\n# (like *.cpp=my_cpp_filter). See INPUT_FILTER for further information on how\n# filters are used. If the FILTER_PATTERNS tag is empty or if none of the\n# patterns match the file name, INPUT_FILTER is applied.\n\nFILTER_PATTERNS        =\n\n# If the FILTER_SOURCE_FILES tag is set to YES, the input filter (if set using\n# INPUT_FILTER) will also be used to filter the input files that are used for\n# producing the source files to browse (i.e. 
when SOURCE_BROWSER is set to YES).\n# The default value is: NO.\n\nFILTER_SOURCE_FILES    = NO\n\n# The FILTER_SOURCE_PATTERNS tag can be used to specify source filters per file\n# pattern. A pattern will override the setting for FILTER_PATTERN (if any) and\n# it is also possible to disable source filtering for a specific pattern using\n# *.ext= (so without naming a filter).\n# This tag requires that the tag FILTER_SOURCE_FILES is set to YES.\n\nFILTER_SOURCE_PATTERNS =\n\n# If the USE_MDFILE_AS_MAINPAGE tag refers to the name of a markdown file that\n# is part of the input, its contents will be placed on the main page\n# (index.html). This can be useful if you have a project on for instance GitHub\n# and want to reuse the introduction page also for the doxygen output.\n\nUSE_MDFILE_AS_MAINPAGE =\n\n#---------------------------------------------------------------------------\n# Configuration options related to source browsing\n#---------------------------------------------------------------------------\n\n# If the SOURCE_BROWSER tag is set to YES then a list of source files will be\n# generated. Documented entities will be cross-referenced with these sources.\n#\n# Note: To get rid of all source code in the generated output, make sure that\n# also VERBATIM_HEADERS is set to NO.\n# The default value is: NO.\n\nSOURCE_BROWSER         = NO\n\n# Setting the INLINE_SOURCES tag to YES will include the body of functions,\n# classes and enums directly into the documentation.\n# The default value is: NO.\n\nINLINE_SOURCES         = NO\n\n# Setting the STRIP_CODE_COMMENTS tag to YES will instruct doxygen to hide any\n# special comment blocks from generated source code fragments. 
Normal C, C++ and\n# Fortran comments will always remain visible.\n# The default value is: YES.\n\nSTRIP_CODE_COMMENTS    = YES\n\n# If the REFERENCED_BY_RELATION tag is set to YES then for each documented\n# function all documented functions referencing it will be listed.\n# The default value is: NO.\n\nREFERENCED_BY_RELATION = NO\n\n# If the REFERENCES_RELATION tag is set to YES then for each documented function\n# all documented entities called/used by that function will be listed.\n# The default value is: NO.\n\nREFERENCES_RELATION    = NO\n\n# If the REFERENCES_LINK_SOURCE tag is set to YES and SOURCE_BROWSER tag is set\n# to YES then the hyperlinks from functions in REFERENCES_RELATION and\n# REFERENCED_BY_RELATION lists will link to the source code. Otherwise they will\n# link to the documentation.\n# The default value is: YES.\n\nREFERENCES_LINK_SOURCE = YES\n\n# If SOURCE_TOOLTIPS is enabled (the default) then hovering a hyperlink in the\n# source code will show a tooltip with additional information such as prototype,\n# brief description and links to the definition and documentation. Since this\n# will make the HTML file larger and loading of large files a bit slower, you\n# can opt to disable this feature.\n# The default value is: YES.\n# This tag requires that the tag SOURCE_BROWSER is set to YES.\n\nSOURCE_TOOLTIPS        = YES\n\n# If the USE_HTAGS tag is set to YES then the references to source code will\n# point to the HTML generated by the htags(1) tool instead of doxygen built-in\n# source browser. The htags tool is part of GNU's global source tagging system\n# (see http://www.gnu.org/software/global/global.html). 
You will need version\n# 4.8.6 or higher.\n#\n# To use it do the following:\n# - Install the latest version of global\n# - Enable SOURCE_BROWSER and USE_HTAGS in the config file\n# - Make sure the INPUT points to the root of the source tree\n# - Run doxygen as normal\n#\n# Doxygen will invoke htags (and that will in turn invoke gtags), so these\n# tools must be available from the command line (i.e. in the search path).\n#\n# The result: instead of the source browser generated by doxygen, the links to\n# source code will now point to the output of htags.\n# The default value is: NO.\n# This tag requires that the tag SOURCE_BROWSER is set to YES.\n\nUSE_HTAGS              = NO\n\n# If the VERBATIM_HEADERS tag is set to YES then doxygen will generate a\n# verbatim copy of the header file for each class for which an include is\n# specified. Set to NO to disable this.\n# See also: Section \\class.\n# The default value is: YES.\n\nVERBATIM_HEADERS       = YES\n\n#---------------------------------------------------------------------------\n# Configuration options related to the alphabetical class index\n#---------------------------------------------------------------------------\n\n# If the ALPHABETICAL_INDEX tag is set to YES, an alphabetical index of all\n# compounds will be generated. Enable this if the project contains a lot of\n# classes, structs, unions or interfaces.\n# The default value is: YES.\n\nALPHABETICAL_INDEX     = YES\n\n# The COLS_IN_ALPHA_INDEX tag can be used to specify the number of columns in\n# which the alphabetical index list will be split.\n# Minimum value: 1, maximum value: 20, default value: 5.\n# This tag requires that the tag ALPHABETICAL_INDEX is set to YES.\n\nCOLS_IN_ALPHA_INDEX    = 3\n\n# In case all classes in a project start with a common prefix, all classes will\n# be put under the same header in the alphabetical index. 
The IGNORE_PREFIX tag\n# can be used to specify a prefix (or a list of prefixes) that should be ignored\n# while generating the index headers.\n# This tag requires that the tag ALPHABETICAL_INDEX is set to YES.\n\nIGNORE_PREFIX          =\n\n#---------------------------------------------------------------------------\n# Configuration options related to the HTML output\n#---------------------------------------------------------------------------\n\n# If the GENERATE_HTML tag is set to YES, doxygen will generate HTML output\n# The default value is: YES.\n\nGENERATE_HTML          = YES\n\n# The HTML_OUTPUT tag is used to specify where the HTML docs will be put. If a\n# relative path is entered the value of OUTPUT_DIRECTORY will be put in front of\n# it.\n# The default directory is: html.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nHTML_OUTPUT            = html\n\n# The HTML_FILE_EXTENSION tag can be used to specify the file extension for each\n# generated HTML page (for example: .htm, .php, .asp).\n# The default value is: .html.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nHTML_FILE_EXTENSION    = .html\n\n# The HTML_HEADER tag can be used to specify a user-defined HTML header file for\n# each generated HTML page. If the tag is left blank doxygen will generate a\n# standard header.\n#\n# To get valid HTML the header file must include any scripts and style sheets\n# that doxygen needs, which is dependent on the configuration options used (e.g.\n# the setting GENERATE_TREEVIEW). It is highly recommended to start with a\n# default header using\n# doxygen -w html new_header.html new_footer.html new_stylesheet.css\n# YourConfigFile\n# and then modify the file new_header.html. 
See also section \"Doxygen usage\"\n# for information on how to generate the default header that doxygen normally\n# uses.\n# Note: The header is subject to change so you typically have to regenerate the\n# default header when upgrading to a newer version of doxygen. For a description\n# of the possible markers and block names see the documentation.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nHTML_HEADER            = @abs_top_srcdir@/docs/head.html\n\n# The HTML_FOOTER tag can be used to specify a user-defined HTML footer for each\n# generated HTML page. If the tag is left blank doxygen will generate a standard\n# footer. See HTML_HEADER for more information on how to generate a default\n# footer and what special commands can be used inside the footer. See also\n# section \"Doxygen usage\" for information on how to generate the default footer\n# that doxygen normally uses.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nHTML_FOOTER            = @abs_top_srcdir@/docs/foot.html\n\n# The HTML_STYLESHEET tag can be used to specify a user-defined cascading style\n# sheet that is used by each HTML page. It can be used to fine-tune the look of\n# the HTML output. If left blank doxygen will generate a default style sheet.\n# See also section \"Doxygen usage\" for information on how to generate the style\n# sheet that doxygen normally uses.\n# Note: It is recommended to use HTML_EXTRA_STYLESHEET instead of this tag, as\n# it is more robust and this tag (HTML_STYLESHEET) will in the future become\n# obsolete.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nHTML_STYLESHEET        = @abs_top_srcdir@/docs/typical.css\n\n# The HTML_EXTRA_STYLESHEET tag can be used to specify additional user-defined\n# cascading style sheets that are included after the standard style sheets\n# created by doxygen. 
Using this option one can overrule certain style aspects.\n# This is preferred over using HTML_STYLESHEET since it does not replace the\n# standard style sheet and is therefore more robust against future updates.\n# Doxygen will copy the style sheet files to the output directory.\n# Note: The order of the extra style sheet files is of importance (e.g. the last\n# style sheet in the list overrules the setting of the previous ones in the\n# list). For an example see the documentation.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nHTML_EXTRA_STYLESHEET  =\n\n# The HTML_EXTRA_FILES tag can be used to specify one or more extra images or\n# other source files which should be copied to the HTML output directory. Note\n# that these files will be copied to the base HTML output directory. Use the\n# $relpath^ marker in the HTML_HEADER and/or HTML_FOOTER files to load these\n# files. In the HTML_STYLESHEET file, use the file name only. Also note that the\n# files will be copied as-is; there are no commands or markers available.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nHTML_EXTRA_FILES       =\n\n# The HTML_COLORSTYLE_HUE tag controls the color of the HTML output. Doxygen\n# will adjust the colors in the style sheet and background images according to\n# this color. Hue is specified as an angle on a colorwheel, see\n# http://en.wikipedia.org/wiki/Hue for more information. For instance the value\n# 0 represents red, 60 is yellow, 120 is green, 180 is cyan, 240 is blue, 300\n# purple, and 360 is red again.\n# Minimum value: 0, maximum value: 359, default value: 220.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nHTML_COLORSTYLE_HUE    = 60\n\n# The HTML_COLORSTYLE_SAT tag controls the purity (or saturation) of the colors\n# in the HTML output. For a value of 0 the output will use grayscales only. 
A\n# value of 255 will produce the most vivid colors.\n# Minimum value: 0, maximum value: 255, default value: 100.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nHTML_COLORSTYLE_SAT    = 13\n\n# The HTML_COLORSTYLE_GAMMA tag controls the gamma correction applied to the\n# luminance component of the colors in the HTML output. Values below 100\n# gradually make the output lighter, whereas values above 100 make the output\n# darker. The value divided by 100 is the actual gamma applied, so 80 represents\n# a gamma of 0.8, The value 220 represents a gamma of 2.2, and 100 does not\n# change the gamma.\n# Minimum value: 40, maximum value: 240, default value: 80.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nHTML_COLORSTYLE_GAMMA  = 100\n\n# If the HTML_TIMESTAMP tag is set to YES then the footer of each generated HTML\n# page will contain the date and time when the page was generated. Setting this\n# to NO can help when comparing the output of multiple runs.\n# The default value is: YES.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nHTML_TIMESTAMP         = YES\n\n# If the HTML_DYNAMIC_SECTIONS tag is set to YES then the generated HTML\n# documentation will contain sections that can be hidden and shown after the\n# page has loaded.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nHTML_DYNAMIC_SECTIONS  = NO\n\n# With HTML_INDEX_NUM_ENTRIES one can control the preferred number of entries\n# shown in the various tree structured indices initially; the user can expand\n# and collapse entries dynamically later on. Doxygen will expand the tree to\n# such a level that at most the specified number of entries are visible (unless\n# a fully collapsed tree already exceeds this amount). So setting the number of\n# entries 1 will produce a full collapsed tree by default. 
0 is a special value\n# representing an infinite number of entries and will result in a full expanded\n# tree by default.\n# Minimum value: 0, maximum value: 9999, default value: 100.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nHTML_INDEX_NUM_ENTRIES = 100\n\n# If the GENERATE_DOCSET tag is set to YES, additional index files will be\n# generated that can be used as input for Apple's Xcode 3 integrated development\n# environment (see: http://developer.apple.com/tools/xcode/), introduced with\n# OSX 10.5 (Leopard). To create a documentation set, doxygen will generate a\n# Makefile in the HTML output directory. Running make will produce the docset in\n# that directory and running make install will install the docset in\n# ~/Library/Developer/Shared/Documentation/DocSets so that Xcode will find it at\n# startup. See http://developer.apple.com/tools/creatingdocsetswithdoxygen.html\n# for more information.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nGENERATE_DOCSET        = NO\n\n# This tag determines the name of the docset feed. A documentation feed provides\n# an umbrella under which multiple documentation sets from a single provider\n# (such as a company or product suite) can be grouped.\n# The default value is: Doxygen generated docs.\n# This tag requires that the tag GENERATE_DOCSET is set to YES.\n\nDOCSET_FEEDNAME        = \"Doxygen generated docs\"\n\n# This tag specifies a string that should uniquely identify the documentation\n# set bundle. This should be a reverse domain-name style string, e.g.\n# com.mycompany.MyDocSet. Doxygen will append .docset to the name.\n# The default value is: org.doxygen.Project.\n# This tag requires that the tag GENERATE_DOCSET is set to YES.\n\nDOCSET_BUNDLE_ID       = org.doxygen.Project\n\n# The DOCSET_PUBLISHER_ID tag specifies a string that should uniquely identify\n# the documentation publisher. This should be a reverse domain-name style\n# string, e.g. 
com.mycompany.MyDocSet.documentation.\n# The default value is: org.doxygen.Publisher.\n# This tag requires that the tag GENERATE_DOCSET is set to YES.\n\nDOCSET_PUBLISHER_ID    = org.doxygen.Publisher\n\n# The DOCSET_PUBLISHER_NAME tag identifies the documentation publisher.\n# The default value is: Publisher.\n# This tag requires that the tag GENERATE_DOCSET is set to YES.\n\nDOCSET_PUBLISHER_NAME  = Publisher\n\n# If the GENERATE_HTMLHELP tag is set to YES then doxygen generates three\n# additional HTML index files: index.hhp, index.hhc, and index.hhk. The\n# index.hhp is a project file that can be read by Microsoft's HTML Help Workshop\n# (see: http://www.microsoft.com/en-us/download/details.aspx?id=21138) on\n# Windows.\n#\n# The HTML Help Workshop contains a compiler that can convert all HTML output\n# generated by doxygen into a single compiled HTML file (.chm). Compiled HTML\n# files are now used as the Windows 98 help format, and will replace the old\n# Windows help format (.hlp) on all Windows platforms in the future. Compressed\n# HTML files also contain an index, a table of contents, and you can search for\n# words in the documentation. The HTML workshop also contains a viewer for\n# compressed HTML files.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nGENERATE_HTMLHELP      = NO\n\n# The CHM_FILE tag can be used to specify the file name of the resulting .chm\n# file. You can add a path in front of the file if the result should not be\n# written to the html output directory.\n# This tag requires that the tag GENERATE_HTMLHELP is set to YES.\n\nCHM_FILE               =\n\n# The HHC_LOCATION tag can be used to specify the location (absolute path\n# including file name) of the HTML help compiler (hhc.exe). 
If non-empty,\n# doxygen will try to run the HTML help compiler on the generated index.hhp.\n# The file has to be specified with full path.\n# This tag requires that the tag GENERATE_HTMLHELP is set to YES.\n\nHHC_LOCATION           =\n\n# The GENERATE_CHI flag controls if a separate .chi index file is generated\n# (YES) or that it should be included in the master .chm file (NO).\n# The default value is: NO.\n# This tag requires that the tag GENERATE_HTMLHELP is set to YES.\n\nGENERATE_CHI           = NO\n\n# The CHM_INDEX_ENCODING is used to encode HtmlHelp index (hhk), content (hhc)\n# and project file content.\n# This tag requires that the tag GENERATE_HTMLHELP is set to YES.\n\nCHM_INDEX_ENCODING     =\n\n# The BINARY_TOC flag controls whether a binary table of contents is generated\n# (YES) or a normal table of contents (NO) in the .chm file. Furthermore it\n# enables the Previous and Next buttons.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_HTMLHELP is set to YES.\n\nBINARY_TOC             = NO\n\n# The TOC_EXPAND flag can be set to YES to add extra items for group members to\n# the table of contents of the HTML help documentation and to the tree view.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_HTMLHELP is set to YES.\n\nTOC_EXPAND             = NO\n\n# If the GENERATE_QHP tag is set to YES and both QHP_NAMESPACE and\n# QHP_VIRTUAL_FOLDER are set, an additional index file will be generated that\n# can be used as input for Qt's qhelpgenerator to generate a Qt Compressed Help\n# (.qch) of the generated HTML documentation.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nGENERATE_QHP           = NO\n\n# If the QHG_LOCATION tag is specified, the QCH_FILE tag can be used to specify\n# the file name of the resulting .qch file. 
The path specified is relative to\n# the HTML output folder.\n# This tag requires that the tag GENERATE_QHP is set to YES.\n\nQCH_FILE               =\n\n# The QHP_NAMESPACE tag specifies the namespace to use when generating Qt Help\n# Project output. For more information please see Qt Help Project / Namespace\n# (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#namespace).\n# The default value is: org.doxygen.Project.\n# This tag requires that the tag GENERATE_QHP is set to YES.\n\nQHP_NAMESPACE          = org.doxygen.Project\n\n# The QHP_VIRTUAL_FOLDER tag specifies the namespace to use when generating Qt\n# Help Project output. For more information please see Qt Help Project / Virtual\n# Folders (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#virtual-\n# folders).\n# The default value is: doc.\n# This tag requires that the tag GENERATE_QHP is set to YES.\n\nQHP_VIRTUAL_FOLDER     = doc\n\n# If the QHP_CUST_FILTER_NAME tag is set, it specifies the name of a custom\n# filter to add. For more information please see Qt Help Project / Custom\n# Filters (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#custom-\n# filters).\n# This tag requires that the tag GENERATE_QHP is set to YES.\n\nQHP_CUST_FILTER_NAME   =\n\n# The QHP_CUST_FILTER_ATTRS tag specifies the list of the attributes of the\n# custom filter to add. For more information please see Qt Help Project / Custom\n# Filters (see: http://qt-project.org/doc/qt-4.8/qthelpproject.html#custom-\n# filters).\n# This tag requires that the tag GENERATE_QHP is set to YES.\n\nQHP_CUST_FILTER_ATTRS  =\n\n# The QHP_SECT_FILTER_ATTRS tag specifies the list of the attributes this\n# project's filter section matches. 
Qt Help Project / Filter Attributes (see:\n# http://qt-project.org/doc/qt-4.8/qthelpproject.html#filter-attributes).\n# This tag requires that the tag GENERATE_QHP is set to YES.\n\nQHP_SECT_FILTER_ATTRS  =\n\n# The QHG_LOCATION tag can be used to specify the location of Qt's\n# qhelpgenerator. If non-empty doxygen will try to run qhelpgenerator on the\n# generated .qhp file.\n# This tag requires that the tag GENERATE_QHP is set to YES.\n\nQHG_LOCATION           =\n\n# If the GENERATE_ECLIPSEHELP tag is set to YES, additional index files will be\n# generated, together with the HTML files, they form an Eclipse help plugin. To\n# install this plugin and make it available under the help contents menu in\n# Eclipse, the contents of the directory containing the HTML and XML files needs\n# to be copied into the plugins directory of eclipse. The name of the directory\n# within the plugins directory should be the same as the ECLIPSE_DOC_ID value.\n# After copying Eclipse needs to be restarted before the help appears.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nGENERATE_ECLIPSEHELP   = NO\n\n# A unique identifier for the Eclipse help plugin. When installing the plugin\n# the directory name containing the HTML and XML files should also have this\n# name. Each documentation set should have its own identifier.\n# The default value is: org.doxygen.Project.\n# This tag requires that the tag GENERATE_ECLIPSEHELP is set to YES.\n\nECLIPSE_DOC_ID         = org.doxygen.Project\n\n# If you want full control over the layout of the generated HTML pages it might\n# be necessary to disable the index and replace it with your own. The\n# DISABLE_INDEX tag can be used to turn on/off the condensed index (tabs) at top\n# of each HTML page. A value of NO enables the index and the value YES disables\n# it. 
Since the tabs in the index contain the same information as the navigation\n# tree, you can set this option to YES if you also set GENERATE_TREEVIEW to YES.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nDISABLE_INDEX          = YES\n\n# The GENERATE_TREEVIEW tag is used to specify whether a tree-like index\n# structure should be generated to display hierarchical information. If the tag\n# value is set to YES, a side panel will be generated containing a tree-like\n# index structure (just like the one that is generated for HTML Help). For this\n# to work a browser that supports JavaScript, DHTML, CSS and frames is required\n# (i.e. any modern browser). Windows users are probably better off using the\n# HTML help feature. Via custom style sheets (see HTML_EXTRA_STYLESHEET) one can\n# further fine-tune the look of the index. As an example, the default style\n# sheet generated by doxygen has an example that shows how to put an image at\n# the root of the tree instead of the PROJECT_NAME. 
Since the tree basically has\n# the same information as the tab index, you could consider setting\n# DISABLE_INDEX to YES when enabling this option.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nGENERATE_TREEVIEW      = YES\n\n# The ENUM_VALUES_PER_LINE tag can be used to set the number of enum values that\n# doxygen will group on one line in the generated HTML documentation.\n#\n# Note that a value of 0 will completely suppress the enum values from appearing\n# in the overview section.\n# Minimum value: 0, maximum value: 20, default value: 4.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nENUM_VALUES_PER_LINE   = 4\n\n# If the treeview is enabled (see GENERATE_TREEVIEW) then this tag can be used\n# to set the initial width (in pixels) of the frame in which the tree is shown.\n# Minimum value: 0, maximum value: 1500, default value: 250.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nTREEVIEW_WIDTH         = 250\n\n# If the EXT_LINKS_IN_WINDOW option is set to YES, doxygen will open links to\n# external symbols imported via tag files in a separate window.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nEXT_LINKS_IN_WINDOW    = NO\n\n# Use this tag to change the font size of LaTeX formulas included as images in\n# the HTML documentation. When you change the font size after a successful\n# doxygen run you need to manually remove any form_*.png images from the HTML\n# output directory to force them to be regenerated.\n# Minimum value: 8, maximum value: 50, default value: 10.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nFORMULA_FONTSIZE       = 10\n\n# Use the FORMULA_TRANSPARENT tag to determine whether or not the images\n# generated for formulas are transparent PNGs. 
Transparent PNGs are not\n# supported properly for IE 6.0, but are supported on all modern browsers.\n#\n# Note that when changing this option you need to delete any form_*.png files in\n# the HTML output directory before the changes have effect.\n# The default value is: YES.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nFORMULA_TRANSPARENT    = YES\n\n# Enable the USE_MATHJAX option to render LaTeX formulas using MathJax (see\n# http://www.mathjax.org) which uses client side Javascript for the rendering\n# instead of using pre-rendered bitmaps. Use this if you do not have LaTeX\n# installed or if you want formulas to look prettier in the HTML output. When\n# enabled you may also need to install MathJax separately and configure the path\n# to it using the MATHJAX_RELPATH option.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\n# Slows page loading too much\nUSE_MATHJAX            = NO\n\n# When MathJax is enabled you can set the default output format to be used for\n# the MathJax output. See the MathJax site (see:\n# http://docs.mathjax.org/en/latest/output.html) for more details.\n# Possible values are: HTML-CSS (which is slower, but has the best\n# compatibility), NativeMML (i.e. MathML) and SVG.\n# The default value is: HTML-CSS.\n# This tag requires that the tag USE_MATHJAX is set to YES.\n\nMATHJAX_FORMAT         = HTML-CSS\n\n# When MathJax is enabled you need to specify the location relative to the HTML\n# output directory using the MATHJAX_RELPATH option. The destination directory\n# should contain the MathJax.js script. For instance, if the mathjax directory\n# is located at the same level as the HTML output directory, then\n# MATHJAX_RELPATH should be ../mathjax. The default value points to the MathJax\n# Content Delivery Network so you can quickly see the result without installing\n# MathJax. 
However, it is strongly recommended to install a local copy of\n# MathJax from http://www.mathjax.org before deployment.\n# The default value is: http://cdn.mathjax.org/mathjax/latest.\n# This tag requires that the tag USE_MATHJAX is set to YES.\n\nMATHJAX_RELPATH        = http://cdn.mathjax.org/mathjax/latest\n\n# The MATHJAX_EXTENSIONS tag can be used to specify one or more MathJax\n# extension names that should be enabled during MathJax rendering. For example\n# MATHJAX_EXTENSIONS = TeX/AMSmath TeX/AMSsymbols\n# This tag requires that the tag USE_MATHJAX is set to YES.\n\nMATHJAX_EXTENSIONS     =\n\n# The MATHJAX_CODEFILE tag can be used to specify a file with javascript pieces\n# of code that will be used on startup of the MathJax code. See the MathJax site\n# (see: http://docs.mathjax.org/en/latest/output.html) for more details. For an\n# example see the documentation.\n# This tag requires that the tag USE_MATHJAX is set to YES.\n\nMATHJAX_CODEFILE       =\n\n# When the SEARCHENGINE tag is enabled doxygen will generate a search box for\n# the HTML output. The underlying search engine uses javascript and DHTML and\n# should work on any modern browser. Note that when using HTML help\n# (GENERATE_HTMLHELP), Qt help (GENERATE_QHP), or docsets (GENERATE_DOCSET)\n# there is already a search function so this one should typically be disabled.\n# For large projects the javascript based search engine can be slow, then\n# enabling SERVER_BASED_SEARCH may provide a better solution. It is possible to\n# search using the keyboard; to jump to the search box use <access key> + S\n# (what the <access key> is depends on the OS and browser, but it is typically\n# <CTRL>, <ALT>/<option>, or both). Inside the search box use the <cursor down\n# key> to jump into the search results window, the results can be navigated\n# using the <cursor keys>. Press <Enter> to select an item or <escape> to cancel\n# the search. 
The filter options can be selected when the cursor is inside the\n# search box by pressing <Shift>+<cursor down>. Also here use the <cursor keys>\n# to select a filter and <Enter> or <escape> to activate or cancel the filter\n# option.\n# The default value is: YES.\n# This tag requires that the tag GENERATE_HTML is set to YES.\n\nSEARCHENGINE           = YES\n\n# When the SERVER_BASED_SEARCH tag is enabled the search engine will be\n# implemented using a web server instead of a web client using Javascript. There\n# are two flavors of web server based searching depending on the EXTERNAL_SEARCH\n# setting. When disabled, doxygen will generate a PHP script for searching and\n# an index file used by the script. When EXTERNAL_SEARCH is enabled the indexing\n# and searching needs to be provided by external tools. See the section\n# \"External Indexing and Searching\" for details.\n# The default value is: NO.\n# This tag requires that the tag SEARCHENGINE is set to YES.\n\nSERVER_BASED_SEARCH    = NO\n\n# When EXTERNAL_SEARCH tag is enabled doxygen will no longer generate the PHP\n# script for searching. Instead the search results are written to an XML file\n# which needs to be processed by an external indexer. 
Doxygen will invoke an\n# external search engine pointed to by the SEARCHENGINE_URL option to obtain the\n# search results.\n#\n# Doxygen ships with an example indexer (doxyindexer) and search engine\n# (doxysearch.cgi) which are based on the open source search engine library\n# Xapian (see: http://xapian.org/).\n#\n# See the section \"External Indexing and Searching\" for details.\n# The default value is: NO.\n# This tag requires that the tag SEARCHENGINE is set to YES.\n\nEXTERNAL_SEARCH        = NO\n\n# The SEARCHENGINE_URL should point to a search engine hosted by a web server\n# which will return the search results when EXTERNAL_SEARCH is enabled.\n#\n# Doxygen ships with an example indexer (doxyindexer) and search engine\n# (doxysearch.cgi) which are based on the open source search engine library\n# Xapian (see: http://xapian.org/). See the section \"External Indexing and\n# Searching\" for details.\n# This tag requires that the tag SEARCHENGINE is set to YES.\n\nSEARCHENGINE_URL       =\n\n# When SERVER_BASED_SEARCH and EXTERNAL_SEARCH are both enabled the unindexed\n# search data is written to a file for indexing by an external tool. With the\n# SEARCHDATA_FILE tag the name of this file can be specified.\n# The default file is: searchdata.xml.\n# This tag requires that the tag SEARCHENGINE is set to YES.\n\nSEARCHDATA_FILE        = searchdata.xml\n\n# When SERVER_BASED_SEARCH and EXTERNAL_SEARCH are both enabled the\n# EXTERNAL_SEARCH_ID tag can be used as an identifier for the project. This is\n# useful in combination with EXTRA_SEARCH_MAPPINGS to search through multiple\n# projects and redirect the results back to the right project.\n# This tag requires that the tag SEARCHENGINE is set to YES.\n\nEXTERNAL_SEARCH_ID     =\n\n# The EXTRA_SEARCH_MAPPINGS tag can be used to enable searching through doxygen\n# projects other than the one defined by this configuration file, but that are\n# all added to the same external search index. 
Each project needs to have a\n# unique id set via EXTERNAL_SEARCH_ID. The search mapping then maps the id to\n# a relative location where the documentation can be found. The format is:\n# EXTRA_SEARCH_MAPPINGS = tagname1=loc1 tagname2=loc2 ...\n# This tag requires that the tag SEARCHENGINE is set to YES.\n\nEXTRA_SEARCH_MAPPINGS  =\n\n#---------------------------------------------------------------------------\n# Configuration options related to the LaTeX output\n#---------------------------------------------------------------------------\n\n# If the GENERATE_LATEX tag is set to YES, doxygen will generate LaTeX output.\n# The default value is: YES.\n\nGENERATE_LATEX         = YES\n\n# The LATEX_OUTPUT tag is used to specify where the LaTeX docs will be put. If a\n# relative path is entered the value of OUTPUT_DIRECTORY will be put in front of\n# it.\n# The default directory is: latex.\n# This tag requires that the tag GENERATE_LATEX is set to YES.\n\nLATEX_OUTPUT           = latex\n\n# The LATEX_CMD_NAME tag can be used to specify the LaTeX command name to be\n# invoked.\n#\n# Note that when enabling USE_PDFLATEX this option is only used for generating\n# bitmaps for formulas in the HTML output, but not in the Makefile that is\n# written to the output directory.\n# The default file is: latex.\n# This tag requires that the tag GENERATE_LATEX is set to YES.\n\nLATEX_CMD_NAME         = latex\n\n# The MAKEINDEX_CMD_NAME tag can be used to specify the command name to generate\n# index for LaTeX.\n# The default file is: makeindex.\n# This tag requires that the tag GENERATE_LATEX is set to YES.\n\nMAKEINDEX_CMD_NAME     = makeindex\n\n# If the COMPACT_LATEX tag is set to YES, doxygen generates more compact LaTeX\n# documents. 
This may be useful for small projects and may help to save some\n# trees in general.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_LATEX is set to YES.\n\nCOMPACT_LATEX          = YES\n\n# The PAPER_TYPE tag can be used to set the paper type that is used by the\n# printer.\n# Possible values are: a4 (210 x 297 mm), letter (8.5 x 11 inches), legal (8.5 x\n# 14 inches) and executive (7.25 x 10.5 inches).\n# The default value is: a4.\n# This tag requires that the tag GENERATE_LATEX is set to YES.\n\nPAPER_TYPE             = letter\n\n# The EXTRA_PACKAGES tag can be used to specify one or more LaTeX package names\n# that should be included in the LaTeX output. To get the times font for\n# instance you can specify\n# EXTRA_PACKAGES=times\n# If left blank no extra packages will be included.\n# This tag requires that the tag GENERATE_LATEX is set to YES.\n\nEXTRA_PACKAGES         =\n\n# The LATEX_HEADER tag can be used to specify a personal LaTeX header for the\n# generated LaTeX document. The header should contain everything until the first\n# chapter. If it is left blank doxygen will generate a standard header. See\n# section \"Doxygen usage\" for information on how to let doxygen write the\n# default header to a separate file.\n#\n# Note: Only use a user-defined header if you know what you are doing! The\n# following commands have a special meaning inside the header: $title,\n# $datetime, $date, $doxygenversion, $projectname, $projectnumber,\n# $projectbrief, $projectlogo. Doxygen will replace $title with the empty\n# string, for the replacement values of the other commands the user is referred\n# to HTML_HEADER.\n# This tag requires that the tag GENERATE_LATEX is set to YES.\n\nLATEX_HEADER           =\n\n# The LATEX_FOOTER tag can be used to specify a personal LaTeX footer for the\n# generated LaTeX document. The footer should contain everything after the last\n# chapter. If it is left blank doxygen will generate a standard footer. 
See\n# LATEX_HEADER for more information on how to generate a default footer and what\n# special commands can be used inside the footer.\n#\n# Note: Only use a user-defined footer if you know what you are doing!\n# This tag requires that the tag GENERATE_LATEX is set to YES.\n\nLATEX_FOOTER           =\n\n# The LATEX_EXTRA_STYLESHEET tag can be used to specify additional user-defined\n# LaTeX style sheets that are included after the standard style sheets created\n# by doxygen. Using this option one can overrule certain style aspects. Doxygen\n# will copy the style sheet files to the output directory.\n# Note: The order of the extra style sheet files is of importance (e.g. the last\n# style sheet in the list overrules the setting of the previous ones in the\n# list).\n# This tag requires that the tag GENERATE_LATEX is set to YES.\n\nLATEX_EXTRA_STYLESHEET =\n\n# The LATEX_EXTRA_FILES tag can be used to specify one or more extra images or\n# other source files which should be copied to the LATEX_OUTPUT output\n# directory. Note that the files will be copied as-is; there are no commands or\n# markers available.\n# This tag requires that the tag GENERATE_LATEX is set to YES.\n\nLATEX_EXTRA_FILES      =\n\n# If the PDF_HYPERLINKS tag is set to YES, the LaTeX that is generated is\n# prepared for conversion to PDF (using ps2pdf or pdflatex). The PDF file will\n# contain links (just like the HTML output) instead of page references. This\n# makes the output suitable for online browsing using a PDF viewer.\n# The default value is: YES.\n# This tag requires that the tag GENERATE_LATEX is set to YES.\n\nPDF_HYPERLINKS         = YES\n\n# If the USE_PDFLATEX tag is set to YES, doxygen will use pdflatex to generate\n# the PDF file directly from the LaTeX files. 
Set this option to YES, to get a\n# higher quality PDF documentation.\n# The default value is: YES.\n# This tag requires that the tag GENERATE_LATEX is set to YES.\n\nUSE_PDFLATEX           = YES\n\n# If the LATEX_BATCHMODE tag is set to YES, doxygen will add the \\batchmode\n# command to the generated LaTeX files. This will instruct LaTeX to keep running\n# if errors occur, instead of asking the user for help. This option is also used\n# when generating formulas in HTML.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_LATEX is set to YES.\n\nLATEX_BATCHMODE        = YES\n\n# If the LATEX_HIDE_INDICES tag is set to YES then doxygen will not include the\n# index chapters (such as File Index, Compound Index, etc.) in the output.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_LATEX is set to YES.\n\nLATEX_HIDE_INDICES     = NO\n\n# If the LATEX_SOURCE_CODE tag is set to YES then doxygen will include source\n# code with syntax highlighting in the LaTeX output.\n#\n# Note that which sources are shown also depends on other settings such as\n# SOURCE_BROWSER.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_LATEX is set to YES.\n\nLATEX_SOURCE_CODE      = NO\n\n# The LATEX_BIB_STYLE tag can be used to specify the style to use for the\n# bibliography, e.g. plainnat, or ieeetr. See\n# http://en.wikipedia.org/wiki/BibTeX and \\cite for more info.\n# The default value is: plain.\n# This tag requires that the tag GENERATE_LATEX is set to YES.\n\nLATEX_BIB_STYLE        = plain\n\n#---------------------------------------------------------------------------\n# Configuration options related to the RTF output\n#---------------------------------------------------------------------------\n\n# If the GENERATE_RTF tag is set to YES, doxygen will generate RTF output. 
The\n# RTF output is optimized for Word 97 and may not look too pretty with other RTF\n# readers/editors.\n# The default value is: NO.\n\nGENERATE_RTF           = NO\n\n# The RTF_OUTPUT tag is used to specify where the RTF docs will be put. If a\n# relative path is entered the value of OUTPUT_DIRECTORY will be put in front of\n# it.\n# The default directory is: rtf.\n# This tag requires that the tag GENERATE_RTF is set to YES.\n\nRTF_OUTPUT             = rtf\n\n# If the COMPACT_RTF tag is set to YES, doxygen generates more compact RTF\n# documents. This may be useful for small projects and may help to save some\n# trees in general.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_RTF is set to YES.\n\nCOMPACT_RTF            = NO\n\n# If the RTF_HYPERLINKS tag is set to YES, the RTF that is generated will\n# contain hyperlink fields. The RTF file will contain links (just like the HTML\n# output) instead of page references. This makes the output suitable for online\n# browsing using Word or some other Word compatible readers that support those\n# fields.\n#\n# Note: WordPad (write) and others do not support links.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_RTF is set to YES.\n\nRTF_HYPERLINKS         = NO\n\n# Load stylesheet definitions from file. Syntax is similar to doxygen's config\n# file, i.e. a series of assignments. You only have to provide replacements,\n# missing definitions are set to their default value.\n#\n# See also section \"Doxygen usage\" for information on how to generate the\n# default style sheet that doxygen normally uses.\n# This tag requires that the tag GENERATE_RTF is set to YES.\n\nRTF_STYLESHEET_FILE    =\n\n# Set optional variables used in the generation of an RTF document. Syntax is\n# similar to doxygen's config file. 
A template extensions file can be generated\n# using doxygen -e rtf extensionFile.\n# This tag requires that the tag GENERATE_RTF is set to YES.\n\nRTF_EXTENSIONS_FILE    =\n\n# If the RTF_SOURCE_CODE tag is set to YES then doxygen will include source code\n# with syntax highlighting in the RTF output.\n#\n# Note that which sources are shown also depends on other settings such as\n# SOURCE_BROWSER.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_RTF is set to YES.\n\nRTF_SOURCE_CODE        = NO\n\n#---------------------------------------------------------------------------\n# Configuration options related to the man page output\n#---------------------------------------------------------------------------\n\n# If the GENERATE_MAN tag is set to YES, doxygen will generate man pages for\n# classes and files.\n# The default value is: NO.\n\nGENERATE_MAN           = YES\n\n# The MAN_OUTPUT tag is used to specify where the man pages will be put. If a\n# relative path is entered the value of OUTPUT_DIRECTORY will be put in front of\n# it. A directory man3 will be created inside the directory specified by\n# MAN_OUTPUT.\n# The default directory is: man.\n# This tag requires that the tag GENERATE_MAN is set to YES.\n\nMAN_OUTPUT             = man\n\n# The MAN_EXTENSION tag determines the extension that is added to the generated\n# man pages. In case the manual section does not start with a number, the number\n# 3 is prepended. The dot (.) at the beginning of the MAN_EXTENSION tag is\n# optional.\n# The default value is: .3.\n# This tag requires that the tag GENERATE_MAN is set to YES.\n\nMAN_EXTENSION          = .3\n\n# The MAN_SUBDIR tag determines the name of the directory created within\n# MAN_OUTPUT in which the man pages are placed. It defaults to man followed by\n# MAN_EXTENSION with the initial . 
removed.\n# This tag requires that the tag GENERATE_MAN is set to YES.\n\nMAN_SUBDIR             =\n\n# If the MAN_LINKS tag is set to YES and doxygen generates man output, then it\n# will generate one additional man file for each entity documented in the real\n# man page(s). These additional files only source the real man page, but without\n# them the man command would be unable to find the correct page.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_MAN is set to YES.\n\nMAN_LINKS              = YES\n\n#---------------------------------------------------------------------------\n# Configuration options related to the XML output\n#---------------------------------------------------------------------------\n\n# If the GENERATE_XML tag is set to YES, doxygen will generate an XML file that\n# captures the structure of the code including all documentation.\n# The default value is: NO.\n\nGENERATE_XML           = NO\n\n# The XML_OUTPUT tag is used to specify where the XML pages will be put. If a\n# relative path is entered the value of OUTPUT_DIRECTORY will be put in front of\n# it.\n# The default directory is: xml.\n# This tag requires that the tag GENERATE_XML is set to YES.\n\nXML_OUTPUT             = xml\n\n# If the XML_PROGRAMLISTING tag is set to YES, doxygen will dump the program\n# listings (including syntax highlighting and cross-referencing information) to\n# the XML output. 
Note that enabling this will significantly increase the size\n# of the XML output.\n# The default value is: YES.\n# This tag requires that the tag GENERATE_XML is set to YES.\n\nXML_PROGRAMLISTING     = YES\n\n#---------------------------------------------------------------------------\n# Configuration options related to the DOCBOOK output\n#---------------------------------------------------------------------------\n\n# If the GENERATE_DOCBOOK tag is set to YES, doxygen will generate Docbook files\n# that can be used to generate PDF.\n# The default value is: NO.\n\nGENERATE_DOCBOOK       = NO\n\n# The DOCBOOK_OUTPUT tag is used to specify where the Docbook pages will be put.\n# If a relative path is entered the value of OUTPUT_DIRECTORY will be put in\n# front of it.\n# The default directory is: docbook.\n# This tag requires that the tag GENERATE_DOCBOOK is set to YES.\n\nDOCBOOK_OUTPUT         = docbook\n\n# If the DOCBOOK_PROGRAMLISTING tag is set to YES, doxygen will include the\n# program listings (including syntax highlighting and cross-referencing\n# information) to the DOCBOOK output. Note that enabling this will significantly\n# increase the size of the DOCBOOK output.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_DOCBOOK is set to YES.\n\nDOCBOOK_PROGRAMLISTING = NO\n\n#---------------------------------------------------------------------------\n# Configuration options for the AutoGen Definitions output\n#---------------------------------------------------------------------------\n\n# If the GENERATE_AUTOGEN_DEF tag is set to YES, doxygen will generate an\n# AutoGen Definitions (see http://autogen.sf.net) file that captures the\n# structure of the code including all documentation. 
Note that this feature is\n# still experimental and incomplete at the moment.\n# The default value is: NO.\n\nGENERATE_AUTOGEN_DEF   = NO\n\n#---------------------------------------------------------------------------\n# Configuration options related to the Perl module output\n#---------------------------------------------------------------------------\n\n# If the GENERATE_PERLMOD tag is set to YES, doxygen will generate a Perl module\n# file that captures the structure of the code including all documentation.\n#\n# Note that this feature is still experimental and incomplete at the moment.\n# The default value is: NO.\n\nGENERATE_PERLMOD       = NO\n\n# If the PERLMOD_LATEX tag is set to YES, doxygen will generate the necessary\n# Makefile rules, Perl scripts and LaTeX code to be able to generate PDF and DVI\n# output from the Perl module output.\n# The default value is: NO.\n# This tag requires that the tag GENERATE_PERLMOD is set to YES.\n\nPERLMOD_LATEX          = NO\n\n# If the PERLMOD_PRETTY tag is set to YES, the Perl module output will be nicely\n# formatted so it can be parsed by a human reader. This is useful if you want to\n# understand what is going on. On the other hand, if this tag is set to NO, the\n# size of the Perl module output will be much smaller and Perl will parse it\n# just the same.\n# The default value is: YES.\n# This tag requires that the tag GENERATE_PERLMOD is set to YES.\n\nPERLMOD_PRETTY         = YES\n\n# The names of the make variables in the generated doxyrules.make file are\n# prefixed with the string contained in PERLMOD_MAKEVAR_PREFIX. 
This is useful\n# so different doxyrules.make files included by the same Makefile don't\n# overwrite each other's variables.\n# This tag requires that the tag GENERATE_PERLMOD is set to YES.\n\nPERLMOD_MAKEVAR_PREFIX =\n\n#---------------------------------------------------------------------------\n# Configuration options related to the preprocessor\n#---------------------------------------------------------------------------\n\n# If the ENABLE_PREPROCESSING tag is set to YES, doxygen will evaluate all\n# C-preprocessor directives found in the sources and include files.\n# The default value is: YES.\n\nENABLE_PREPROCESSING   = YES\n\n# If the MACRO_EXPANSION tag is set to YES, doxygen will expand all macro names\n# in the source code. If set to NO, only conditional compilation will be\n# performed. Macro expansion can be done in a controlled way by setting\n# EXPAND_ONLY_PREDEF to YES.\n# The default value is: NO.\n# This tag requires that the tag ENABLE_PREPROCESSING is set to YES.\n\nMACRO_EXPANSION        = NO\n\n# If the EXPAND_ONLY_PREDEF and MACRO_EXPANSION tags are both set to YES then\n# the macro expansion is limited to the macros specified with the PREDEFINED and\n# EXPAND_AS_DEFINED tags.\n# The default value is: NO.\n# This tag requires that the tag ENABLE_PREPROCESSING is set to YES.\n\nEXPAND_ONLY_PREDEF     = NO\n\n# If the SEARCH_INCLUDES tag is set to YES, the include files in the\n# INCLUDE_PATH will be searched if a #include is found.\n# The default value is: YES.\n# This tag requires that the tag ENABLE_PREPROCESSING is set to YES.\n\nSEARCH_INCLUDES        = YES\n\n# The INCLUDE_PATH tag can be used to specify one or more directories that\n# contain include files that are not input files but should be processed by the\n# preprocessor.\n# This tag requires that the tag SEARCH_INCLUDES is set to YES.\n\nINCLUDE_PATH           =\n\n# You can use the INCLUDE_FILE_PATTERNS tag to specify one or more wildcard\n# patterns (like *.h and *.hpp) to 
filter out the header-files in the\n# directories. If left blank, the patterns specified with FILE_PATTERNS will be\n# used.\n# This tag requires that the tag ENABLE_PREPROCESSING is set to YES.\n\nINCLUDE_FILE_PATTERNS  =\n\n# The PREDEFINED tag can be used to specify one or more macro names that are\n# defined before the preprocessor is started (similar to the -D option of e.g.\n# gcc). The argument of the tag is a list of macros of the form: name or\n# name=definition (no spaces). If the definition and the \"=\" are omitted, \"=1\"\n# is assumed. To prevent a macro definition from being undefined via #undef or\n# recursively expanded use the := operator instead of the = operator.\n# This tag requires that the tag ENABLE_PREPROCESSING is set to YES.\n\nPREDEFINED             = APOP_NO_VARIADIC\n\n# If the MACRO_EXPANSION and EXPAND_ONLY_PREDEF tags are set to YES then this\n# tag can be used to specify a list of macro names that should be expanded. The\n# macro definition that is found in the sources will be used. Use the PREDEFINED\n# tag if you want to use a different macro definition that overrules the\n# definition found in the source code.\n# This tag requires that the tag ENABLE_PREPROCESSING is set to YES.\n\nEXPAND_AS_DEFINED      =\n\n# If the SKIP_FUNCTION_MACROS tag is set to YES then doxygen's preprocessor will\n# remove all references to function-like macros that are alone on a line, have\n# an all uppercase name, and do not end with a semicolon. 
Such function macros\n# are typically used for boiler-plate code, and will confuse the parser if not\n# removed.\n# The default value is: YES.\n# This tag requires that the tag ENABLE_PREPROCESSING is set to YES.\n\nSKIP_FUNCTION_MACROS   = NO\n\n#---------------------------------------------------------------------------\n# Configuration options related to external references\n#---------------------------------------------------------------------------\n\n# The TAGFILES tag can be used to specify one or more tag files. For each tag\n# file the location of the external documentation should be added. The format of\n# a tag file without this location is as follows:\n# TAGFILES = file1 file2 ...\n# Adding location for the tag files is done as follows:\n# TAGFILES = file1=loc1 \"file2 = loc2\" ...\n# where loc1 and loc2 can be relative or absolute paths or URLs. See the\n# section \"Linking to external documentation\" for more information about the use\n# of tag files.\n# Note: Each tag file must have a unique name (where the name does NOT include\n# the path). If a tag file is not located in the directory in which doxygen is\n# run, you must also specify the path to the tagfile here.\n\nTAGFILES               =\n\n# When a file name is specified after GENERATE_TAGFILE, doxygen will create a\n# tag file that is based on the input files it reads. See section \"Linking to\n# external documentation\" for more information about the usage of tag files.\n\nGENERATE_TAGFILE       =\n\n# If the ALLEXTERNALS tag is set to YES, all external class will be listed in\n# the class index. If set to NO, only the inherited external classes will be\n# listed.\n# The default value is: NO.\n\nALLEXTERNALS           = NO\n\n# If the EXTERNAL_GROUPS tag is set to YES, all external groups will be listed\n# in the modules index. 
If set to NO, only the current project's groups will be\n# listed.\n# The default value is: YES.\n\nEXTERNAL_GROUPS        = YES\n\n# If the EXTERNAL_PAGES tag is set to YES, all external pages will be listed in\n# the related pages index. If set to NO, only the current project's pages will\n# be listed.\n# The default value is: YES.\n\nEXTERNAL_PAGES         = YES\n\n# The PERL_PATH should be the absolute path and name of the perl script\n# interpreter (i.e. the result of 'which perl').\n# The default file (with absolute path) is: /usr/bin/perl.\n\nPERL_PATH              = /usr/bin/perl\n\n#---------------------------------------------------------------------------\n# Configuration options related to the dot tool\n#---------------------------------------------------------------------------\n\n# If the CLASS_DIAGRAMS tag is set to YES, doxygen will generate a class diagram\n# (in HTML and LaTeX) for classes with base or super classes. Setting the tag to\n# NO turns the diagrams off. Note that this option also works with HAVE_DOT\n# disabled, but it is recommended to install and use dot, since it yields more\n# powerful graphs.\n# The default value is: YES.\n\nCLASS_DIAGRAMS         = NO\n\n# You can define message sequence charts within doxygen comments using the \\msc\n# command. Doxygen will then run the mscgen tool (see:\n# http://www.mcternan.me.uk/mscgen/) to produce the chart and insert it in the\n# documentation. The MSCGEN_PATH tag allows you to specify the directory where\n# the mscgen tool resides. If left empty the tool is assumed to be found in the\n# default search path.\n\nMSCGEN_PATH            =\n\n# You can include diagrams made with dia in doxygen documentation. Doxygen will\n# then run dia to produce the diagram and insert it in the documentation. 
The\n# DIA_PATH tag allows you to specify the directory where the dia binary resides.\n# If left empty dia is assumed to be found in the default search path.\n\nDIA_PATH               =\n\n# If set to YES the inheritance and collaboration graphs will hide inheritance\n# and usage relations if the target is undocumented or is not a class.\n# The default value is: YES.\n\nHIDE_UNDOC_RELATIONS   = YES\n\n# If you set the HAVE_DOT tag to YES then doxygen will assume the dot tool is\n# available from the path. This tool is part of Graphviz (see:\n# http://www.graphviz.org/), a graph visualization toolkit from AT&T and Lucent\n# Bell Labs. The other options in this section have no effect if this option is\n# set to NO\n# The default value is: NO.\n\nHAVE_DOT               = NO\n\n# The DOT_NUM_THREADS specifies the number of dot invocations doxygen is allowed\n# to run in parallel. When set to 0 doxygen will base this on the number of\n# processors available in the system. You can set it explicitly to a value\n# larger than 0 to get control over the balance between CPU load and processing\n# speed.\n# Minimum value: 0, maximum value: 32, default value: 0.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nDOT_NUM_THREADS        = 0\n\n# When you want a differently looking font in the dot files that doxygen\n# generates you can specify the font name using DOT_FONTNAME. 
You need to make\n# sure dot is able to find the font, which can be done by putting it in a\n# standard location or by setting the DOTFONTPATH environment variable or by\n# setting DOT_FONTPATH to the directory containing the font.\n# The default value is: Helvetica.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nDOT_FONTNAME           = Helvetica\n\n# The DOT_FONTSIZE tag can be used to set the size (in points) of the font of\n# dot graphs.\n# Minimum value: 4, maximum value: 24, default value: 10.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nDOT_FONTSIZE           = 10\n\n# By default doxygen will tell dot to use the default font as specified with\n# DOT_FONTNAME. If you specify a different font using DOT_FONTNAME you can set\n# the path where dot can find it using this tag.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nDOT_FONTPATH           =\n\n# If the CLASS_GRAPH tag is set to YES then doxygen will generate a graph for\n# each documented class showing the direct and indirect inheritance relations.\n# Setting this tag to YES will force the CLASS_DIAGRAMS tag to NO.\n# The default value is: YES.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nCLASS_GRAPH            = NO\n\n# If the COLLABORATION_GRAPH tag is set to YES then doxygen will generate a\n# graph for each documented class showing the direct and indirect implementation\n# dependencies (inheritance, containment, and class references variables) of the\n# class with other documented classes.\n# The default value is: YES.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nCOLLABORATION_GRAPH    = NO\n\n# If the GROUP_GRAPHS tag is set to YES then doxygen will generate a graph for\n# groups, showing the direct groups dependencies.\n# The default value is: YES.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nGROUP_GRAPHS           = YES\n\n# If the UML_LOOK tag is set to YES, doxygen will generate inheritance and\n# 
collaboration diagrams in a style similar to the OMG's Unified Modeling\n# Language.\n# The default value is: NO.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nUML_LOOK               = NO\n\n# If the UML_LOOK tag is enabled, the fields and methods are shown inside the\n# class node. If there are many fields or methods and many nodes the graph may\n# become too big to be useful. The UML_LIMIT_NUM_FIELDS threshold limits the\n# number of items for each type to make the size more manageable. Set this to 0\n# for no limit. Note that the threshold may be exceeded by 50% before the limit\n# is enforced. So when you set the threshold to 10, up to 15 fields may appear,\n# but if the number exceeds 15, the total amount of fields shown is limited to\n# 10.\n# Minimum value: 0, maximum value: 100, default value: 10.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nUML_LIMIT_NUM_FIELDS   = 10\n\n# If the TEMPLATE_RELATIONS tag is set to YES then the inheritance and\n# collaboration graphs will show the relations between templates and their\n# instances.\n# The default value is: NO.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nTEMPLATE_RELATIONS     = NO\n\n# If the INCLUDE_GRAPH, ENABLE_PREPROCESSING and SEARCH_INCLUDES tags are set to\n# YES then doxygen will generate a graph for each documented file showing the\n# direct and indirect include dependencies of the file with other documented\n# files.\n# The default value is: YES.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nINCLUDE_GRAPH          = NO\n\n# If the INCLUDED_BY_GRAPH, ENABLE_PREPROCESSING and SEARCH_INCLUDES tags are\n# set to YES then doxygen will generate a graph for each documented file showing\n# the direct and indirect include dependencies of the file with other documented\n# files.\n# The default value is: YES.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nINCLUDED_BY_GRAPH      = NO\n\n# If the CALL_GRAPH tag is set to YES then 
doxygen will generate a call\n# dependency graph for every global function or class method.\n#\n# Note that enabling this option will significantly increase the time of a run.\n# So in most cases it will be better to enable call graphs for selected\n# functions only using the \\callgraph command.\n# The default value is: NO.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nCALL_GRAPH             = NO\n\n# If the CALLER_GRAPH tag is set to YES then doxygen will generate a caller\n# dependency graph for every global function or class method.\n#\n# Note that enabling this option will significantly increase the time of a run.\n# So in most cases it will be better to enable caller graphs for selected\n# functions only using the \\callergraph command.\n# The default value is: NO.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nCALLER_GRAPH           = NO\n\n# If the GRAPHICAL_HIERARCHY tag is set to YES then doxygen will generate a\n# graphical hierarchy of all classes instead of a textual one.\n# The default value is: YES.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nGRAPHICAL_HIERARCHY    = YES\n\n# If the DIRECTORY_GRAPH tag is set to YES then doxygen will show the\n# dependencies a directory has on other directories in a graphical way. 
The\n# dependency relations are determined by the #include relations between the\n# files in the directories.\n# The default value is: YES.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nDIRECTORY_GRAPH        = YES\n\n# The DOT_IMAGE_FORMAT tag can be used to set the image format of the images\n# generated by dot.\n# Note: If you choose svg you need to set HTML_FILE_EXTENSION to xhtml in order\n# to make the SVG files visible in IE 9+ (other browsers do not have this\n# requirement).\n# Possible values are: png, jpg, gif and svg.\n# The default value is: png.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nDOT_IMAGE_FORMAT       = png\n\n# If DOT_IMAGE_FORMAT is set to svg, then this option can be set to YES to\n# enable generation of interactive SVG images that allow zooming and panning.\n#\n# Note that this requires a modern browser other than Internet Explorer. Tested\n# and working are Firefox, Chrome, Safari, and Opera.\n# Note: For IE 9+ you need to set HTML_FILE_EXTENSION to xhtml in order to make\n# the SVG files visible. Older versions of IE do not have SVG support.\n# The default value is: NO.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nINTERACTIVE_SVG        = NO\n\n# The DOT_PATH tag can be used to specify the path where the dot tool can be\n# found. 
If left blank, it is assumed the dot tool can be found in the path.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nDOT_PATH               =\n\n# The DOTFILE_DIRS tag can be used to specify one or more directories that\n# contain dot files that are included in the documentation (see the \\dotfile\n# command).\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nDOTFILE_DIRS           =\n\n# The MSCFILE_DIRS tag can be used to specify one or more directories that\n# contain msc files that are included in the documentation (see the \\mscfile\n# command).\n\nMSCFILE_DIRS           =\n\n# The DIAFILE_DIRS tag can be used to specify one or more directories that\n# contain dia files that are included in the documentation (see the \\diafile\n# command).\n\nDIAFILE_DIRS           =\n\n# When using plantuml, the PLANTUML_JAR_PATH tag should be used to specify the\n# path where java can find the plantuml.jar file. If left blank, it is assumed\n# PlantUML is not used or called during a preprocessing step. Doxygen will\n# generate a warning when it encounters a \\startuml command in this case and\n# will not generate output for the diagram.\n\nPLANTUML_JAR_PATH      =\n\n# When using plantuml, the specified paths are searched for files specified by\n# the !include statement in a plantuml block.\n\nPLANTUML_INCLUDE_PATH  =\n\n# The DOT_GRAPH_MAX_NODES tag can be used to set the maximum number of nodes\n# that will be shown in the graph. If the number of nodes in a graph becomes\n# larger than this value, doxygen will truncate the graph, which is visualized\n# by representing a node as a red box. Note that if the number of direct\n# children of the root node in a graph is already larger than\n# DOT_GRAPH_MAX_NODES then the graph will not be shown at all. 
Also note that\n# the size of a graph can be further restricted by MAX_DOT_GRAPH_DEPTH.\n# Minimum value: 0, maximum value: 10000, default value: 50.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nDOT_GRAPH_MAX_NODES    = 50\n\n# The MAX_DOT_GRAPH_DEPTH tag can be used to set the maximum depth of the graphs\n# generated by dot. A depth value of 3 means that only nodes reachable from the\n# root by following a path via at most 3 edges will be shown. Nodes that lay\n# further from the root node will be omitted. Note that setting this option to 1\n# or 2 may greatly reduce the computation time needed for large code bases. Also\n# note that the size of a graph can be further restricted by\n# DOT_GRAPH_MAX_NODES. Using a depth of 0 means no depth restriction.\n# Minimum value: 0, maximum value: 1000, default value: 0.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nMAX_DOT_GRAPH_DEPTH    = 0\n\n# Set the DOT_TRANSPARENT tag to YES to generate images with a transparent\n# background. This is disabled by default, because dot on Windows does not seem\n# to support this out of the box.\n#\n# Warning: Depending on the platform used, enabling this option may lead to\n# badly anti-aliased labels on the edges of a graph (i.e. they become hard to\n# read).\n# The default value is: NO.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nDOT_TRANSPARENT        = YES\n\n# Set the DOT_MULTI_TARGETS tag to YES to allow dot to generate multiple output\n# files in one run (i.e. multiple -o and -T options on the command line). 
This\n# makes dot run faster, but since only newer versions of dot (>1.8.10) support\n# this, this feature is disabled by default.\n# The default value is: NO.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nDOT_MULTI_TARGETS      = NO\n\n# If the GENERATE_LEGEND tag is set to YES doxygen will generate a legend page\n# explaining the meaning of the various boxes and arrows in the dot generated\n# graphs.\n# The default value is: YES.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nGENERATE_LEGEND        = YES\n\n# If the DOT_CLEANUP tag is set to YES, doxygen will remove the intermediate dot\n# files that are used to generate the various graphs.\n# The default value is: YES.\n# This tag requires that the tag HAVE_DOT is set to YES.\n\nDOT_CLEANUP            = YES\n"
  },
  {
    "path": "docs/edit_globals.sed",
    "content": "/index_x/s/- x -/ /\n/index_a/s/- a -/ /\n/Here is a list of all/d\n/<span>[a-z]<\\/span><\\/a><\\/li>/d\n/div class=\"contents\"/aSee also the <a class=\"el\" href=\"group__models.html\">models</a> and <a class=\"el\" href=\"group__settings.html\">settings</a> pages.\n"
  },
  {
    "path": "docs/edit_group.sed",
    "content": "s/Enumeration Type Documentation/Model Documentation/\ns/<h2>Enumerations/<h2>Models/\n/<h2>Models/,/<h2>Functions/{\n    s/[{}]//\n    s/<br\\/>$//\n    s/<li>enum/<li>apop_model/\n    s/.*>(Overview|Name|Input_format|Estimate_results|Predict|RNG|CDF|Exampe|Settings)[+\\]*_[+\\]*x[0-9].*x[+\\]*_//\n}\ns/<b>Enumerator: <\\/b>//\ns/.item\\[Enumerator\\].par//\ns/^ enum//\ns/\\\\setlength{\\\\rightskip}{0pt plus 5cm}enum/\\\\setlength{\\\\rightskip}{0pt plus 5cm}/\ns|<tr><th colspan=\"2\">Enumerator</th></tr>||\ns/<td class=\"memname\">enum <a class=\"el\"/<td class=\"memname\">apop_model <a class=\"el\"/\ns/[\\+]*_[\\+]*x[0-9]*x[_\\+]*[_+]\\+//g\n#s/model_specific/Methods are (D)efault<br> or (M)odel-specific/g\n#s/model[+\\]*_[+\\]*specific/\\\\hbox{(D)efault\\/(M)odel-specific}/g\ns/[eE]stimate[+\\]*_[+\\]*results/Post-estimate/g\ns/[iI]nput[+\\]*_[+\\]*format/Input format/g\ns/[pP]ostestimate[+\\]*_[+\\]*\\(data\\|parameters\\|settings\\|info\\)/Post-estimate \\1/g\ns/[pP]ostestimate[+\\]*_[+\\]*parameter_model/Post-estimate parameter model/g\ns/[pP]arameter[+\\]*_[+\\]*format/Parameter format/g\ns/[pP]rep[+\\]*_routine/Prep routine/g\n#delete all between the two markers, but not the second marker\n#/name=\"enum-members\"/,/name=\"func-members\"/{/name=\"func-members\"/!d}\n#/<div class=\"summary\">/,/\\#func-members/d\n\ns/Log likelihood &/LL \\&/\ns/Prep routine &/Prep \\&/\n"
  },
  {
    "path": "docs/edit_outline.sed",
    "content": "s|Outlineheader \\([^ ]*\\)\\(.*\\)</p>|<h2><a class=\"anchor\" name=\"\\1\"><div class=\"trigger\" onClick=\"showBranch('\\1d');swapFolder('\\1f')\"><img src=\"right.png\" border=\"0\" id=\"\\1f\" alt=\"pip\">\\2</div></a></h2><div class=\"branch\" id=\"\\1d\">|\ns|endofdiv</p>|</div>|\ns|ALLBUTTON|<span class=\"trigger\" onClick=\"showAll();\"<a>Expand all </a></span> \\| <span class=\"trigger\" onClick=\"closeAll();\"<a>Collapse all </a></span>|\n"
  },
  {
    "path": "docs/edit_width.sed",
    "content": "s|<td width=\"100%\"></td>||g\n"
  },
  {
    "path": "docs/foot.html",
    "content": "</body></html>\n"
  },
  {
    "path": "docs/head.html",
    "content": "<!-- HTML header for doxygen 1.8.9.1-->\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n<html xmlns=\"http://www.w3.org/1999/xhtml\">\n<head>\n<meta http-equiv=\"Content-Type\" content=\"text/xhtml;charset=UTF-8\"/>\n<meta http-equiv=\"X-UA-Compatible\" content=\"IE=9\"/>\n<meta name=\"generator\" content=\"Doxygen $doxygenversion\"/>\n     <title>Apophenia: a library for scientific computing</title>\n<link href=\"$relpath^tabs.css\" rel=\"stylesheet\" type=\"text/css\"/>\n     <link rel=\"stylesheet\" href=\"typical.css\" type=\"text/css\" >\n<script type=\"text/javascript\" src=\"$relpath^jquery.js\"></script>\n<script type=\"text/javascript\" src=\"$relpath^dynsections.js\"></script>\n\n<!-- Google is watching. -->\n<script src=\"http://www.google-analytics.com/urchin.js\" type=\"text/javascript\">\n</script>\n<script type=\"text/javascript\">\n_uacct = \"UA-134313-2\";\nurchinTracker();\n</script>\n\n$treeview\n$search\n$mathjax\n<link href=\"$relpath^$stylesheet\" rel=\"stylesheet\" type=\"text/css\" />\n$extrastylesheet\n</head>\n<body>\n<div id=\"top\"><!-- do not remove this div, it is closed by doxygen! 
-->\n\n<!--BEGIN TITLEAREA-->\n<div id=\"titlearea\">\n\n<center><table cellpadding=10pt>\n<tr> <td><img width=140px src=flake.gif alt=\"Patterns in static\"></td>\n<td><table>\n    <tr> <td><center><h2><a href=\"http://apophenia.info\">Apophenia</a></h2></center></td></tr>\n    <tr><td><div class=\"qindex\"> <a class=\"qindex\" href=\"outline.html\">Outline</a> | <a class=\"qindex\" href=\"group__all__public.html\">Index </a>  | $searchbox \n                <!-- <a class=\"qindex\" href=\"files.html\">File&nbsp;List&nbsp;</a> -->  </div></td></tr></table>\n                                  </td></tr></table></center>\n\n   </div>\n   <!--BEGIN PROJECT_BRIEF--><div id=\"projectbrief\">$projectbrief</div><!--END PROJECT_BRIEF-->\n  </td>\n  <!--END PROJECT_NAME-->\n  <!--BEGIN !PROJECT_NAME-->\n   <!--BEGIN PROJECT_BRIEF-->\n    <td style=\"padding-left: 0.5em;\">\n    <div id=\"projectbrief\">$projectbrief</div>\n    </td>\n   <!--END PROJECT_BRIEF-->\n  <!--END !PROJECT_NAME-->\n  <!--END DISABLE_INDEX-->\n </tr>\n </tbody>\n</table>\n</div>\n<!--END TITLEAREA-->\n<!-- end header part -->\n"
  },
  {
    "path": "docs/make_model_doc.awk",
    "content": "#!/usr/bin/awk\n\n# The goal: uniform model documentation for all the models that ship with Apophenia.\n# \n# Doxygen provides pretty limited support for new types of documentation. It\n# has the set of documentation types it does well, and that's that. So, this hack takes\n# marks in the code and converts them into one documented enum block for each model,\n# which Doxygen will then make beautiful.\n# \n# We'll start with something like\n# /* \\amodel apop_beta The Beta distribution.\n# \n#          The beta distribution has two parameters and is restricted between zero and\n#          one. You may also find \\ref apop_beta_from_mean_var to be useful.\n# \n# \\adoc    Input_format  Any arrangement of scalar values. \n# \\adoc    Parameter_format   a vector, v[0]=\\f$\\alpha\\f$; v[1]=\\f$\\beta\\f$    \n# \\adoc    Settings None. \n# */\n# [...]\n# /* \\adoc Name  <tt>Beta distribution</tt>  */\n# \n# and write out something like:\n# \n# /** \n#  The Beta distribution.\n# \n#          The beta distribution has two parameters and is restricted between zero and\n#          one. You may also find \\ref apop_beta_from_mean_var to be useful.\n# \n#  \\hideinitializer \\ingroup models */\n# enum apop_beta {\n# Name_x2x_, /**<   <tt>Beta distribution</tt> */\n# Input_format_x2x_, /**<   Any arrangement of scalar values. */\n# Parameter_format_x2x_, /**<    a vector, v[0]=\\f$\\alpha\\f$; v[1]=\\f$\\beta\\f$    */\n# RNG_x2x_, /**<   Produces a scalar \\f$\\in[0,1]\\f$. */\n# };\n# \n# I added the _x#_x because doxygen merges enums together otherwise. 
There's a sed script\n# (edit_group) that turns the enum documentation back into apop_model struct documentation.\n# \n# The awk script can enforce some uniformity, forcing the same order in things, and writing\n# file named missing, that will list documentation bits that aren't present.\n\nBEGIN {\n    IGNORECASE=1\n    doc_parts[1]=\"Name\";\n    doc_parts[2]=\"Input_format\"\n    doc_parts[3]=\"Prep_routine\"\n    doc_parts[4]=\"Parameter_format\"\n    doc_parts[5]=\"Estimate_results\"\n    doc_parts[6]=\"Predict\"\n    doc_parts[7]=\"RNG\"\n    doc_parts[8]=\"CDF\"\n    doc_parts[9]=\"Settings\"\n    doc_parts[10]=\"Examples\"\n    doc_part_count=10\n    model_count = 0\n       #print > \"missing\"\n}\n\nin_doc==1 && !/\\\\a[model|doc]/ {\n        if (sub(\"\\\\*/\",\"\"))\n            in_doc=0\n        items[current_model \":\" current_item] = items[current_model \":\" current_item] \"\\n\" $0 \n    }\n\n/\\\\amodel/ {\n    sub(\"/\\\\*[ \\\\t]*\",\"\", $0) #cut /* (if any).\n    sub(\".*\\\\\\\\amodel\",\"\", $0) #cut \\amodel, now that I know what it is.\n    current_model = $1\n    in_doc=1\n    oh = $0\n    if (!models[current_model]) {\n        models[current_model]=current_model\n        model_count ++\n    }\n    sub(current_model,\"\", oh)\n    sub(\"\\\\*/\",\"\", oh)\n    sub(\"^[ \\\\t]\",\"\", oh)\n    current_item = \"intro\"\n    items[current_model \":\" current_item] =  oh\n    }\n\n/\\\\adoc/ {\n    sub(\"/\\\\*[ \\\\t]*\",\"\", $0) #cut /* (if any).\n    sub(\"\\\\\\\\adoc[ \\\\t]*\",\"\", $0) #cut \\adoc, now that I know what it is.\n    oh = $0\n    current_item = $1\n    sub($1,\"\", oh)\n    if (!sub(\"\\\\*/\",\"\", oh))\n        in_doc = 1\n    items[current_model \":\" current_item] = oh\n    }\n\n/\\*\\// { in_doc = 0 }\n\n\n#the declaration [like apop_model new_model = {\"new model\", .p=prob, .score=deriv}; ] tells us which struct elements are actually used.\n/apop_model[ \\t]*[^ \\t]*[ \\t]*=.*{/ { in_decl=1 }\n\nin_decl == 1 { cp 
= $0;\n    if (match(cp, \"\\\"([^\\\"]*)\\\"\")){\n        title= substr(cp,RSTART, RLENGTH)\n        gsub( \"\\\"\",\"\",title)\n        items[current_model \":\" \"Name\"] = \"<tt>\" title \"</tt>\" \n    }\n\n    while (match(cp, \"\\\\.[^ \\\\t=]+[ \\\\t]*=\")){\n        cp = substr(cp, RSTART+1, length(cp)-RSTART)\n        match(cp, \"[ \\\\t=]\")\n        items[current_model \":has\" substr(cp, 0, RSTART-1)] = 1\n        cp = substr(cp, RSTART+1) #cut at the =, keep processing.\n    }\n}\n\n/\\\\}/ {in_decl=0;}\n/}/ {in_decl=0;}\n\nfunction onedot(name, isdefault){\n    dot = \"<td class=\\\"altcolor\\\">M</td>\"\n    defaultdot = \"<td class=\\\"memitem\\\">D</td>\"\n    if (isdefault)\n        class = \"altcolor\"\n    else \n        class = \"memitem\"\n    print \"<tr><td class=\\\"\"class\"\\\">\" name \"</td>\"; \n    if (isdefault) \n        print dot;\n    else\n        print defaultdot;\n    print \"</tr>\"\n}\n\nEND {print \"/** \\\\file */  /**\\\\defgroup models */\"\n    model_no=0\n    for (m in models){\n        model_no++\n        #m = sorted_models[model_no];\n        print \"\\n/** \"\n        if (items[m \":intro\"])\n            print items[m \":intro\"]\n        else\n            print \"!!! 
Without an intro, \" m \" won't print\\n\" >> \"/dev/stderr\"\n        print \" \\\\hideinitializer  \\\\ingroup models */\"\n        print \"enum \" m \" {\"\n\n#apop_model apop_normal = {\"Normal distribution\", 2, 0, 0, .dsize=1, \n# .estimate = normal_estimate, .log_likelihood = normal_log_likelihood, \n# .score = normal_dlog_likelihood,    \n# .constraint = beta_1_greater_than_x_constraint, .draw = normal_rng,\n# .cdf = normal_cdf, .predict = normal_predict}; \n\n            #dot = \"<td class=\\\"memitem\\\" >\\\\f$\\\\bullet\\\\f$</td>\"\n#            print \"model_specific_x\" model_no \"x_, /**< <table cellpadding=3px><tr><td><table class=\\\"memproto\\\">\"\n#\n#            onedot(\"Estimation\", items[m \":hasestimate\"])\n#            onedot(\"Prob.\", items[m \":hasp\"])\n#            onedot(\"Log likelihood\", items[m \":haslog_likelihood\"])\n#            onedot(\"RNG\", items[m \":hasdraw\"])\n#\n#            print \"</table></td><td><table class=\\\"memproto\\\">\"\n#\n#            onedot(\"Predict\", items[m \":haspredict\"])\n#            onedot(\"CDF\", items[m \":hascdf\"])\n#            onedot(\"Score\", items[m \":hasscore\"])\n#            onedot(\"Prep routine\", items[m \":hasprep\"])\n#\n#            print \"</table> </td></tr></table>*/\"\n        for (i=1; i<=doc_part_count;i++){\n            part=doc_parts[i]\n            #print \"processing\", m, part, \"\\n\" >> \"/dev/stderr\"\n            if (doc_parts[i]==\"Estimate_results\"){\n                print \"postestimate_data_x\" model_no \"x_, /**< \"\n                if (items[m \":estimated_data\"])\n                    print items[m \":estimated_data\"] \n                else\n                    print \"Unchanged.\"\n                print \"*/\"\n\n                if (items[m \":estimated_parameters\"]){\n                    print \"postestimate_parameters_x\" model_no \"x_, /**< \"\n                    print items[m \":estimated_parameters\"] \n                    print \"*/\"\n   
             }\n\n                if (items[m \":estimated_parameter_model\"]){\n                    print \"postestimate_parameter_model_x\" model_no \"x_, /**< \"\n                    print items[m \":estimated_parameter_model\"] \"</td></tr>\"\n                    print \"*/\"\n                }\n                if (items[m \":estimated_info\"]){\n                    print \"postestimate_info_x\" model_no \"x_, /**< \"\n                    print items[m \":estimated_info\"] \"</td></tr>\"\n                    print \"*/\"\n                }\n                if (items[m \":estimated_settings\"]){\n                    print \"postestimate_settings_x\" model_no \"x_, /**< \"\n                    print items[m \":estimated_settings\"] \"</td></tr>\"\n                    print \"*/\"\n                }\n            }\n            else if (items[m \":\" part]) print part \"_x\" model_no \"x_, /**< \" items[m \":\" part] \"*/\"\n            else print m, part >> \"missing_model_parts\"  #not at the moment important.\n        }\n        print \"};\"\n    }\n}\n"
  },
  {
    "path": "docs/model.dot",
    "content": "\n\ndigraph {\n    node [shape=\"rect\"];\n\n\n\"info\" [\n        label = \"{Info |{name|data}|{params| vbase}|{...}}\"\n        shape = \"record\"\n        ];\n\n\"functions\" [\n        label = \"{Functions |{estimate|p}|{log_likelihood| draw}|{...}}\"\n        shape = \"record\"\n        ];\n\n\"settings\" [\n        label = \"{Settings |{name|*}|{name| *}|{...}}\"\n        shape = \"record\"\n        ];\n\n\n    model -> info;\n    model -> functions;\n    model -> settings;\n\n}\n"
  },
  {
    "path": "docs/structs.dot",
    "content": "digraph {\n    node [shape=\"rect\"];\n\n    \"model\" [\n        label = <\n        <table border=\"0\" cellborder=\"0\">\n        <tr>\n        <td align=\"center\" colspan=\"3\"> <font color=\"blue\">apop_model</font> </td></tr>\n   <tr><td>\n        <table border=\"1\" cellborder=\"0\">\n        <tr><td colspan =\"2\" align=\"center\"> [Info] </td></tr>\n        <tr><td colspan =\"2\"><font color=\"blue\"> int</font> vsize, msize1, msize2, dsize </td></tr>\n        <tr>\n                <td><font color=\"blue\"> char</font> name[101]</td>\n        <td><font color=\"blue\"> char</font> error</td>\n        </tr>\n        <tr><td port=\"dd\"><font color=\"blue\"> apop_data *</font> data, </td>\n        <td port=\"params\"><font color=\"blue\">*</font>parameters, <font color=\"blue\">*</font>info</td>\n        </tr>\n        </table>\n    </td><td>\n        <table align=\"left\" border=\"1\" cellborder=\"0\">\n        <tr><td colspan =\"2\" align=\"center\"> [Functions] </td></tr>\n        <tr><td colspan =\"2\">estimate (<font color=\"blue\">apop_data *</font> data, <font color=\"blue\">apop_model *</font>model) </td></tr>\n        <tr><td> log_likelihood(data, model) </td><td>cdf (data, model) </td></tr>\n        <tr><td>p(data, model) </td><td>constraint(data, model) </td></tr>\n        <tr><td>draw(<font color=\"blue\">double *</font> out, <font\n                color=\"blue\">gsl_rng *</font> r, model)</td><td>prep(data, model) </td></tr>\n        </table>\n    </td><td>\n        <table border=\"1\" cellborder=\"0\">\n        <tr><td align=\"center\"> [Settings] </td></tr>\n        <tr><td> <font color=\"blue\">void *</font>more </td></tr>\n        <tr><td > <font color=\"blue\">int</font> more_size </td></tr>\n        <tr><td port=\"ls\"> <font color=\"blue\">*</font>settings</td></tr>\n        </table>\n    </td></tr></table>\n>\nstyle=\"rounded\"\n];\n\n\"settings\" [ label  = <\n        <table border=\"0\" cellborder=\"0\">\n        <tr><td 
align=\"center\"><font color=\"blue\"> apop_settings_type </font></td></tr>\n            <tr><td> <font color=\"blue\">char *</font>name </td></tr> \n            <tr><td><font color=\"blue\">void *</font>asst </td></tr>\n            <tr><td> free(settings) </td></tr>\n            <tr><td> copy(settings) </td></tr>\n</table>>\nstyle=\"rounded\"\n];\n\n    subgraph {rank=\"same\"\n\"data\" [\n        label  = <\n        <table border=\"0\" cellborder=\"0\">\n        <tr><td colspan =\"2\" align=\"center\"><font color=\"blue\"> apop_data </font></td></tr>\n        <tr><td port=\"vv\"> <font color=\"blue\">gsl_vector *</font> vector</td>\n                <td port=\"dm\"> <font color=\"blue\">apop_data *</font>more</td>                \n                </tr>\n        <tr><td port=\"ww\"> <font color=\"blue\">gsl_vector *</font>weights</td>\n                <td port=\"mm\"> <font color=\"blue\">gsl_matrix *</font>matrix</td>\n                </tr>\n        <tr>        <td > <font color=\"blue\">char ***</font>text</td>\n            <td > <font color=\"blue\">int</font> text_size[2]</td>\n            </tr>\n        <tr><td port=\"nn\"> <font color=\"blue\">apop_name *</font>names </td> \n        <td><font color=\"blue\"> char</font> error</td>\n         </tr>\n</table>>\nstyle=\"rounded\"\n];\n\n\"info\" [ label  = < <table border=\"0\" cellborder=\"0\"> \n    <tr><td  colspan =\"2\" align=\"center\"> <font color=\"blue\">apop_data </font> </td></tr>\n    <tr><td  colspan =\"2\" align=\"center\"> ... </td></tr> \n    <tr><td  colspan =\"2\" align=\"center\" port=\"more1\"> more </td></tr> \n    </table>>\nstyle=\"rounded\"\n        ];\n\n\"infonext\" [ label  = < <table border=\"0\" cellborder=\"0\"> \n    <tr><td  colspan =\"2\" align=\"center\"> <font color=\"blue\">apop_data </font> </td></tr>\n    <tr><td  colspan =\"2\" align=\"center\"> ... 
</td></tr> \n    <tr><td  colspan =\"2\" align=\"center\" port=\"more1\"> more </td></tr> \n    </table>>\nstyle=\"rounded\"\n        ];\n    }\n\n\n\"gsl_matrix\" [ label  = < <table border=\"0\" cellborder=\"0\"> \n    <tr><td  align=\"center\"> <font color=\"blue\">gsl_matrix </font> </td></tr>\n    <tr><td  align=\"center\"> <font color=\"blue\">double *</font>data</td></tr> \n    <tr><td> <font color=\"blue\">int</font> size1, size2</td></tr>\n        <tr><td align=\"center\"> ... </td></tr>\n    </table>>\nstyle=\"rounded\"\n        ];\n\n\"names\" [ label  = < <table border=\"0\" cellborder=\"0\"> \n    <tr><td align=\"center\"> <font color=\"blue\">apop_name</font> </td></tr>\n    <tr><td align=\"left\"> <font color=\"blue\">char *</font>title</td></tr>\n    <tr><td align=\"left\"> <font color=\"blue\">int</font> colct, rowct, textct</td></tr>\n    <tr><td align=\"left\"> <font color=\"blue\">char *</font>vector</td></tr>\n    <tr><td align=\"left\"> <font color=\"blue\">char **</font>col, <font color=\"blue\">**</font>row, <font color=\"blue\">**</font>text</td></tr>\n    </table>>\nstyle=\"rounded\"\n        ];\n\n\n\"gsl_vector\" [ label  = < <table border=\"0\" cellborder=\"0\"> \n    <tr><td> <font color=\"blue\">gsl_vector </font> </td></tr>\n    <tr><td> <font color=\"blue\">double *</font>data</td></tr> \n    <tr><td> <font color=\"blue\">int </font> size </td></tr>\n        <tr><td colspan =\"2\" align=\"center\"> ... </td></tr>\n    </table>>\nstyle=\"rounded\"\n        ];\n\n\n#model:params:s -> data;\n    model:dd:s -> data;\n    model:ls -> settings;\ndata:mm:se->gsl_matrix;\ndata:vv:sw->gsl_vector;\ndata:ww:sw->gsl_vector;\ndata:nn->names;\n#data:dm->data:n;\ndata:dm:e->info:nw [len=0.2];\ninfo:more1:ne->infonext:nw;\n}\n\n"
  },
  {
    "path": "docs/tree.js",
    "content": "var openImg = new Image();\nopenImg.src = \"down.png\";\nvar closedImg = new Image();\nclosedImg.src = \"right.png\";\n\nfunction showBranch(branch){\n\tvar objBranch = document.getElementById(branch).style;\n\tif(objBranch.display==\"block\")\n\t\tobjBranch.display=\"none\";\n\telse\n\t\tobjBranch.display=\"block\";\n}\n\nfunction swapFolder(img){\n\tobjImg = document.getElementById(img);\n\tif(objImg.src.indexOf('right.png')>-1)\n\t\tobjImg.src = openImg.src;\n\telse\n\t\tobjImg.src = closedImg.src;\n}\n\nfunction changeSheets(whichSheet){\n    var c = document.Elements.length;\n    for(var i=0;i<c;i++)\n        var objBranch = document.getElementById(branch).style.display=\"block\";\n}\n\nfunction showAll(){\nvar arrElements = document.getElementsByClassName(\"branch\");\n        for(var i=0; i<arrElements.length; i++){\n            arrElements[i].style.display=\"block\";\n        }\n    }\n\nfunction closeAll(){\nvar arrElements = document.getElementsByClassName(\"branch\");\n        for(var i=0; i<arrElements.length; i++){\n            arrElements[i].style.display=\"none\";\n        }\n    }\n\n"
  },
  {
    "path": "docs/triangle.c",
    "content": "#include <apop.h>\n\nint main(){\n    apop_data *d = apop_data_falloc((8,3),   \n            1,  0,  0,\n           .8, .1,  0,\n           .9,  0, .1,\n           12,  4,  1,\n            0,  1,  0,\n            1,  2,  2,\n            2,  1,  2,\n            2,  2,  1);\n    apop_name_add(d->names, \"first\", 'c');\n    apop_name_add(d->names, \"second\", 'c');\n    apop_name_add(d->names, \"third\", 'c');\n    apop_plot_triangle(d, \"out.gnup\");\n}\n"
  },
  {
    "path": "docs/typical.css",
    "content": "/*.memitem:before {content:\"\\00A\\00A\\00A\\00A\"}*/\n\n.memitem {\nborder: 2px solid #a6cb5e;\nborder-radius: 8px;\npadding: 2px;\nmargin: 3px;\n } \n\n.memproto table {\nwhite-space:nowrap;\n/*border: 1px solid #a6cb5e;*/\nborder-radius: 8px;\npadding: 1px;\nbottom-margin= 2px;\n } =\n.memproto table td{\n    padding:4px;\nwhite-space:nowrap;\nborder: 1px solid black; } =\n.memdoc table {\nborder: 1px solid black; /* Change to suit your preferences */ } =\n.memdoc table td {\nborder: 1px solid black; /* Change to suit your preferences */ }\n\n.altcolor {\nborder: 1px solid #262b5e;\nborder-radius: 8px;\npadding: 2px;\n}\n\n.params td{\n       border: 1px solid #a6cb5e; padding: 4px;\n}\n\n.paramtype {text-align: right}\n\n.memdoc { padding-left: 40px; padding-right: 40px; }\n\nh6{ display: inline; font-size:100%; font-style:italic; }\nh5{ display: inline; font-size:100% }\nh4{ display: inline;  }\n\nbody{padding:10px;  margin-left: 10% ; margin-right: 10% }\n\n.bordered {border: ridge 3px;}\n.withmargins{padding:10px;  margin-left: 10% ;\nmargin-right: 10% }\n\nul{line-height: 135%}\n\nH1,H2,H3,H4,H5,H6/*,P,CENTER,TD,TH,UL,DL,DIV */{\n\tfont-family: Geneva, Arial, Helvetica, sans-serif;\n\tbackground-color: #FFFFDE;\n}\nH1 {\n\ttext-align: center;\n       font-size: 160%;\n}\nH2 {\n       font-size: 120%;\n}\nH3 {\n       font-size: 100%;\n}\n\ndl.section { font-weight: bold }\n\ndiv.headertitle { font-size: 120%;\n                font-weight: bold }\n\nCAPTION { font-weight: bold }\nDIV.qindex {\n\tbackground-color: #eeeeff;\n\tborder: 1px solid #b0b0b0;\n\ttext-align: center;\n\tmargin: 2px;\n\tpadding: 2px;\n\tline-height: 140%;\n}\nDIV.nav {\n\tbackground-color: #eeeeff;\n\tborder: 1px solid #b0b0b0;\n\ttext-align: center;\n\tmargin: 2px;\n\tpadding: 2px;\n\tline-height: 140%;\n}\nA.qindex {\n       text-decoration: none;\n       font-weight: bold;\n       color: #1A419D;\n}\nA.qindex:visited {\n       text-decoration: none;\n       
font-weight: bold;\n       color: #1A419D\n}\nA.qindex:hover {\n\ttext-decoration: none;\n\tbackground-color: #ddddff;\n}\nA.qindexHL {\n\ttext-decoration: none;\n\tfont-weight: bold;\n\tbackground-color: #6666cc;\n\tcolor: #ffffff;\n\tborder: 1px double #9295C2;\n}\nA.qindexHL:hover {\n\ttext-decoration: none;\n\tbackground-color: #6666cc;\n\tcolor: #ffffff;\n}\nA.qindexHL:visited { text-decoration: none; background-color: #6666cc; color: #ffffff }\nA.el { text-decoration: none; font-weight: bold }\nA.elRef { font-weight: bold }\nA.code:link { text-decoration: none; font-weight: normal; color: #0000FF}\nA.code:visited { text-decoration: none; font-weight: normal; color: #0000FF}\nA.codeRef:link { font-weight: normal; color: #0000FF}\nA.codeRef:visited { font-weight: normal; color: #0000FF}\nA:hover { text-decoration: none; background-color: #f2f2ff }\nDL.el { margin-left: -1cm }\n.fragment {\n       font-family: monospace\n}\nPRE.fragment,div.fragment {\n\tborder: 1px solid #CCCCCC;\n\tbackground-color: #f5f5f5;\n\tmargin-top: 4px;\n\tmargin-bottom: 4px;\n\tmargin-left: 2px;\n\tmargin-right: 8px;\n\tpadding-left: 6px;\n\tpadding-right: 6px;\n\tpadding-top: 4px;\n\tpadding-bottom: 4px;\n    white-space: pre;\n    line-height:125%;\n}\nDIV.ah { background-color: black; font-weight: bold; color: #ffffff; margin-bottom: 3px; margin-top: 3px }\nTD.md { background-color: #F4F4FB; font-weight: bold; }\nTD.mdPrefix {\n       background-color: #F4F4FB;\n       color: #606060;\n\tfont-size: 80%;\n}\nTD.mdname1 { background-color: #F4F4FB; font-weight: bold; color: #602020; }\nTD.mdname { background-color: #F4F4FB; font-weight: bold; color: #602020; width: 600px; }\nDIV.groupHeader {\n       margin-left: 16px;\n       margin-top: 12px;\n       margin-bottom: 6px;\n       font-weight: bold;\n}\nDIV.groupText { margin-left: 16px; font-style: italic; font-size: 90% }\nBODY {\n\tbackground: white;\n\tcolor: black;\n\tmargin-right: 20px;\n\tmargin-left: 20px;\n}\nTD.indexkey 
{\n\tbackground-color: #eeeeff;\n\tfont-weight: bold;\n\tpadding-right  : 10px;\n\tpadding-top    : 2px;\n\tpadding-left   : 10px;\n\tpadding-bottom : 2px;\n\tmargin-left    : 0px;\n\tmargin-right   : 0px;\n\tmargin-top     : 2px;\n\tmargin-bottom  : 2px;\n\tborder: 1px solid #CCCCCC;\n}\nTD.indexvalue {\n\tbackground-color: #eeeeff;\n\tfont-style: italic;\n\tpadding-right  : 10px;\n\tpadding-top    : 2px;\n\tpadding-left   : 10px;\n\tpadding-bottom : 2px;\n\tmargin-left    : 0px;\n\tmargin-right   : 0px;\n\tmargin-top     : 2px;\n\tmargin-bottom  : 2px;\n\tborder: 1px solid #CCCCCC;\n}\nTR.memlist {\n   background-color: #f0f0f0; \n}\nP.formulaDsp { text-align: center; }\nIMG.formulaDsp { }\nIMG.formulaInl { vertical-align: middle; }\nSPAN.keyword       { color: #008000 }\nSPAN.keywordtype   { color: #604020 }\nSPAN.keywordflow   { color: #e08000 }\nSPAN.comment       { color: #800000 }\nSPAN.preprocessor  { color: #806020 }\nSPAN.stringliteral { color: #002080 }\nSPAN.charliteral   { color: #008080 }\n.mdTable {\n\tborder: 1px solid #868686;\n\tbackground-color: #F4F4FB;\n}\n.mdRow {\n\tpadding: 8px 10px;\n}\n.mdescLeft {\n       padding: 0px 8px 4px 8px;\n\tfont-size: 80%;\n\tfont-style: italic;\n\tbackground-color: #FAFAFA;\n\tborder-top: 1px none #E0E0E0;\n\tborder-right: 1px none #E0E0E0;\n\tborder-bottom: 1px none #E0E0E0;\n\tborder-left: 1px none #E0E0E0;\n\tmargin: 0px;\n}\n.mdescRight {\n       padding: 0px 8px 4px 8px;\n\tfont-size: 80%;\n\tfont-style: italic;\n\tbackground-color: #FAFAFA;\n\tborder-top: 1px none #E0E0E0;\n\tborder-right: 1px none #E0E0E0;\n\tborder-bottom: 1px none #E0E0E0;\n\tborder-left: 1px none #E0E0E0;\n\tmargin: 0px;\n}\n.memItemLeft {\n\tpadding: 1px 0px 0px 8px;\n\tmargin: 4px;\n\tborder-top-width: 1px;\n\tborder-right-width: 1px;\n\tborder-bottom-width: 1px;\n\tborder-left-width: 1px;\n\tborder-top-color: #E0E0E0;\n\tborder-right-color: #E0E0E0;\n\tborder-bottom-color: #E0E0E0;\n\tborder-left-color: 
#E0E0E0;\n\tborder-top-style: solid;\n\tborder-right-style: none;\n\tborder-bottom-style: none;\n\tborder-left-style: none;\n\tbackground-color: #FAFAFA;\n\tfont-size: 80%;\n}\n.memItemRight {\n\tpadding: 1px 8px 0px 8px;\n\tmargin: 4px;\n\tborder-top-width: 1px;\n\tborder-right-width: 1px;\n\tborder-bottom-width: 1px;\n\tborder-left-width: 1px;\n\tborder-top-color: #E0E0E0;\n\tborder-right-color: #E0E0E0;\n\tborder-bottom-color: #E0E0E0;\n\tborder-left-color: #E0E0E0;\n\tborder-top-style: solid;\n\tborder-right-style: none;\n\tborder-bottom-style: none;\n\tborder-left-style: none;\n\tbackground-color: #FAFAFA;\n\tfont-size: 80%;\n}\n.memTemplItemLeft {\n\tpadding: 1px 0px 0px 8px;\n\tmargin: 4px;\n\tborder-top-width: 1px;\n\tborder-right-width: 1px;\n\tborder-bottom-width: 1px;\n\tborder-left-width: 1px;\n\tborder-top-color: #E0E0E0;\n\tborder-right-color: #E0E0E0;\n\tborder-bottom-color: #E0E0E0;\n\tborder-left-color: #E0E0E0;\n\tborder-top-style: none;\n\tborder-right-style: none;\n\tborder-bottom-style: none;\n\tborder-left-style: none;\n\tbackground-color: #FAFAFA;\n\tfont-size: 80%;\n}\n.memTemplItemRight {\n\tpadding: 1px 8px 0px 8px;\n\tmargin: 4px;\n\tborder-top-width: 1px;\n\tborder-right-width: 1px;\n\tborder-bottom-width: 1px;\n\tborder-left-width: 1px;\n\tborder-top-color: #E0E0E0;\n\tborder-right-color: #E0E0E0;\n\tborder-bottom-color: #E0E0E0;\n\tborder-left-color: #E0E0E0;\n\tborder-top-style: none;\n\tborder-right-style: none;\n\tborder-bottom-style: none;\n\tborder-left-style: none;\n\tbackground-color: #FAFAFA;\n\tfont-size: 80%;\n}\n.memTemplParams {\n\tpadding: 1px 0px 0px 8px;\n\tmargin: 4px;\n\tborder-top-width: 1px;\n\tborder-right-width: 1px;\n\tborder-bottom-width: 1px;\n\tborder-left-width: 1px;\n\tborder-top-color: #E0E0E0;\n\tborder-right-color: #E0E0E0;\n\tborder-bottom-color: #E0E0E0;\n\tborder-left-color: #E0E0E0;\n\tborder-top-style: solid;\n\tborder-right-style: none;\n\tborder-bottom-style: none;\n\tborder-left-style: none;\n      
 color: #606060;\n\tbackground-color: #FAFAFA;\n\tfont-size: 80%;\n}\n.search     { color: #003399;\n              font-weight: bold;\n}\nFORM.search {\n              margin-bottom: 0px;\n              margin-top: 0px;\n}\nINPUT.search { font-size: 75%;\n               color: #000080;\n               font-weight: normal;\n               background-color: #eeeeff;\n}\nTD.tiny      { font-size: 75%;\n}\n\n.tiny      { font-size: 75%;\n}\na {\n\tcolor: #252E78;\n}\na:visited {\n\tcolor: #3D2185;\n}\n.dirtab { padding: 4px;\n          border-collapse: collapse;\n          border: 1px solid #b0b0b0;\n}\nTH.dirtab { background: #eeeeff;\n            font-weight: bold;\n}\nHR {/* height: 1px;\n     border: none;\n     border-top: 1px solid black;*/\nwidth: 80%;\n}\nBODY {\n/*font-family: \"Lucida Grande\", \"Lucida Sans Unicode\", \"Verdana\", \"Geneva\", sans-serif;*/\nbackground-color: #FFFFDE;\n}\n\n\n/*For the tree controls */\n.trigger{\n\tcursor: pointer;\n\tcursor: hand;\n}\n.branch{\n\tdisplay: none;\n\tmargin-left: 32px;\n}\ndd {font-weight: normal}\n"
  },
  {
    "path": "eg/Makefile.am",
    "content": "\nEXTRA_DIST = \\\n\tapop_map_row.c \\\n\tbanana.c \\\n\tbinning.c \\\n\tcross_models.c \\\n\tdb_fns.c \\\n\tdconstrain.c \\\n\tdot_products.c \\\n\tdraw_some_normals.c \\\n\tdraw_to_db.c \\\n\tfake_logit.c \\\n\tfix_params.c \\\n\thills2.c \\\n\tiv.c \\\n\tjacobian.c \\\n\tkernel.c \\\n\tks_tests.c \\\n\tls_tables.c \\\n\tols2.c \\\n\tols.c \\\n\tparameterization.c \\\n\tpmf_test.c \\\n\tsimple_subsets.c \\\n\tsome_cdfs.c \\\n\tsql_to_html.c \\\n\tt_test_by_rows.c \\\n\ttest_distances.c \\\n\ttest_fisher.c \\\n\ttest_harmonic.c \\\n\ttest_kl_divergence.c \\\n\ttest_pruning.c \\\n\ttest_ranks.c \\\n\ttest_regex.c \\\n\ttest_updating.c \\\n\ttext_demo.c \\\n\ttransform.c\n"
  },
  {
    "path": "eg/apop_map_row.c",
    "content": "#include <apop.h>\n/* This sample code sets the elements of a data set's vector to one\n   if the index is even.  Then, via the weights vector, it adds up\n   the even indices.\n\n   There is really no need to use the weights vector; this code\n   snippet is an element of Apophenia's test suite, and goes the long\n   way to test that the weights are correctly handled. */\n\ndouble set_vector_to_even(apop_data * r, int index){\n    apop_data_set(r, 0, -1, .val=1-(index %2));\n    return 0;\n}\n\ndouble set_weight_to_index(apop_data * r, int index){ \n    gsl_vector_set(r->weights, 0, index); \n    return 0;\n}\n\ndouble weight_given_even(apop_data *r){ \n    return gsl_vector_get(r->vector, 0) ? gsl_vector_get(r->weights, 0) : 0; \n}\n\nint main(){\n    apop_data *d = apop_data_alloc(100);\n    d->weights = gsl_vector_alloc(100);\n    apop_map(d, .fn_ri=set_vector_to_even, .inplace='v'); //'v=void. Throw out return values.\n    apop_map(d, .fn_ri=set_weight_to_index, .inplace='v');\n    double sum = apop_map_sum(d, .fn_r = weight_given_even);\n    assert(sum == 49*25*2);\n}\n"
  },
  {
    "path": "eg/banana.c",
    "content": "#include <apop.h>\n\ntypedef struct {\n    double scaling;\n} coeff_struct;\n\nlong double banana (double *params, coeff_struct *in){\n    return (gsl_pow_2(1-params[0]) \n               + in->scaling*gsl_pow_2(params[1]-gsl_pow_2(params[0])));\n}\n\nlong double ll (apop_data *d, apop_model *in){\n    return - banana(in->parameters->vector->data, in->more);\n}\n\nint main(){\n    coeff_struct co = {.scaling=100};\n    apop_model *b = &(apop_model) {\"¡Bananas!\", .log_likelihood= ll,\n                     .vsize=2, .more = &co, .more_size=sizeof(coeff_struct)};\n    Apop_model_add_group(b, apop_mle, .verbose='y', .method=\"NM simplex\");\n    Apop_model_add_group(b, apop_parts_wanted);\n    apop_model *e1 = apop_estimate(NULL, b);\n    apop_model_print(e1);\n\n    //for printing the path below\n    apop_data *bfgs_path = NULL;\n    Apop_settings_set(b, apop_mle, path, &bfgs_path);\n\n    Apop_settings_set(b, apop_mle, method, \"BFGS cg\");\n    apop_model *e2 = apop_estimate(NULL, b);\n    apop_model_print(e2);\n\n    apop_data_show(bfgs_path);\n\n    gsl_vector *one = apop_vector_fill(gsl_vector_alloc(2), 1, 1);\n    assert(apop_vector_distance(e1->parameters->vector, one) < 1e-2);\n    assert(apop_vector_distance(e2->parameters->vector, one) < 1e-2);\n}\n"
  },
  {
    "path": "eg/binning.c",
    "content": "#define _GNU_SOURCE\n#include <apop.h>\n\n#define printdata(dataset)           \\\n        printf(\"\\n-----------\\n\\n\"); \\\n        apop_data_print(dataset);   \n\nint main(){\n    apop_data *d = apop_text_alloc(apop_data_alloc(6), 6, 1);\n    apop_data_fill(d,   1,   2,   3,   3,   1,   2);\n    apop_text_fill(d,  \"A\", \"A\", \"A\", \"A\", \"A\", \"B\");\n\n    asprintf(&d->names->title, \"Original data set\");\n    printdata(d);\n\n        //binned, where bin ends are equidistant but not necessarily in the data\n    apop_data *binned = apop_data_to_bins(d);\n    asprintf(&binned->names->title, \"Post binning\");\n    printdata(binned);\n    assert(fabs(//equal distance between bins\n              (apop_data_get(binned, 1) - apop_data_get(binned, 0))\n            - (apop_data_get(binned, 2) - apop_data_get(binned, 1))) < 1e-5);\n\n        //compressed, where the data is as in the original, but weights \n        //are redone to accommodate repeated observations.\n    apop_data_pmf_compress(d);\n    asprintf(&d->names->title, \"Post compression\");\n    printdata(d);\n    assert(apop_sum(d->weights)==6);\n\n    apop_model *d_as_pmf = apop_estimate(d, apop_pmf);\n    apop_data *firstrow = Apop_r(d, 0); //1A\n    assert(fabs(apop_p(firstrow, d_as_pmf) - 2./6 < 1e-5));\n}\n"
  },
  {
    "path": "eg/boot_clt.c",
    "content": "#include <apop.h>\n\n// Find the μ/σ  of a set of 10 draws from a Uniform(-1, 1)\nvoid sim_step(apop_data *none, apop_model *m){\n    int sub_draws = 20;\n    static apop_model *unif;\n    if (!unif) unif = apop_model_set_parameters(apop_uniform, -1, 1);\n    apop_data *draws= apop_model_draws(unif, sub_draws);\n\n    apop_data_set(m->parameters, 0, .val=apop_mean(Apop_cv(draws, 0)));\n    apop_data_set(m->parameters, 1, .val=sqrt(apop_var(Apop_cv(draws, 0))));\n    apop_data_add_names(m->parameters, 'r', \"μ\", \"σ\");\n    apop_data_free(draws);\n}\n\napop_model *clt_sim = &(apop_model){.name=\"CLT simulation\", .vsize=2, .estimate=sim_step};\n\nint main(){\n    apop_data *boots;\n    apop_data * boot_cov = apop_bootstrap_cov(NULL, clt_sim, .iterations=1000, .boot_store=&boots);\n    apop_data_print(boot_cov);\n    apop_data *means = Apop_c(boots, 0);\n\n    printf(\"\\nStats via Normal model:\\n\");\n    apop_data *np = apop_estimate(means, apop_normal)->parameters;\n    np->more = NULL; //rm covariance of statistics.\n    apop_data_print(np);\n\n    //σ from the Normal should == sqrt(cov(μ_boot))\n    assert(fabs(sqrt(apop_data_get(boot_cov,0,0)) - apop_data_get(np, 1)) < 1e-4);\n}\n"
  },
  {
    "path": "eg/cross_models.c",
    "content": "#include <apop.h>\n\n/* In this initial example, build a cross product of two Normal(2,.1) distributions.\nMake 10,000 draws from it.\n \nThen, build a cross product of two unparameterized Normals and estimate the parameters\nof the combined model; check that they match the (2, .1) we started with.\n*/\nvoid cross_normals(){\n    double mu = 2;\n    double sigma = .1;\n    apop_model *n1 = apop_model_set_parameters(apop_normal, mu, sigma);\n    apop_model *n2 = apop_model_copy(n1);\n    apop_model *two_independent_normals = apop_model_cross(n1, n2);\n    //\n    //We don't use it, but the cross product of three is just as easy:\n    apop_model *n3 = apop_model_copy(n1);\n    apop_model *three_independent_normals = apop_model_cross(n1, n2, n3);\n\n    apop_data *draws = apop_model_draws(two_independent_normals, .count=10000);\n\n    //The unparameterized cross product:\n    apop_model *two_n = apop_model_cross(\n                    apop_model_copy(apop_normal),\n                    apop_model_copy(apop_normal)\n                    );\n    apop_model *estimated_norms = apop_estimate(draws, two_n);\n\n    apop_model_print(estimated_norms);\n    apop_data *estp1 = Apop_settings_get(estimated_norms, apop_cross, model1)->parameters;\n    apop_data *estp2 = Apop_settings_get(estimated_norms, apop_cross, model2)->parameters;\n    assert(fabs(apop_data_get(estp1, 0) - mu)    < 2e-3);\n    assert(fabs(apop_data_get(estp2, 0) - mu)    < 2e-3);\n    assert(fabs(apop_data_get(estp1, 1) - sigma) < 2e-3);\n    assert(fabs(apop_data_get(estp2, 1) - sigma) < 2e-3);\n}\n\n//bind together a Poisson and a Normal\nvoid norm_cross_poisson(){\n    apop_model *m1 = apop_model_set_parameters(apop_poisson, 3);\n    apop_model *m2 = apop_model_set_parameters(apop_normal, -5, 1);\n    apop_model *mm = apop_model_cross(m1, m2);\n    int len = 1e5;\n    apop_data *draws = apop_model_draws(mm, len);\n    for (int i=0; i< len; i++){\n        Apop_row_v(draws, i, onev);\n        
assert((int)onev->data[0] == onev->data[0]);\n        assert(onev->data[1]<0);\n    }\n\n    /*The rest of the test script recovers the parameters.\n    Input data to an apop_cross model can take two formats. In cross_normals, the\n    draws are in a single matrix. Here, the data for the Poisson (col 0 of the draws)\n    will be put in an apop_data set, and the data for the Normal (col 1 of the draws)\n    on a second page appended to the first. Then, set the .splitpage element of the\n    apop_cross settings group to the name of the second page.\n    */\n    apop_data *comeback = apop_data_alloc();\n    comeback->vector = apop_vector_copy(Apop_cv(draws, 0));\n    apop_data_add_page(comeback, apop_data_alloc(), \"p2\");\n    comeback->more->vector = apop_vector_copy(Apop_cv(draws, 1));\n\n    //set up the un-parameterized crossed model, including\n    //the name at which to split the data set\n    apop_model *estme = apop_model_cross(apop_model_copy(apop_poisson), apop_model_copy(apop_normal));\n    Apop_settings_add(estme, apop_cross, splitpage, \"p2\");\n    apop_model *ested = apop_estimate(comeback, estme);\n\n    //test that the parameters are as promised.\n    apop_model *m1back = apop_settings_get(ested, apop_cross, model1);\n    apop_model *m2back = apop_settings_get(ested, apop_cross, model2);\n    assert(fabs(apop_data_get(m1back->parameters, .col=-1) - 3) < 5e-1);\n    assert(fabs(apop_data_get(m2back->parameters, .col=-1) - -5) < 5e-1);\n    assert(fabs(apop_data_get(m2back->parameters, .col=-1, .row=1) - 1) < 5e-1);\n\n    //You can cross as many models as you'd like.\n    apop_model *m3 = apop_model_set_parameters(apop_poisson, 8);\n    apop_model *mmm = apop_model_cross(m1, m2, m3);\n    apop_data *sum = apop_data_summarize(apop_model_draws(mmm, 1e5));\n    assert(fabs(apop_data_get(sum, .row=0, .colname=\"mean\") - 3) < 2e-2);\n    assert(fabs(apop_data_get(sum, .row=1, .colname=\"mean\") - -5) < 2e-2);\n    assert(fabs(apop_data_get(sum, .row=2, 
.colname=\"mean\") - 8) < 4e-2);\n    assert(apop_data_get(sum, .row=0, .colname=\"median\") == 3);\n    assert(apop_data_get(sum, .row=2, .colname=\"median\") == 8);\n}\n\nint main(){\n    cross_normals();\n    norm_cross_poisson();\n}\n"
  },
  {
    "path": "eg/data_fill.c",
    "content": "#include <apop.h>\n\nvoid with_fixed_numbers(){\n    apop_data *a =apop_data_alloc(2,2,2);\n    double    eight   = 8.0;\n    apop_data_fill(a, 8, 2.2, eight/2,\n                      0, 6.0, eight);\n    apop_data_show(a);\n}\n\nvoid with_a_list(){\n  apop_data *a =apop_data_alloc(2,2,2);\n  double    eight   = 8.0;\n  double list[] = {8, 2.2, eight/2,\n                   0, 6.0, eight};\n    apop_data_fill_base(a, list);\n    apop_data_show(a);\n}\n\nint main(){\n    with_fixed_numbers();\n    printf(\"-----\\n\");\n    with_a_list();\n}\n"
  },
  {
    "path": "eg/db_fns.c",
    "content": "#include <apop.h>\n\n#define Diff(L, R) assert(fabs((L)-(R)<1e-4));\n#define Diff2(L, R) assert(fabs((L)-(R)<1e-3));\n#define getrow(rowname) apop_data_get(row, .colname=#rowname)\n\ndouble test_all(apop_data *row){\n    Diff(gsl_pow_2(getrow(root)), getrow(rr))\n    Diff(getrow(ln), getrow(L10)*log(10))\n    Diff(getrow(rr), getrow(rragain))\n    Diff(getrow(one), 1)\n    return 0;\n}\n\nint main(){\n    apop_opts.db_engine='s'; //SQLite only.\n\n    //create a table with two rows.\n    //We didn't explicitly open a db with apop_db_open,\n    //so this will be an in-memory SQLite db.\n    apop_query(\"create table a(b); \"\n               \"insert into a values(1); \"\n               \"insert into a values(1); \"\n\n                \"create table randoms as \"\n                \"select ran() as rr \"\n                /* join to create 2^13=8192 rows*/\n                \"from a,a,a,a,a,a,a,a,a,a,a,a,a;\");\n    apop_data *d = apop_query_to_data(\n            \"select rr, sqrt(rr) as root, \"\n            \"log(rr) as ln, log10(rr) as L10, \"\n            \"exp(log(rr)) as rragain, \"\n            \"pow(sin(rr),2)+pow(cos(rr),2) as one \"\n            \"from randoms\");\n    apop_map(d, .fn_r=test_all);\n\n    //the pop variance of a Uniform[0,1]=1/12; kurtosis=1/80.\n    Apop_col_tv(d, \"rr\", rrow);\n    Diff(apop_var(rrow)*8191./8192., 1/12. 
);\n    Diff(apop_vector_kurtosis(rrow)*8191./8192., 1/80.);//approx.\n\n    Diff(apop_query_to_float(\"select stddev(rr) from randoms\"), \n                sqrt(1/12.)*8192./8191);\n\n\n    //compare the std dev of a uniform as reported by the \n    //database routine, the matrix routine, and math.\n    apop_query(\"create table atab (a numeric)\");\n    for (int i=0; i< 2e5; i++)\n        apop_query(\"insert into atab values(ran())\");\n    apop_query(\"create table powa as \"\n            \"select a, pow(a, 2) as sq, pow(a, 0.5) as sqrt \"\n            \"from atab\");\n\n    double db_pop_stddev = apop_query_to_float(\"select stddev_pop(a) from powa\");\n    d = apop_query_to_data(\"select * from powa\");\n    //get the full covariance matrix, but just use the (0,0)th elmt.\n    apop_data *cov = apop_data_covariance(d);\n    double matrix_pop_stddev = sqrt(apop_data_get(cov)*(d->matrix->size1/(d->matrix->size1-1.)));\n    Diff(db_pop_stddev, matrix_pop_stddev);\n    double actual_stddev = sqrt(2*gsl_pow_3(.5)/3);\n    Diff2(db_pop_stddev, actual_stddev);\n\n    float sq_mean = apop_query_to_float(\"select avg(sq) from powa\");\n    float actual_sq_mean = 1./3;\n    Diff2(sq_mean, actual_sq_mean);\n\n    float sqrt_mean = apop_query_to_float(\"select avg(sqrt) from powa\");\n    float actual_sqrt_mean = 2./3;\n    Diff2(sqrt_mean, actual_sqrt_mean);\n}\n"
  },
  {
    "path": "eg/dconstrain.c",
    "content": "#include <apop.h>\n\n//The constraint function.\ndouble over_zero(apop_data *in, apop_model *m){\n    return apop_data_get(in) > 0;\n}\n\n//The optional scaling function.\ndouble in_bounds(apop_model *m){\n    double z = 0;\n    gsl_vector_view vv = gsl_vector_view_array(&z, 1);\n    return 1- apop_cdf(&((apop_data){.vector=&vv.vector}), m);\n}\n\nint main(){\n    /*Set up a Normal distribution, with data truncated to be nonnegative.\n      This version doesn't use the in_bounds function above, and so the\n      default scaling function is used.*/\n    gsl_rng *r = apop_rng_alloc(213);\n    apop_model *norm = apop_model_set_parameters(apop_normal, 1.2, 0.8);\n    apop_model *trunc = apop_model_set_settings(apop_dconstrain,\n                            .base_model=apop_model_copy(norm), \n                            .constraint=over_zero, .draw_ct=5e4, .rng=r);\n\n    //make draws. Currently, you need to prep the model first.\n    apop_prep(NULL, trunc);\n    apop_data *d = apop_model_draws(trunc, 1e5);\n\n    //Estimate the parameters given the just-produced data:\n    apop_model *est = apop_estimate(d, trunc);\n    apop_model_print(est);\n    assert(apop_vector_distance(est->parameters->vector, norm->parameters->vector)<1e-1);\n\n    //Generate a data set that is truncated at zero using alternate means\n    apop_data *normald = apop_model_draws(apop_model_set_parameters(apop_normal, 0, 1), 5e4);\n    for (int i=0; i< normald->matrix->size1; i++){\n        double *d = apop_data_ptr(normald, i);\n        if (*d < 0) *d *= -1;\n    }\n\n    //this time, use an unparameterized model, and the in_bounds fn\n    apop_model *re_trunc = apop_model_set_settings(apop_dconstrain,\n                            .base_model=apop_normal, \n                            .constraint=over_zero, .scaling=in_bounds);\n\n    apop_model *re_est = apop_estimate(normald, re_trunc);\n    apop_model_print(re_est);\n    assert(apop_vector_distance(re_est->parameters->vector,\n   
             apop_vector_fill(gsl_vector_alloc(2), 0, 1))<1e-1);\n    apop_model_free(trunc);\n}\n"
  },
  {
    "path": "eg/dot_products.c",
    "content": "/* A demonstration of dot products and various useful \n   transformations among types. */\n\n#include <apop.h>\n\ndouble eps=1e-3;//slow to converge series-->large tolerance.\n#define Diff(L, R) Apop_assert(fabs((L)-(R))<(eps), \"%g is too different from %g (arbitrary limit=%g).\", (double)(L), (double)(R), eps);\n\nint main(){\n    int len = 3000;\n    gsl_vector *v = gsl_vector_alloc(len);\n    for (double i=0; i< len; i++) gsl_vector_set(v, i, 1./(i+1));\n    double square;\n    gsl_blas_ddot(v, v, &square);\n    printf(\"1 + (1/2)^2 + (1/3)^2 + ...= %g\\n\", square);\n\n    double pi_over_six = gsl_pow_2(M_PI)/6.;\n    Diff(square, pi_over_six);\n\n    /* Now using apop_dot, in a few forms.\n       First, vector-as-data dot itself.\n       If one of the inputs is a vector,\n       apop_dot puts the output in a vector-as-data:*/\n    apop_data *v_as_data = &(apop_data){.vector=v};\n    apop_data *vdotv = apop_dot(v_as_data, v_as_data);\n    Diff(gsl_vector_get(vdotv->vector, 0), pi_over_six);\n\n    /* Wrap matrix in an apop_data set. 
*/\n    gsl_matrix *v_as_matrix = apop_vector_to_matrix(v);\n    apop_data dm = (apop_data){.matrix=v_as_matrix};\n\n    // (1 X len) vector dot (len X 1) matrix --- produce a scalar (one item vector).\n    apop_data *mdotv = apop_dot(v_as_data, &dm);\n    double scalarval = apop_data_get(mdotv);\n    Diff(scalarval, pi_over_six);\n\n    //(len X 1) dot (len X 1) --- bad dimensions.\n    apop_opts.verbose=-1; //don't print an error.\n    apop_data *mdotv2 = apop_dot(&dm, v_as_data);\n    apop_opts.verbose=0; //back to safety.\n    assert(mdotv2->error);\n\n    // If we want (len X 1) dot (1 X len) --> (len X len),\n    // use apop_vector_to_matrix.\n    apop_data dmr = (apop_data){.matrix=apop_vector_to_matrix(v, .row_col='r')};\n    apop_data *product_matrix = apop_dot(&dm, &dmr);\n    //The trace is the sum of squares:\n    gsl_vector_view trace = gsl_matrix_diagonal(product_matrix->matrix);\n    double tracesum = apop_sum(&trace.vector);\n    Diff(tracesum, pi_over_six);\n\n    apop_data_free(product_matrix);\n    gsl_matrix_free(dmr.matrix);\n}\n"
  },
  {
    "path": "eg/draw_some_normals.c",
    "content": "#include <apop.h>\n#include <time.h>\n\nint main(){\n    apop_opts.rng_seed = time(NULL);\n    apop_data_print(\n            apop_model_draws(\n                apop_model_set_parameters(apop_normal, 0, 1), \n                .count=10, \n            )\n    );\n}\n"
  },
  {
    "path": "eg/draw_to_db.c",
    "content": "#include <apop.h>\n\n//Your processes are probably a bit more complex.\ndouble process_one(gsl_rng *r){\n    return gsl_rng_uniform(r) * gsl_rng_uniform(r) ;\n}\n\ndouble process_two(gsl_rng *r){\n    return gsl_rng_uniform(r);\n}\n\nint main(){\n    gsl_rng *r = apop_rng_alloc(123);\n\n    //create the database and the data table.\n    apop_db_open(\"runs.db\");\n    apop_table_exists(\"samples\", 'd'); //If the table already exists, delete it.\n    apop_query(\"create table samples(iteration, process, value); begin;\");\n\n    //populate the data table with runs.\n    for (int i=0; i<1000; i++){\n        double p1 = process_one(r);\n        double p2 = process_two(r);\n        apop_query(\"insert into samples values(%i, %i, %g);\", i, 1, p1);\n        apop_query(\"insert into samples values(%i, %i, %g);\", i, 2, p2);\n    }\n    apop_query(\"commit;\"); //the begin-commit wrapper saves writes to the drive.\n\n    //pull the data from the database, converting it into a table along the way. \n    apop_data *m  = apop_db_to_crosstab(\"samples\", \"iteration\",\"process\", \"value\");\n\n    gsl_vector *v1 = Apop_cv(m, 0); //get vector views of the two table columns.\n    gsl_vector *v2 = Apop_cv(m, 1);\n\n    //Output a table of means/variances, and t-test results.\n    printf(\"\\t   mean\\t\\t   var\\n\");\n    printf(\"process 1: %f\\t%f\\n\", apop_mean(v1), apop_var(v1));\n    printf(\"process 2: %f\\t%f\\n\\n\", apop_mean(v2), apop_var(v2));\n    printf(\"t test\\n\");\n    apop_data_show(apop_t_test(v1, v2));\n    apop_data_print(m, \"the_data.txt\");\n}\n"
  },
  {
    "path": "eg/entropy_model.c",
    "content": "#include <apop.h>\n#define Diff(left, right, eps) Apop_stopif(fabs((left)-(right))>(eps), \\\n        abort(), 0, \"%g is too different from %g (arbitrary limit=%g).\", \\\n        (double)(left), (double)(right), eps)\n\n\n/* The entropy function, like some other functions (including apop_update), has a lookup\n table for known models like the Normal distribution. If the input model has\n \\c log_likelihood, \\c p, and \\c draw functions that are the ones found in \\ref\n apop_normal, then use a known calculation to report entropy; else report based on\n random draws from the model.\n\nIf we make a copy of the \\ref apop_normal model and replace the log likelihood with\na new function that produces identical values, the lookup table will not find the\nmodified model, and the calculation via random draws will be done. Of course, the\nfinal entropy as calculated using both methods should differ only by a small amount.\n*/\n\nlong double mask(apop_data *d, apop_model *m){\n    return apop_normal->log_likelihood(d, m);\n}\n\nint main(){\n    for (double i=0.1; i< 10; i+=.2){\n        apop_model *n = apop_model_set_parameters(apop_normal, 8, i);\n        long double v= apop_model_entropy(n);\n        n->log_likelihood = mask;\n        long double w= apop_model_entropy(n, 50000);\n        Diff(v, w, 5e-2);\n    }\n}\n"
  },
  {
    "path": "eg/entropy_vector.c",
    "content": "#include <apop.h>\n\n#define Diff(left, right, eps) Apop_stopif(fabs((left)-(right))>(eps), abort(), 0, \"%g is too different from %g (arbitrary limit=%g).\", (double)(left), (double)(right), eps)\n\nlong double entropy_base_2(gsl_vector *x) {\n    return apop_vector_entropy(x)/log(2);\n}\n\nint main(){\n    apop_model *flip = apop_model_set_parameters(apop_bernoulli, .5);\n\n    //zero data => entropy zero\n    gsl_vector *v = gsl_vector_calloc(1);\n    assert(apop_vector_entropy(v) == 0);\n\n    //negative data => NaN\n    gsl_vector_set(v, 0, -1);\n    int v1 = apop_opts.verbose;\n    apop_opts.verbose = -1;\n    assert(isnan(apop_vector_entropy(v)));\n    apop_opts.verbose = v1;\n\n    //N equiprobable bins => entropy = log(N)\n    v = apop_vector_realloc(v, 100);\n    gsl_vector_set_all(v, 1./100);\n    Diff(log(100), apop_vector_entropy(v), 1e-5);\n\n    //Normalization is optional. You may send a vector of counts.\n    gsl_vector_set_all(v, 1);\n    Diff(log(100), apop_vector_entropy(v), 1e-5);\n\n    //flip two coins.\n    apop_data *coin_flips = apop_model_draws(flip, .count=10000);\n    apop_data *c2         = apop_model_draws(flip, .count=10000);\n    apop_data_stack(c2, coin_flips, 'c', .inplace='y');\n\n    //entropy of one coin flip in base2 == 1\n    apop_data_pmf_compress(coin_flips);\n    Diff(entropy_base_2(coin_flips->weights), 1, 1e-3);\n\n    //entropy of two coin flips in base2 == 2\n    apop_data_pmf_compress(c2);\n    Diff(entropy_base_2(c2->weights), 2, 1e-3);\n\n    //flip three coins, via model cross products\n    Diff(entropy_base_2(apop_data_pmf_compress(apop_model_draws(\n            apop_model_cross(flip, flip, flip) ,.count=10000))->weights), 3, 1e-3);\n\n    apop_data_free(coin_flips);\n    apop_data_free(c2);\n    gsl_vector_free(v);\n}\n"
  },
  {
    "path": "eg/f_test.c",
    "content": "#ifdef Datadir\n#define DATADIR Datadir\n#else\n#define DATADIR \".\"\n#endif\n\n#include <apop.h>\n\n#define Diff(L, R, eps) {double left=(L), right=(R); Apop_stopif(isnan(left-right) || fabs((left)-(right))>(eps), abort(), 0, \"%g is too different from %g (arbitrary limit=%g).\", (double)(left), (double)(right), eps);}\n\n/** I claim that the F test calculated via apop_F_test(est, NULL, NULL)\n equals a transformation of R^2 (after a normalization step).\n*/\nvoid test_f(apop_model *est){\n    apop_data *rsq  = apop_estimate_coefficient_of_determination(est);\n    apop_data *constr= apop_data_calloc(est->parameters->vector->size-1, est->parameters->vector->size);\n    int i;\n    for (i=1; i< est->parameters->vector->size; i++)\n        apop_data_set(constr, i-1, i, 1);\n    apop_data *ftab = apop_F_test(est, constr);\n    apop_data *ftab2 = apop_F_test(est, NULL);\n    //apop_data_show(ftab);\n    //apop_data_show(ftab2);\n    double n = est->data->matrix->size1;\n    double K = est->parameters->vector->size-1;\n    double r = apop_data_get(rsq, .rowname=\"R squared\");\n    double f = apop_data_get(ftab, .rowname=\"F statistic\");\n    double f2 = apop_data_get(ftab2, .rowname=\"F statistic\");\n    Diff (f , r*(n-K)/((1-r)*K) , 1e-3);\n    Diff (f2 , r*(n-K)/((1-r)*K) , 1e-3);\n}\n\nint main(){\n    apop_data *d = apop_text_to_data( DATADIR \"/\" \"test_data2\" );\n    apop_model *an_ols_model = apop_model_copy(apop_ols);\n    Apop_model_add_group(an_ols_model, apop_lm, .want_expected_value= 1);\n    apop_model *e  = apop_estimate(d, an_ols_model);\n    test_f(e);\n}\n"
  },
  {
    "path": "eg/faithful.c",
    "content": "\n#ifdef Datadir\n#define DATADIR Datadir\n#else\n#define DATADIR \".\"\n#endif\n\n#include <apop.h>\n\n/* This replacement for apop_model_print(in) demonstrates retrieval of the useful\nsettings: the weights (λ) and list of estimated models. It is here only for\ndemonstration purposes---it is what apop_model_print(your_mix) will do.\n*/\nvoid show_mix(apop_model *in){\n    apop_mixture_settings *ms = Apop_settings_get_group(in, apop_mixture);\n    printf(\"The weights:\\n\");\n    apop_vector_print(ms->weights);\n    printf(\"\\nThe models:\\n\");\n    for (apop_model **m = ms->model_list; *m; m++) //model_list is a NULL-terminated list.\n        apop_model_print(*m, stdout);\n}\n\nint main(){\n    apop_text_to_db( DATADIR \"/\" \"faith.data\", \"ff\");\n    apop_data *dd = apop_query_to_data(\"select waiting from ff\");\n    apop_model *mf = apop_model_mixture(apop_model_copy(apop_normal), apop_model_copy(apop_normal));\n    Apop_settings_set(mf, apop_mixture, find_weights, 'y');//Use the EM algorithm to search for optimal weights.\n\n    /* The process is famously sensitive to starting points. Try many random points, or\n       eyeball the distribution's plot and guess at the starting values. 
*/\n    Apop_model_add_group(mf, apop_mle, .starting_pt=(double[]){.5, .5, 50, 5, 80, 5},\n                                       .step_size=3, .tolerance=1e-6);\n    apop_model *mfe = apop_estimate(dd, mf);\n    apop_model_print(mfe, stdout);\n    printf(\"LL=%g\\n\", apop_log_likelihood(dd, mfe));\n\n\n    printf(\"\\n\\nValues calculated in the source paper, for comparison.\\n\");\n    apop_model *r_ed = apop_model_mixture(\n                         apop_model_set_parameters(apop_normal, 54.61364, 5.869089),\n                         apop_model_set_parameters(apop_normal, 80.09031, 5.869089));\n    apop_data *wts = apop_data_falloc((2), 0.3608498, 0.6391502);\n    Apop_settings_add(r_ed, apop_mixture, weights, wts->vector);\n    show_mix(r_ed);\n    printf(\"LL=%g\\n\", apop_log_likelihood(dd, r_ed));\n}\n"
  },
  {
    "path": "eg/fake_logit.c",
    "content": "#include <apop.h>\n#include <unistd.h>\n\nchar *testfile = \"logit_test_data\";\n\n//generate a fake data set.\n//Notice how the first column is the outcome, just as with standard regression.\nvoid write_data(){\n    FILE *f = fopen(testfile, \"w\");\n    fprintf(f, \"\\\n        outcome,A, B \\n\\\n        0, 0, 0     \\n\\\n        1, 1, 1     \\n\\\n        1, .7, .5   \\n\\\n        1, .7, .3   \\n\\\n        1, .3, .7   \\n\\\n        \\n\\\n        1, .5, .5   \\n\\\n        0, .4, .4   \\n\\\n        0, .3, .4   \\n\\\n        1, .1, .3   \\n\\\n        1, .3, .1   \");\n    fclose(f);\n}\n\nint main(){\n    write_data();\n    apop_data *d = apop_text_to_data(testfile);\n    Apop_model_add_group(apop_logit, apop_mle, .tolerance=1e-5);\n    apop_model *est = apop_estimate(d, apop_logit);\n    unlink(testfile);\n\n    /* Apophenia's test suite checks that this code produces \n       values close to canned values. As a human, you probably \n       just want to print the results to the screen. */\n    apop_model_show(est);\n\n    assert(fabs(apop_data_get(est->parameters, .rowname=\"1\")- -1.155026) < 1e-6);\n    assert(fabs(apop_data_get(est->parameters, .rowname=\"A\")- 4.039903) < 1e-6);\n    assert(fabs(apop_data_get(est->parameters, .rowname=\"B\")- 1.494694) < 1e-6);\n}\n"
  },
  {
    "path": "eg/fix_params.c",
    "content": "#include <apop.h>\n\nint main(){\n    size_t ct = 5e4;\n\n    //set up the model & params\n    apop_data *params = apop_data_falloc((2,2,2), 8,  1, 0.5,\n                                                  2,  0.5, 1);\n    apop_model *pvm = apop_model_copy(apop_multivariate_normal);\n    pvm->parameters = apop_data_copy(params);\n    pvm->dsize = 2;\n    apop_data *d = apop_model_draws(pvm, ct);\n\n    //set up and estimate a model with fixed covariance matrix but free means\n    gsl_vector_set_all(pvm->parameters->vector, GSL_NAN);\n    apop_model *mep1 = apop_model_fix_params(pvm);\n    apop_model *e1 = apop_estimate(d, mep1);\n    \n    //compare results\n    printf(\"original params: \");\n    apop_vector_print(params->vector);\n    printf(\"estimated params: \");\n    apop_vector_print(e1->parameters->vector);\n    assert(apop_vector_distance(params->vector, e1->parameters->vector)<1e-2); \n}\n"
  },
  {
    "path": "eg/hills2.c",
    "content": "#include <apop.h>\n\n/* \nUse apop_model_mixture to generate a hump-filled distribution, then find \nthe most likely data points and check that they are near the humps.\n*/\n\n//Produce a 2-D multivariate normal model with unit covariance and given mean \napop_model *produce_fixed_mvn(double x, double y){\n    apop_model *out = apop_model_copy(apop_multivariate_normal);\n    out->parameters = apop_data_falloc((2, 2, 2),\n                        x, 1, 0,\n                        y, 0, 1);\n    out->dsize = 2;\n    return out;\n}\n\nint main(){\n    //here's a mean/covariance matrix for a standard multivariate normal.\n    apop_model *many_humps = apop_model_mixture(\n                        produce_fixed_mvn(5, 6),\n                        produce_fixed_mvn(-5, -4),\n                        produce_fixed_mvn(0, 1));\n    apop_prep(NULL, many_humps);\n\n    int len = 100000;\n    apop_data *d = apop_model_draws(many_humps, len);\n\n    gsl_vector *first = Apop_cv(d, 0);\n    printf(\"mu=%g\\n\", apop_mean(first));\n    assert(fabs(apop_mean(first)- 0) < 5e-2);\n\n    gsl_vector *second = Apop_cv(d, 1);\n    printf(\"mu=%g\\n\", apop_mean(second));\n    assert(fabs(apop_mean(second)- 1) < 5e-2);\n\n/*  Use the ML imputation routine to search for the input value with the highest\n    log likelihood. Do the search via simulated annealing. */\n\n    apop_data *x = apop_data_alloc(1,2);\n    gsl_matrix_set_all(x->matrix, NAN);\n\n    apop_opts.stop_on_warning='v';\n    apop_ml_impute(x, many_humps);\n\n    printf(\"Optimum found at:\\n\");\n    apop_data_show(x);\n    assert(fabs(apop_data_get(x, .col=0)- 0) + fabs(apop_data_get(x, .col=1) - 1) < 1e-2);\n}\n"
  },
  {
    "path": "eg/iv.c",
    "content": "/* Instrumental variables are often used to deal with variables measured with noise, so\nthis example produces a data set with a column of noisy data, and a separate instrument\nmeasured with greater precision, then sets up and runs an instrumental variable regression.\n\nTo guarantee that the base data set has noise and the instrument is cleaner, the\nprocedure first generates the clean data set, then copies the first column to the\ninstrument set, then the add_noise function inserts Gaussian noise into the base\ndata set. Once the base set and the instrument set have been generated, the setup for\nthe IV consists of adding the relevant names and using Apop_model_add_group to add a\nlm (linear model) settings group with an .instruments=instrument_data element.\n\nIn fact, the example sets up a sequence of IV regressions, with more noise each\ntime.\n*/\n\n#include <apop.h>\n#define Diff(L, R, eps) Apop_stopif(fabs((L)-(R))>=(eps), return, 0, \"%g is too different \\\n        from %g (arbitrary limit=%g).\", (double)(L), (double)(R), eps);\n\nint datalen =1e4;\n\n//generate a vector that is the original vector + noise\nvoid add_noise(gsl_vector *in, gsl_rng *r, double size){\n    apop_model *nnoise = apop_model_set_parameters(apop_normal, 0, size);\n    apop_data *nd = apop_model_draws(nnoise, in->size);\n    gsl_vector_add(in, Apop_cv(nd, 0));\n    /*for (int i=0; i< in->size; i++){\n        double noise;\n        apop_draw(&noise, r, nnoise);\n        *gsl_vector_ptr(in, i) += noise;\n    }*/\n    apop_data_free(nd);\n    apop_model_free(nnoise);\n}\n\nvoid test_for_unbiased_parameter_estimates(apop_model *m, double tolerance){\n        Diff(apop_data_get(m->parameters, 0, -1), -1.4, tolerance);\n        Diff(apop_data_get(m->parameters, 1, -1), 2.3, tolerance);\n}\n\nint main(){\n    gsl_rng *r = apop_rng_alloc(234);\n\n    apop_data *data = apop_data_alloc(datalen, 2);\n    for(int i=0; i< datalen; i++){\n        apop_data_set(data, i, 1, 
100*(gsl_rng_uniform(r)-0.5));\n        apop_data_set(data, i, 0, -1.4 + apop_data_get(data,i,1)*2.3);\n    }\n    apop_name_add(data->names, \"dependent\", 'c');\n    apop_name_add(data->names, \"independent\", 'c');\n    apop_model *oest = apop_estimate(data, apop_ols);\n    apop_model_show(oest);\n\n    //the data with no noise will be the instrument.\n    gsl_vector *col1 = Apop_cv(data, 1);\n    apop_data *instrument_data = apop_data_alloc(data->matrix->size1, 1);\n    gsl_vector_memcpy(Apop_cv(instrument_data, 0), col1);\n    apop_name_add(instrument_data->names, \"independent\", 'c');\n    Apop_model_add_group(apop_iv, apop_lm, .instruments = instrument_data);\n\n    //Now add noise to the base data four times, and estimate four IVs.\n    int tries = 4;\n    apop_model *ests[tries];\n    for (int nscale=0; nscale<tries; nscale++){\n        add_noise(col1, r, nscale==0 ? 0 : pow(10, nscale-tries));\n        ests[nscale] = apop_estimate(data, apop_iv);\n        if (nscale==tries-1){ //print the one with the largest error.\n            printf(\"\\nnow IV:\\n\");\n            apop_model_show(ests[nscale]);\n        }\n    }\n\n    /* Now test. The parameter estimates are unbiased.\n       As we add more noise, the covariances expand.\n       Test that the ratio of one covariance matrix to the next\n       is less than one, though these are typically very much\n       smaller than one (as the noise is an order of magnitude \n       larger in each case), and the ratios will be identical\n       for each j, k below. 
*/\n    test_for_unbiased_parameter_estimates(ests[0], 1e-6);\n    for (int i=1; i<tries; i++){\n        test_for_unbiased_parameter_estimates(ests[i], 1e-3);\n\n        gsl_matrix *cov = apop_data_get_page(ests[i-1]->parameters, \"<Covariance>\")->matrix;\n        gsl_matrix *cov2 = apop_data_get_page(ests[i]->parameters, \"<Covariance>\")->matrix;\n        gsl_matrix_div_elements(cov, cov2);\n        for (int j =0; j< 2; j++)\n            for (int k =0; k< 2; k++)\n                assert(gsl_matrix_get(cov, j, k) < 1);\n    }\n}\n"
  },
  {
    "path": "eg/jack.c",
    "content": "#include <apop.h>\n\nint main(){\n    int draw_ct = 1000;\n    apop_model *m = apop_model_set_parameters(apop_normal, 1, 3);\n    double sigma = apop_data_get(m->parameters, 1);\n    apop_data *d = apop_model_draws(m, draw_ct);\n    apop_data *out = apop_jackknife_cov(d, m);\n    double error = fabs(apop_data_get(out, 0,0)-gsl_pow_2(sigma)/draw_ct) //var(mu)\n                + fabs(apop_data_get(out, 1,1)-gsl_pow_2(sigma)/(2*draw_ct))//var(sigma)\n                +fabs(apop_data_get(out, 0,1)) +fabs(apop_data_get(out, 1,0));//cov(mu,sigma); should be 0.\n    apop_data_free(d);\n    apop_data_free(out);\n    assert(error < 1e-2);//Not very accurate.\n}\n\n\n"
  },
  {
    "path": "eg/jacobian.c",
    "content": "/* A Lognormal distribution is a transform of the Normal distribution, where \n the data space of the Normal is exponentiated. Thus, to get back to the original data space, take the log of the current data.\n */\n#include <apop.h>\n#define Diff(a, b) assert(fabs((a)-(b)) < 1e-2);\n\n//Use this function to produce test data below.\napop_data *draw_exponentiated_normal(double mu, double sigma, double draws){\n    apop_model *n01 = apop_model_set_parameters(apop_normal, mu, sigma);\n    apop_data *d = apop_data_alloc(draws);\n    gsl_rng *r = apop_rng_alloc(13);\n    for (int i=0; i< draws; i++) apop_draw(gsl_vector_ptr(d->vector,i), r, n01);\n    apop_vector_exp(d->vector);\n    return d;\n}\n\n// The transformed-to-base function and its derivative for the Jacobian:\napop_data *rev(apop_data *in){ return apop_map(in, .fn_d=log, .part='a'); }\n\n/*The derivative of the transformed-to-base function. */\ndouble inv(double in){return 1./in;} \ndouble rev_j(apop_data *in){ return fabs(apop_map_sum(in, .fn_d=inv, .part='a')); }\n\nint main(){\n    apop_model *ct = apop_model_coordinate_transform(\n                        .transformed_to_base= rev, .jacobian_to_base=rev_j,\n                        .base_model=apop_normal);\n    //Apop_model_add_group(ct, apop_parts_wanted);//Speed up the MLE.\n\n    //make fake data\n    double mu=2, sigma=1;\n    apop_data *d = draw_exponentiated_normal(mu, sigma, 2e5);\n\n    //If we correctly replicated a Lognormal, mu and sigma will be right:\n    apop_model *est = apop_estimate(d, ct);\n    apop_model_free(ct);\n    Diff(apop_data_get(est->parameters, 0), mu);\n    Diff(apop_data_get(est->parameters, 1), sigma);\n\n    /*The K-L divergence between our Lognormal and the stock Lognormal\n      should be small. Try it with both the original params and the estimated ones. 
*/\n    apop_model *ln = apop_model_set_parameters(apop_lognormal, mu, sigma);\n    apop_model *ln2 = apop_model_copy(apop_lognormal);\n    ln2->parameters = est->parameters;\n    Diff(apop_kl_divergence(ln, ln2,.draw_ct=1000), 0);\n    Diff(apop_kl_divergence(ln, est,.draw_ct=1000), 0);\n}\n"
  },
  {
    "path": "eg/kernel.c",
    "content": "/* This program draws ten random data points, and then produces two kernel density\nestimates: one based on the Normal distribution and one based on the Uniform.\n\nIt produces three outputs:\n--stderr shows the random draws\n--kerneldata is a file written with plot data for both KDEs\n--stdout shows instructions to gnuplot, so you can pipe:\n./kernel | gnuplot -persist\n\nMost of the code is taken up by the plot() and draw_some_data() functions, which are\nstraightforward. Notice how plot() pulls the values of the probability distributions \nat each point along the scale.\n\nThe set_uniform_edges function sets the max and min of a Uniform distribution so that the\ngiven point is at the center of the distribution.\n\nThe first KDE uses the defaults, which are based on a Normal distribution with std dev 1;\nthe second explicitly sets the .kernel and .set_fn for a Uniform.\n*/\n\n#include <apop.h>\n\nvoid set_uniform_edges(apop_data * r, apop_model *unif){\n    apop_data_set(unif->parameters, 0, -1, r->matrix->data[0]-0.5);\n    apop_data_set(unif->parameters, 1, -1, r->matrix->data[0]+0.5);\n}\n\nvoid plot(apop_model *k, apop_model *k2){\n    apop_data *onept = apop_data_alloc(1,1);\n    FILE *outtab = fopen(\"kerneldata\", \"w\");\n    for (double i=0; i<20; i+=0.01){\n        apop_data_set(onept, .val=i);\n        fprintf(outtab, \"%g %g %g\\n\", i, apop_p(onept, k), apop_p(onept, k2));\n    }\n    fclose(outtab);\n    printf(\"plot 'kerneldata' using 1:2\\n\"\n           \"replot 'kerneldata' using 1:3\\n\");\n}\n\napop_data *draw_some_data(){\n    apop_model *uniform_0_20 = apop_model_set_parameters(apop_uniform, 0, 20);\n    apop_data *d = apop_model_draws(uniform_0_20, 10);\n    apop_data_print(apop_data_sort(d), .output_pipe=stderr);\n    return d;\n}\n\nint main(){\n    apop_data *d = draw_some_data();    \n    apop_model *k = apop_estimate(d, apop_kernel_density);\n    apop_model *k2 = apop_model_set_settings(apop_kernel_density,\n           
                              .base_data=d,\n                                         .set_fn = set_uniform_edges,\n                                         .kernel = apop_uniform);\n    plot(k, k2);\n}\n"
  },
  {
    "path": "eg/ks_tests.c",
    "content": "#include <apop.h>\n//This program finds the p-value of a K-S test between\n//500 draws from a N(0, 1) and a N(x, 1), where x grows from 0 to 1.\n\napop_model * model_to_pmfs(apop_model *m1, int size){\n    apop_data *outd1 = apop_model_draws(m1, size);\n    return apop_estimate(apop_data_sort(outd1), apop_pmf);\n}\n\nint main(){\n    apop_model *n1 = apop_model_set_parameters(apop_normal, 0, 1);\n    apop_model *pmf1 = model_to_pmfs(n1, 5e2);\n    apop_data *ktest;\n\n    //first, there should be zero divergence between a PMF and itself:\n    apop_model *pmf2 = apop_model_copy(pmf1);\n    ktest = apop_test_kolmogorov(pmf1, pmf2);\n    double pval = apop_data_get(ktest, .rowname=\"p value, 2 tail\");\n    assert(pval > .999);\n\n    //as the mean m drifts, the pval for a comparison\n    //between a N(0, 1) and N(m, 1) gets smaller.\n    printf(\"mean\\tpval\\n\");\n    double prior_pval = 18;\n    for(double i=0; i<= .6; i+=0.2){\n        apop_model *n11 = apop_model_set_parameters(apop_normal, i, 1);\n        ktest = apop_test_kolmogorov(pmf1, n11);\n        apop_data_print(ktest, NULL);\n        double pval = apop_data_get(ktest, .rowname=\"p value, 2 tail\");\n        assert(pval < prior_pval);\n        printf(\"%g\\t%g\\n\", i, pval);\n        prior_pval = pval;\n    }\n    apop_model_free(pmf1);\n}\n"
  },
  {
    "path": "eg/logit.c",
    "content": "// See http://modelingwithdata.org/arch/00000160.htm for context and analysis.\n\n#ifdef Datadir\n#define DATADIR Datadir\n#else\n#define DATADIR \".\"\n#endif\n\n#include <apop.h>\n\nint main(){\n    //read the data to db, get the desired columns,\n    //prep the two categorical variables\n    apop_text_to_db( DATADIR \"/\" \"amash_vote_analysis.csv\" , .tabname=\"amash\");\n    apop_data *d = apop_query_to_mixed_data(\"mmmtt\", \"select 0, ideology,log(contribs+10) as contribs, vote, party from amash\");\n    apop_data_to_factors(d); //0th text col -> 0th matrix col\n    apop_data_to_dummies(d, .col=1, .type='t', .append='y');\n\n    //Estimate a logit model, get covariances,\n    //calculate p values under popular Normality assumptions\n    Apop_model_add_group(apop_logit, apop_parts_wanted, .covariance='y');\n    apop_model *out = apop_estimate(d, apop_logit);\n    apop_model_show(out);\n    for (int i=0; i< out->parameters->matrix->size1; i++){\n        printf(\"%s pval:\\t%g\\n\",out->parameters->names->row[i], \n        apop_test(apop_data_get(out->parameters, i), \"normal\", 0, sqrt(apop_data_get(out->parameters->more, i, i))));\n    }\n}\n"
  },
  {
    "path": "eg/ls_tables.c",
    "content": "#include <apop.h>\n\nvoid print_table_list(char *db_file){\n    apop_db_open(db_file);\n    apop_data *tab_list= apop_query_to_text(\"select name \"\n                    \"from sqlite_master where type=='table'\");\n    for(int i=0; i< tab_list->textsize[0]; i++)\n        printf(\"%s\\n\", tab_list->text[i][0]);\n}\n\nint main(int argc, char **argv){\n    if (argc == 1){\n        printf(\"Give me a database name, and I will print out \"\n               \"the list of tables contained therein.\\n\");\n        return 0; \n    }\n    print_table_list(argv[1]);\n}\n"
  },
  {
    "path": "eg/ml_imputation.c",
    "content": "#include <apop.h>\n\nstatic void compare_mvn_estimates(apop_model *L, apop_model *R, double tolerance){\n    gsl_vector_sub(L->parameters->vector, R->parameters->vector);\n    gsl_matrix_sub(L->parameters->matrix, R->parameters->matrix);\n    assert(fabs(apop_sum(L->parameters->vector)) + fabs (apop_matrix_sum(L->parameters->matrix)) < tolerance);\n}\n\nvoid test_ml_imputation(gsl_rng *r){\n    size_t len = 4e4;\n    int i,j;\n    apop_data *fillme = apop_data_alloc(len, 3);\n    apop_model *mvn = apop_model_copy(apop_multivariate_normal);\n    mvn->parameters = apop_data_alloc(3, 3, 3);\n    for(i=0; i < 3; i ++)\n        for(j=-1; j < 3; j ++)\n            apop_data_set(mvn->parameters, i, j, gsl_rng_uniform(r));\n    //now make your random garbage symmetric\n    for(i=0; i < 3; i ++)\n        for(j=i+1; j < 3; j ++)\n            apop_data_set(mvn->parameters, j, i, apop_data_get(mvn->parameters, i, j));\n    apop_matrix_to_positive_semidefinite(mvn->parameters->matrix);\n    apop_model_draws(mvn, .draws=fillme);\n    //apop_data_show(mvn->parameters);\n    apop_model *est = apop_estimate(fillme, apop_multivariate_normal);\n    //apop_data_show(est->parameters);\n    compare_mvn_estimates(est, mvn, 1e-1);\n\n    double pct_to_delete = 0.01;\n    int max_to_delete = 7, ctr = 0;\n    for(i=0; i < len && ctr < max_to_delete; i ++)\n        for(j=0; j < 3; j ++)\n            if (gsl_rng_uniform(r) < pct_to_delete){\n                apop_data_set(fillme, i, j, GSL_NAN);\n                ctr++;\n            }\n    apop_ml_impute(fillme, mvn); \n    apop_model *est2 = apop_estimate(fillme, apop_multivariate_normal);\n    //apop_data_show(est2->parameters);\n    compare_mvn_estimates(est2, mvn, 1e-1);\n    apop_data_free(fillme);\n}\n\nint main(){\n    test_ml_imputation(apop_rng_alloc(42));\n}\n"
  },
  {
    "path": "eg/normalization_demo.c",
    "content": "#include <apop.h>\n\nint main(void){\ngsl_vector  *in, *out;\n\nin = gsl_vector_calloc(3);\napop_vector_fill(in, 0, 1, 2);\n\nprintf(\"The original vector:\\n\");\napop_vector_print(in);\n\napop_vector_normalize(in, &out, 's');\nprintf(\"Standardized with mean zero and variance one:\\n\");\napop_vector_print(out);\nassert(fabs(apop_vector_sum(out))<1e-5);\nassert(fabs((apop_vector_var(out))- 1)<1e-5);\n\napop_vector_normalize(in, &out, 'r');\nprintf(\"Normalized range with max one and min zero:\\n\");\napop_vector_print(out);\nassert(gsl_vector_max(out)==1);\nassert(gsl_vector_min(out)==0);\n\napop_vector_normalize(in, NULL, 'p');\nprintf(\"Normalized into percentages:\\n\");\napop_vector_print(in);\n}\n"
  },
  {
    "path": "eg/ols.c",
    "content": "#ifdef Datadir\n#define DATADIR Datadir\n#else\n#define DATADIR \".\"\n#endif\n\n#include <apop.h>\n\nint main(){\n    apop_text_to_db(.text_file= DATADIR \"/\" \"data\" , .tabname=\"d\");\n    apop_data *data = apop_query_to_data(\"select * from d\");\n    apop_model *est = apop_estimate(data, apop_ols);\n    apop_model_print(est);\n}\n"
  },
  {
    "path": "eg/ols2.c",
    "content": "#include <apop.h>\n#include <unistd.h>\n\nint main(void){\n    char *datafile = (access(\"ss08pdc.csv\", R_OK)!=-1) ? \"ss08pdc.csv\" : \"data\";\n    apop_text_to_db(.text_file=datafile, .tabname=\"dc\");\n    apop_data *data = apop_query_to_data(\"select log(pincp+10), agep, sex \"\n                                    \"from dc where agep+ pincp+sex is not null and pincp>=0\");\n    apop_model *est = apop_estimate(data, apop_ols);\n    apop_model_show(est);\n\n    Apop_settings_add_group(est, apop_pm, .index =1);  \n    apop_model *first_param_distribution = apop_parameter_model(data, est);\n\n    Apop_row(est->parameters, 1, param);\n    double area_under_p = apop_cdf(param, first_param_distribution);\n\n    apop_data_set(param, 0, -1, .val=0);\n    double area_under_zero = apop_cdf(param, first_param_distribution);\n    printf(\"reject the null for agep with %g percent confidence.\\n\",\n                                 100*(2*fabs(area_under_p-area_under_zero)));\n}\n"
  },
  {
    "path": "eg/ols_oneliner.c",
    "content": "#ifdef Datadir\n#define DATADIR Datadir\n#else\n#define DATADIR \".\"\n#endif\n\n#include <apop.h>\nint main(){ apop_model_print(apop_estimate(apop_text_to_data( DATADIR \"/\" \"data\" ), apop_ols)); }\n"
  },
  {
    "path": "eg/parameterization.c",
    "content": "#include <apop.h>\n\n#define print_draws(mm) apop_data_print(apop_model_draws(mm, 20),\\\n                                        .output_name= \"draws-\" #mm);\n\nint main(){\n    apop_model *uniform_20 = apop_model_set_parameters(apop_uniform, 0, 20);\n    apop_data *d = apop_model_draws(uniform_20, 10);\n\n    //Estimate a Normal distribution from the data:\n    apop_model *N = apop_estimate(d, apop_normal);\n    print_draws(N);\n\n    //estimate a one-dimensional multivariate Normal from the data:\n    apop_model *mvN = apop_estimate(d, apop_multivariate_normal);\n    print_draws(mvN);\n\n    //fixed parameter list:\n    apop_model *std_normal = apop_model_set_parameters(apop_normal, 0, 1);\n    print_draws(std_normal);\n\n    //variable-size parameter list:\n    apop_model *std_multinormal = apop_model_copy(apop_multivariate_normal);\n    std_multinormal->msize1 =\n    std_multinormal->msize2 =\n    std_multinormal->vsize =\n    std_multinormal->dsize = 3;\n    std_multinormal->parameters = apop_data_falloc((3, 3, 3),\n                                1,  1, 0, 0, \n                                1,  0, 1, 0,\n                                1,  0, 0, 1);\n    print_draws(std_multinormal);\n\n    //estimate a KDE using the defaults:\n    apop_model *k = apop_estimate(d, apop_kernel_density);\n    print_draws(k);\n\n    /*A KDE estimation consists of filling an apop_kernel_density_settings group,\n      so we can set it to use a Normal(μ, 2) kernel via: */\n    apop_model *k2 = apop_model_set_settings(apop_kernel_density,\n                         .base_data=d,\n                         .kernel = apop_model_set_parameters(apop_normal, 0, 2));\n    print_draws(k2);\n}\n"
  },
  {
    "path": "eg/pmf_test.c",
    "content": "#include <apop.h>\n\nlong double pack_p (apop_data *d, apop_model *m){\n    double loss = 0;\n    gsl_vector *v = apop_data_pack(m->parameters, NULL, 'y');\n    int i;\n    for (i=0; i< v->size; i++)\n        loss += fabs(v->data[i] - i);\n    gsl_vector_free(v);\n    return 1/(1+loss);\n}\n\nvoid pack_prep(apop_data *d, apop_model *m){\n    m->parameters = apop_data_alloc(0, 2, 2);\n    apop_data_add_page(m->parameters, apop_data_alloc(0, 2, 2), \"page two\");\n    if (!Apop_settings_get_group(m, apop_mle))\n        Apop_model_add_group(m, apop_mle, .tolerance=1e-6, .step_size=3);\n}\n\nlong double pack_constraint(apop_data *d, apop_model *m){\n    return apop_linear_constraint(apop_data_pack(m->parameters, .more_pages='y'))*1e-5;\n    //penalty size must be smaller than p().\n}\n\napop_model *pack_counter = &(apop_model){\"Optimum is that each element equals its pack order\", .p = pack_p, \n                .prep=pack_prep, .constraint = pack_constraint };\n\nint main(){\n    apop_model *list = apop_estimate(NULL, pack_counter);\n    apop_data_print(list->parameters);\n    printf(\"%g\", fabs( 1- 1/apop_p(NULL, list)));\n    assert(fabs( 1- 1/apop_p(NULL, list))< 4e-2); //lousy.\n}\n"
  },
  {
    "path": "eg/simple_subsets.c",
    "content": "#ifdef Datadir\n#define DATADIR Datadir\n#else\n#define DATADIR \".\"\n#endif\n\n#include <apop.h>\n\nint main(){\n    apop_table_exists( DATADIR \"/\" \"data\" , 'd');\n    apop_data *d = apop_text_to_data( DATADIR \"/\" \"data\" );\n\n    //tally row zero of the data set's matrix by viewing it as a vector:\n    gsl_vector *one_row = Apop_rv(d, 0);\n    double sigma = apop_vector_sum(one_row);\n    printf(\"Sum of row zero: %g\\n\", sigma);\n    assert(sigma==14);\n\n    //view column zero as a vector; take its mean\n    double mu = apop_vector_mean(Apop_cv(d, 0));\n    printf(\"Mean of col zero: %g\\n\", mu);\n    assert(fabs(mu - 19./6)<1e-5);\n\n    //get a sub-data set (with names) of two rows beginning at row 3; print to screen\n    apop_data *six_elmts = Apop_rs(d, 3, 2);\n    apop_data_print(six_elmts);\n}\n"
  },
  {
    "path": "eg/some_cdfs.c",
    "content": "#include <apop.h>\n\nint main(){\n    //Set up an apop_data set with only one number.\n    //Most of these functions will only look at the first data point encountered.\n    apop_data *onept = apop_data_falloc((1), 23);\n\n    apop_model *norm = apop_model_set_parameters(apop_normal, 23, 138.8);\n    double val = apop_cdf(onept, norm);\n    assert(fabs(val - 0.5) < 1e-4);\n\n    double tolerance = 1e-8;\n    //Macroizing the sample routine above:\n    #define model_val_cdf(model, value, cdf_result) {   \\\n        apop_data_set(onept, .val=(value));             \\\n        assert(fabs((apop_cdf(onept, model))-(cdf_result))< tolerance);   \\\n    }\n\n    apop_model *uni = apop_model_set_parameters(apop_uniform, 20, 26);\n    model_val_cdf(uni, 0, 0);\n    model_val_cdf(uni, 20, 0);\n    model_val_cdf(uni, 21, 1./6);\n    model_val_cdf(uni, 23, 0.5);\n    model_val_cdf(uni, 25, 5./6);\n    model_val_cdf(uni, 26, 1);\n    model_val_cdf(uni, 260, 1);\n\n    //Improper uniform always returns 1/2.\n    model_val_cdf(apop_improper_uniform, 0, 0.5);\n    model_val_cdf(apop_improper_uniform, 228, 0.5);\n    model_val_cdf(apop_improper_uniform, INFINITY, 0.5);\n\n    apop_model *binom = apop_model_set_parameters(apop_binomial, 2001, 0.5);\n    model_val_cdf(binom, 0, 0);\n    model_val_cdf(binom, 1000, .5);\n    model_val_cdf(binom, 2000, 1);\n\n    apop_model *bernie = apop_model_set_parameters(apop_bernoulli, 0.75);\n    //p(0)=.25; p(1)=.75; that determines the CDF.\n    //Notice that the CDF's integral is over a closed interval.\n    model_val_cdf(bernie, -1, 0);\n    model_val_cdf(bernie, 0, 0.25);\n    model_val_cdf(bernie, 0.1, 0.25);\n    model_val_cdf(bernie, .99, 0.25);\n    model_val_cdf(bernie, 1, 1);\n    model_val_cdf(bernie, INFINITY, 1);\n\n    //alpha=beta -> symmetry\n    apop_model *beta = apop_model_set_parameters(apop_beta, 2, 2);\n    model_val_cdf(beta, -INFINITY, 0);\n    model_val_cdf(beta, 0.5, 0.5);\n    model_val_cdf(beta, 
INFINITY, 1);\n\n    //This beta distribution -> uniform\n    apop_model *beta_uni = apop_model_set_parameters(apop_beta, 1, 1);\n    model_val_cdf(beta_uni, 0, 0);\n    model_val_cdf(beta_uni, 1./6, 1./6);\n    model_val_cdf(beta_uni, 0.5, 0.5);\n    model_val_cdf(beta_uni, 1, 1);\n\n\n    beta_uni->cdf = NULL; //With no closed-form CDF, apop_cdf makes random draws to estimate the CDF.\n    Apop_model_add_group(beta_uni, apop_cdf, .draws=1e6); //extra draws improve accuracy, but we still have to loosen our tolerance.\n    tolerance=1e-3;\n    model_val_cdf(beta_uni, 0, 0);\n    model_val_cdf(beta_uni, 1./6, 1./6);\n    model_val_cdf(beta_uni, 0.5, 0.5);\n    model_val_cdf(beta_uni, 1, 1);\n\n\n    //mixture of three symmetric distributions: still symmetric.\n    apop_model *sum_of_three = apop_model_mixture(beta, apop_improper_uniform, beta_uni);\n    model_val_cdf(sum_of_three, 0.5, 0.5);\n\n\n    apop_data *threepts = apop_data_falloc((3,1), -1, 0, 1);\n    apop_model *kernels = apop_estimate(threepts, apop_kernel_density);\n    model_val_cdf(kernels, -5, 0);\n    model_val_cdf(kernels, 0, 0.5);\n    model_val_cdf(kernels, 10, 1);\n}\n"
  },
  {
    "path": "eg/sql_to_html.c",
    "content": "#include <apop.h>\n\nint main(){\n    apop_query(\"create table datatab(name, age, sex);\"\n                \"insert into datatab values ('Alex', 23, 'm');\"\n                \"insert into datatab values ('Alex', 32, 'f');\"\n                \"insert into datatab values ('Michael', 41, 'f');\"\n                \"insert into datatab values ('Michael', 14, 'm');\");\n\n    apop_data *cols = apop_text_alloc(NULL, 3, 1);\n    apop_text_set(cols, 0, 0, \"name\");\n    apop_text_set(cols, 1, 0, \"age\");\n    apop_text_set(cols, 2, 0, \"sex\");\n    char *query= apop_text_paste(cols, .before=\"select \", .between=\", \");\n    apop_data *d = apop_query_to_text(\"%s from datatab\", query);\n    char *html_head = apop_text_paste(cols, .before=\"<table><tr><td>\",\n                                .between=\"</td><td>\", .after=\"</tr>\\n<tr><td>\");\n    char *html_table = apop_text_paste(d, .before=html_head, .after=\"</td></tr></table>\\n\",\n                                .between=\"</tr>\\n<tr><td>\", .between_cols=\"</td><td>\");\n    FILE *outfile = fopen(\"yourdata.html\", \"w\");\n    fprintf(outfile, \"%s\", html_table);\n    fclose(outfile);\n}\n"
  },
  {
    "path": "eg/t_test_by_rows.c",
    "content": "#include <apop.h>\n\ndouble row_offset;\n\nvoid offset_rng(double *v){*v = gsl_rng_uniform(apop_rng_get_thread()) + row_offset;}\ndouble find_tstat(gsl_vector *in){ return apop_mean(in)/sqrt(apop_var(in));}\ndouble conf(double in, void *df){ return gsl_cdf_tdist_P(in, *(int *)df);}\n\n//apop_vector_mean is a macro, so we can't point a pointer to it.\ndouble mu(gsl_vector *in){ return apop_vector_mean(in);}\n\nint main(){\n    apop_data *d = apop_data_alloc(10, 100);\n    gsl_rng *r = apop_rng_alloc(3242);\n    for (int i=0; i< 10; i++){\n        row_offset = gsl_rng_uniform(r)*2 -1; //declared and used above.\n        apop_vector_apply(Apop_rv(d, i), offset_rng);\n    }\n\n    int df = d->matrix->size2-1;\n    apop_data *means = apop_map(d, .fn_v = mu, .part ='r');\n    apop_data *tstats = apop_map(d, .fn_v = find_tstat, .part ='r');\n    apop_data *confidences = apop_map(tstats, .fn_dp = conf, .param = &df);\n\n    printf(\"means:\\n\"); apop_data_show(means);\n    printf(\"\\nt stats:\\n\"); apop_data_show(tstats);\n    printf(\"\\nconfidences:\\n\"); apop_data_show(confidences);\n\n    //Some sanity checks, for Apophenia's test suite.\n    for (int i=0; i< 10; i++){\n        //sign of mean == sign of t stat.\n        assert(apop_data_get(means, i, -1) * apop_data_get(tstats, i, -1) >=0);\n\n        //inverse of P-value should be the t statistic.\n        assert(fabs(gsl_cdf_tdist_Pinv(apop_data_get(confidences, i, -1), 99) \n                    - apop_data_get(tstats, i, -1)) < 1e-5);\n    }\n}\n"
  },
  {
    "path": "eg/test_distances.c",
    "content": "#include <apop.h>\n\n/* Test distance calculations using a 3-4-5 triangle */\nint main(){\n    gsl_vector *v1 = gsl_vector_alloc(2);\n    gsl_vector *v2 = gsl_vector_alloc(2);\n    apop_vector_fill(v1, 2, 2);\n    apop_vector_fill(v2, 5, 6);\n\n    assert(apop_vector_distance(v1, v1, 'd') == 0);\n    assert(apop_vector_distance(v1, v2, 'd') == 1);\n    assert(apop_vector_distance(v1, .metric='m') == 4);\n    assert(apop_vector_distance(v2, .metric='s') == 6);\n    assert(apop_vector_distance(v1,v2) == 5.); //the hypotenuse of the 3-4-5 triangle\n    assert(apop_vector_distance(v1,v2, 'm') == 7.);\n    assert(apop_vector_distance(v1,v2, 'L', 2) == 5.);  //L_2 norm == Euclidean\n}\n"
  },
  {
    "path": "eg/test_fisher.c",
    "content": "#include <apop.h>\n\nint main() {\n    /* This test is thanks to Nick Eriksson, who sent it to me in the form of a bug report. */\n    apop_data * testdata = apop_data_falloc((2, 3),\n                              30, 50, 45, \n                              34, 12, 17 );\n    apop_data * t2 = apop_test_fisher_exact(testdata);\n    assert(fabs(apop_data_get(t2,.rowname=\"p value\") - 0.0001761) < 1e-6);\n}\n"
  },
  {
    "path": "eg/test_harmonic.c",
    "content": "#include <apop.h>\n\nint main(){\n    double out = apop_generalized_harmonic(270, 0.0);\n    assert (out == 270);\n    out = apop_generalized_harmonic(370, -1.0);\n    assert (out == 370*371/2);\n    out = apop_generalized_harmonic(12, -1.0);\n    assert (out == 12*13/2);\n}\n"
  },
  {
    "path": "eg/test_kl_divergence.c",
    "content": "#include <apop.h>\n\nlong double fake_p (apop_data *d, apop_model *m){\n    return apop_pmf->p(d, m);\n}\n\nint main(){\n    gsl_rng *r = apop_rng_alloc(2312311);\n    int empirical_size = 5e3;\n    apop_model *expo = apop_model_set_parameters(apop_exponential, 1.7);\n    //divergence from self should be zero.\n    assert (fabs(apop_kl_divergence(expo, expo)) < 1e-4);\n\n    apop_data *empirical = apop_model_draws(expo, .count=empirical_size);\n    //Double the odds of half the data, so likelihoods aren't uniform.\n    int half =empirical_size/2;\n    apop_data *start = Apop_rs(empirical, 0, half);\n    empirical = apop_data_stack(empirical, start);\n    apop_data_pmf_compress(empirical);\n\n    //Compare the PMF calculator to the everything else calculator\n    apop_model *pmf = apop_estimate(empirical, apop_pmf);\n    double div= apop_kl_divergence(pmf,expo);\n    pmf->p = fake_p;\n    double div2= apop_kl_divergence(pmf,expo);\n    printf(\"%g %g\\n\", div, div2);\n    assert(fabs(div-div2)<9e-3);\n    apop_data_free(empirical);\n}\n"
  },
  {
    "path": "eg/test_pruning.c",
    "content": "#include <apop.h>\n\n// This sample produces a dummy times table, gets a summary, and prunes the summary table.\nint main(){\n    int i, j;\n    apop_data *d = apop_data_alloc(0, 10, 4);\n    for (i=0; i< 10; i++)\n        for (j=0; j< 4; j++)\n            apop_data_set(d, i, j, i*j);\n    apop_data *summary = apop_data_summarize(d);\n    apop_data_prune_columns(summary, \"mean\", \"median\");\n    assert(apop_name_find(summary->names, \"mean\", 'c')!=-2);\n    assert(apop_name_find(summary->names, \"median\", 'c')!=-2);\n    assert(apop_name_find(summary->names, \"max\", 'c')==-2); //not found\n    assert(apop_name_find(summary->names, \"variance\", 'c')==-2); //not found\n    assert(apop_data_get(summary, .row=0, .colname=\"mean\")==0);\n    assert(apop_data_get(summary, .row=1, .colname=\"median\")==4);\n    assert(apop_data_get(summary, .row=2, .colname=\"median\")==8);\n    apop_data_show(summary);\n}\n"
  },
  {
    "path": "eg/test_ranks.c",
    "content": "/* A round trip: generate Zipf-distributed draws, summarize them to a single list of\nrankings, then expand the rankings to a list of single entries. The sorted list at the end\nof this should be identical to the (sorted) original list. */\n#include <apop.h>\n\nint main(){\n    gsl_rng *r = apop_rng_alloc(2342);\n    int i, length = 1e4;\n    apop_model *a_zipf = apop_model_set_parameters(apop_zipf, 3.2);\n    apop_data *draws = apop_data_alloc(length);\n    for (i=0; i< length; i++)\n        apop_draw(apop_data_ptr(draws, i, -1), r, a_zipf);\n    apop_data *by_rankings = apop_data_rank_compress(draws);\n    //The first row of the matrix is suitable for plotting.\n    //apop_data_show(by_rankings);\n    assert(apop_matrix_sum(by_rankings->matrix) == length);\n\n    apop_data *re_expanded = apop_data_rank_expand(by_rankings);\n    gsl_sort_vector(draws->vector);\n    gsl_sort_vector(re_expanded->vector);\n    assert(apop_vector_distance(draws->vector, re_expanded->vector) < 1e-5);\n}\n"
  },
  {
    "path": "eg/test_regex.c",
    "content": "#include <apop.h>\nint main(){\n    char string1[] = \"Hello. I am a string.\";\n    assert(apop_regex(string1, \"hell\"));\n    apop_data *subs;\n    apop_regex(string1, \"(e).*I.*(xxx)*(am)\", .substrings = &subs);\n    //apop_data_show(subs);\n    assert(!strcmp(subs->text[0][0], \"e\"));\n    assert(!strlen(subs->text[0][1])); //The non-match to (xxx)* is a zero-length string\n    assert(!strcmp(subs->text[0][2], \"am\"));\n    apop_data_free(subs);\n\n    //Split a comma-delimited list, throwing out white space.\n    //Notice that the regex includes only one instance of a non-comma blob \n    //ending in a non-space followed by a comma, but the function keeps \n    //applying it until the end of the string.\n    char string2[] = \" one, two , three ,four\";\n    apop_regex(string2, \" *([^,]*[^ ]) *(,|$) *\", &subs);\n    assert(!strcmp(*subs->text[0], \"one\"));\n    assert(!strcmp(*subs->text[1], \"two\"));\n    assert(!strcmp(*subs->text[2], \"three\"));\n    assert(!strcmp(*subs->text[3], \"four\"));\n    apop_data_free(subs);\n\n    //Get a parenthetical. For EREs, \\( \\) match plain parens in the text.\n    char string3[] = \" one (but secretly, two)\";\n    apop_regex(string3, \"(\\\\([^)]*\\\\))\", &subs);\n    assert(!strcmp(*subs->text[0], \"(but secretly, two)\"));\n    apop_data_free(subs);\n\n    //NULL input string ==> no-op.\n    int match_count = apop_regex(NULL, \" *([^,]*[^ ]) *(,|$) *\", &subs);\n    assert(!match_count);\n    assert(!subs);\n}\n"
  },
  {
    "path": "eg/test_updating.c",
    "content": "#include <apop.h>\n\n//For the test suite.\nvoid distances(gsl_vector *v1, gsl_vector *v2, double tol){\n    double error = apop_vector_distance(v1, v2, .metric='m');\n    double updated_size = apop_vector_sum(v1);\n    Apop_stopif(error/updated_size > tol, exit(1), 0, \"The error is %g, which is too big.\", error/updated_size);\n}\n\nint main(){\n    double binom_start = 0.6;\n    double beta_start_a = 0.3;\n    double beta_start_b = 0.5;\n    double n = 4000;\n    //First, the easy estimation using the conjugate distribution table.\n    apop_model *bin = apop_model_set_parameters(apop_binomial, n, binom_start);\n    apop_model *beta = apop_model_set_parameters(apop_beta, beta_start_a, beta_start_b);\n    apop_model *updated = apop_update(.prior= beta, .likelihood=bin);\n\n    //Now estimate via MCMC. \n    //Requires a one-parameter binomial, with n fixed,\n    //and a data set of n data points with the right p.\n    apop_model *bcopy = apop_model_set_parameters(apop_binomial, n, GSL_NAN);\n    apop_data *bin_draws = apop_data_falloc((1,2), n*(1-binom_start), n*binom_start);\n    bin = apop_model_fix_params(bcopy);\n    Apop_settings_add_group(beta, apop_mcmc, .burnin=.2, .periods=1e5);\n\n    apop_model *out_h = apop_update(bin_draws, beta, bin, NULL);\n    apop_model *out_beta = apop_estimate(out_h->data, apop_beta);\n\n    //Finally, we can compare the conjugate and MCMC results:\n    distances(updated->parameters->vector, out_beta->parameters->vector, 0.02);\n\n    //The apop_update function used apop_model_metropolis to generate\n    //a batch of draws, so the draw method for out_h is apop_model_metropolis_draw.\n    //So, here we make more draws using metropolis, and compare the beta\n    //distribution fit to those draws with the beta distribution output above.\n    int draws = 1.3e5;\n    apop_data *d = apop_model_draws(out_h, draws);\n    apop_model *drawn = apop_estimate(d, apop_beta);\n    distances(updated->parameters->vector, 
drawn->parameters->vector, 0.02);\n}\n"
  },
  {
    "path": "eg/text_demo.c",
    "content": "#include <apop.h>\n\nint main(){\n    apop_query(\"create table data (name, city, state);\"\n            \"insert into data values ('Mike Mills', 'Rockville', 'MD');\"\n            \"insert into data values ('Bill Berry', 'Athens', 'GA');\"\n            \"insert into data values ('Michael Stipe', 'Decatur', 'GA');\");\n    apop_data *tdata = apop_query_to_text(\"select name, city, state from data\");\n    printf(\"Customer #1: %s\\n\\n\", *tdata->text[0]);\n\n    printf(\"The data, via apop_data_print:\\n\");\n    apop_data_print(tdata);\n\n    //the text alloc can be used as a text realloc:\n    apop_text_alloc(tdata, 1+tdata->textsize[0], tdata->textsize[1]);\n    apop_text_set(tdata, *tdata->textsize-1, 0, \"Peter Buck\");\n    apop_text_set(tdata, *tdata->textsize-1, 1, \"Berkeley\");\n    apop_text_set(tdata, *tdata->textsize-1, 2, \"CA\");\n\n    printf(\"\\n\\nAugmented data, printed via for loop:\\n\");\n    for (int i=0; i< tdata->textsize[0]; i++){\n        for (int j=0; j< tdata->textsize[1]; j++)\n            printf(\"%s\\t\", tdata->text[i][j]);\n        printf(\"\\n\");\n    }\n\n    apop_data *states = apop_text_unique_elements(tdata, 2);\n    char *states_as_list = apop_text_paste(states, .between=\", \");\n    printf(\"\\n States covered: %s\\n\", states_as_list);\n}\n"
  },
  {
    "path": "eg/transform.c",
    "content": "#include <apop.h>\n\n// For defining the bounds the data-constraining function\n// needs to enforce.\ndouble greater_than_zero(apop_data *d, apop_model *m){\n    return apop_data_get(d) > 0;\n}\n\nint main(){\n    apop_model_print (\n        apop_estimate(\n             apop_update(\n                apop_model_draws(\n                    apop_model_mixture(\n                        apop_model_set_parameters(apop_poisson, 2.8),\n                        apop_model_set_parameters(apop_poisson, 2.0),\n                        apop_model_set_parameters(apop_poisson, 1.3)\n                    ), \n                    1e4\n                ),\n                apop_model_dconstrain(\n                    .base_model=apop_model_set_parameters(apop_normal, 2, 1), \n                    .constraint=greater_than_zero\n                ),\n                apop_poisson\n            )->data,\n            apop_normal\n        )\n    );\n}\n"
  },
  {
    "path": "install/COPYING",
    "content": "\t\t    GNU GENERAL PUBLIC LICENSE\n\t\t       Version 2, June 1991\n\n Copyright (C) 1989, 1991 Free Software Foundation, Inc.\n     51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA\n Everyone is permitted to copy and distribute verbatim copies\n of this license document, but changing it is not allowed.\n\n\t\t\t    Preamble\n\n  The licenses for most software are designed to take away your\nfreedom to share and change it.  By contrast, the GNU General Public\nLicense is intended to guarantee your freedom to share and change free\nsoftware--to make sure the software is free for all its users.  This\nGeneral Public License applies to most of the Free Software\nFoundation's software and to any other program whose authors commit to\nusing it.  (Some other Free Software Foundation software is covered by\nthe GNU Library General Public License instead.)  You can apply it to\nyour programs, too.\n\n  When we speak of free software, we are referring to freedom, not\nprice.  Our General Public Licenses are designed to make sure that you\nhave the freedom to distribute copies of free software (and charge for\nthis service if you wish), that you receive source code or can get it\nif you want it, that you can change the software or use pieces of it\nin new free programs; and that you know you can do these things.\n\n  To protect your rights, we need to make restrictions that forbid\nanyone to deny you these rights or to ask you to surrender the rights.\nThese restrictions translate to certain responsibilities for you if you\ndistribute copies of the software, or if you modify it.\n\n  For example, if you distribute copies of such a program, whether\ngratis or for a fee, you must give the recipients all the rights that\nyou have.  You must make sure that they, too, receive or can get the\nsource code.  
And you must show them these terms so they know their\nrights.\n\n  We protect your rights with two steps: (1) copyright the software, and\n(2) offer you this license which gives you legal permission to copy,\ndistribute and/or modify the software.\n\n  Also, for each author's protection and ours, we want to make certain\nthat everyone understands that there is no warranty for this free\nsoftware.  If the software is modified by someone else and passed on, we\nwant its recipients to know that what they have is not the original, so\nthat any problems introduced by others will not reflect on the original\nauthors' reputations.\n\n  Finally, any free program is threatened constantly by software\npatents.  We wish to avoid the danger that redistributors of a free\nprogram will individually obtain patent licenses, in effect making the\nprogram proprietary.  To prevent this, we have made it clear that any\npatent must be licensed for everyone's free use or not licensed at all.\n\n  The precise terms and conditions for copying, distribution and\nmodification follow.\n\f\n\t\t    GNU GENERAL PUBLIC LICENSE\n   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION\n\n  0. This License applies to any program or other work which contains\na notice placed by the copyright holder saying it may be distributed\nunder the terms of this General Public License.  The \"Program\", below,\nrefers to any such program or work, and a \"work based on the Program\"\nmeans either the Program or any derivative work under copyright law:\nthat is to say, a work containing the Program or a portion of it,\neither verbatim or with modifications and/or translated into another\nlanguage.  (Hereinafter, translation is included without limitation in\nthe term \"modification\".)  Each licensee is addressed as \"you\".\n\nActivities other than copying, distribution and modification are not\ncovered by this License; they are outside its scope.  
The act of\nrunning the Program is not restricted, and the output from the Program\nis covered only if its contents constitute a work based on the\nProgram (independent of having been made by running the Program).\nWhether that is true depends on what the Program does.\n\n  1. You may copy and distribute verbatim copies of the Program's\nsource code as you receive it, in any medium, provided that you\nconspicuously and appropriately publish on each copy an appropriate\ncopyright notice and disclaimer of warranty; keep intact all the\nnotices that refer to this License and to the absence of any warranty;\nand give any other recipients of the Program a copy of this License\nalong with the Program.\n\nYou may charge a fee for the physical act of transferring a copy, and\nyou may at your option offer warranty protection in exchange for a fee.\n\n  2. You may modify your copy or copies of the Program or any portion\nof it, thus forming a work based on the Program, and copy and\ndistribute such modifications or work under the terms of Section 1\nabove, provided that you also meet all of these conditions:\n\n    a) You must cause the modified files to carry prominent notices\n    stating that you changed the files and the date of any change.\n\n    b) You must cause any work that you distribute or publish, that in\n    whole or in part contains or is derived from the Program or any\n    part thereof, to be licensed as a whole at no charge to all third\n    parties under the terms of this License.\n\n    c) If the modified program normally reads commands interactively\n    when run, you must cause it, when started running for such\n    interactive use in the most ordinary way, to print or display an\n    announcement including an appropriate copyright notice and a\n    notice that there is no warranty (or else, saying that you provide\n    a warranty) and that users may redistribute the program under\n    these conditions, and telling the user how to view a copy of this\n  
  License.  (Exception: if the Program itself is interactive but\n    does not normally print such an announcement, your work based on\n    the Program is not required to print an announcement.)\n\f\nThese requirements apply to the modified work as a whole.  If\nidentifiable sections of that work are not derived from the Program,\nand can be reasonably considered independent and separate works in\nthemselves, then this License, and its terms, do not apply to those\nsections when you distribute them as separate works.  But when you\ndistribute the same sections as part of a whole which is a work based\non the Program, the distribution of the whole must be on the terms of\nthis License, whose permissions for other licensees extend to the\nentire whole, and thus to each and every part regardless of who wrote it.\n\nThus, it is not the intent of this section to claim rights or contest\nyour rights to work written entirely by you; rather, the intent is to\nexercise the right to control the distribution of derivative or\ncollective works based on the Program.\n\nIn addition, mere aggregation of another work not based on the Program\nwith the Program (or with a work based on the Program) on a volume of\na storage or distribution medium does not bring the other work under\nthe scope of this License.\n\n  3. 
You may copy and distribute the Program (or a work based on it,\nunder Section 2) in object code or executable form under the terms of\nSections 1 and 2 above provided that you also do one of the following:\n\n    a) Accompany it with the complete corresponding machine-readable\n    source code, which must be distributed under the terms of Sections\n    1 and 2 above on a medium customarily used for software interchange; or,\n\n    b) Accompany it with a written offer, valid for at least three\n    years, to give any third party, for a charge no more than your\n    cost of physically performing source distribution, a complete\n    machine-readable copy of the corresponding source code, to be\n    distributed under the terms of Sections 1 and 2 above on a medium\n    customarily used for software interchange; or,\n\n    c) Accompany it with the information you received as to the offer\n    to distribute corresponding source code.  (This alternative is\n    allowed only for noncommercial distribution and only if you\n    received the program in object code or executable form with such\n    an offer, in accord with Subsection b above.)\n\nThe source code for a work means the preferred form of the work for\nmaking modifications to it.  For an executable work, complete source\ncode means all the source code for all modules it contains, plus any\nassociated interface definition files, plus the scripts used to\ncontrol compilation and installation of the executable.  
However, as a\nspecial exception, the source code distributed need not include\nanything that is normally distributed (in either source or binary\nform) with the major components (compiler, kernel, and so on) of the\noperating system on which the executable runs, unless that component\nitself accompanies the executable.\n\nIf distribution of executable or object code is made by offering\naccess to copy from a designated place, then offering equivalent\naccess to copy the source code from the same place counts as\ndistribution of the source code, even though third parties are not\ncompelled to copy the source along with the object code.\n\f\n  4. You may not copy, modify, sublicense, or distribute the Program\nexcept as expressly provided under this License.  Any attempt\notherwise to copy, modify, sublicense or distribute the Program is\nvoid, and will automatically terminate your rights under this License.\nHowever, parties who have received copies, or rights, from you under\nthis License will not have their licenses terminated so long as such\nparties remain in full compliance.\n\n  5. You are not required to accept this License, since you have not\nsigned it.  However, nothing else grants you permission to modify or\ndistribute the Program or its derivative works.  These actions are\nprohibited by law if you do not accept this License.  Therefore, by\nmodifying or distributing the Program (or any work based on the\nProgram), you indicate your acceptance of this License to do so, and\nall its terms and conditions for copying, distributing or modifying\nthe Program or works based on it.\n\n  6. Each time you redistribute the Program (or any work based on the\nProgram), the recipient automatically receives a license from the\noriginal licensor to copy, distribute or modify the Program subject to\nthese terms and conditions.  
You may not impose any further\nrestrictions on the recipients' exercise of the rights granted herein.\nYou are not responsible for enforcing compliance by third parties to\nthis License.\n\n  7. If, as a consequence of a court judgment or allegation of patent\ninfringement or for any other reason (not limited to patent issues),\nconditions are imposed on you (whether by court order, agreement or\notherwise) that contradict the conditions of this License, they do not\nexcuse you from the conditions of this License.  If you cannot\ndistribute so as to satisfy simultaneously your obligations under this\nLicense and any other pertinent obligations, then as a consequence you\nmay not distribute the Program at all.  For example, if a patent\nlicense would not permit royalty-free redistribution of the Program by\nall those who receive copies directly or indirectly through you, then\nthe only way you could satisfy both it and this License would be to\nrefrain entirely from distribution of the Program.\n\nIf any portion of this section is held invalid or unenforceable under\nany particular circumstance, the balance of the section is intended to\napply and the section as a whole is intended to apply in other\ncircumstances.\n\nIt is not the purpose of this section to induce you to infringe any\npatents or other property right claims or to contest validity of any\nsuch claims; this section has the sole purpose of protecting the\nintegrity of the free software distribution system, which is\nimplemented by public license practices.  Many people have made\ngenerous contributions to the wide range of software distributed\nthrough that system in reliance on consistent application of that\nsystem; it is up to the author/donor to decide if he or she is willing\nto distribute software through any other system and a licensee cannot\nimpose that choice.\n\nThis section is intended to make thoroughly clear what is believed to\nbe a consequence of the rest of this License.\n\f\n  8. 
If the distribution and/or use of the Program is restricted in\ncertain countries either by patents or by copyrighted interfaces, the\noriginal copyright holder who places the Program under this License\nmay add an explicit geographical distribution limitation excluding\nthose countries, so that distribution is permitted only in or among\ncountries not thus excluded.  In such case, this License incorporates\nthe limitation as if written in the body of this License.\n\n  9. The Free Software Foundation may publish revised and/or new versions\nof the General Public License from time to time.  Such new versions will\nbe similar in spirit to the present version, but may differ in detail to\naddress new problems or concerns.\n\nEach version is given a distinguishing version number.  If the Program\nspecifies a version number of this License which applies to it and \"any\nlater version\", you have the option of following the terms and conditions\neither of that version or of any later version published by the Free\nSoftware Foundation.  If the Program does not specify a version number of\nthis License, you may choose any version ever published by the Free Software\nFoundation.\n\n  10. If you wish to incorporate parts of the Program into other free\nprograms whose distribution conditions are different, write to the author\nto ask for permission.  For software which is copyrighted by the Free\nSoftware Foundation, write to the Free Software Foundation; we sometimes\nmake exceptions for this.  Our decision will be guided by the two goals\nof preserving the free status of all derivatives of our free software and\nof promoting the sharing and reuse of software generally.\n\n\t\t\t    NO WARRANTY\n\n  11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY\nFOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  
EXCEPT WHEN\nOTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES\nPROVIDE THE PROGRAM \"AS IS\" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED\nOR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF\nMERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS\nTO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE\nPROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,\nREPAIR OR CORRECTION.\n\n  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING\nWILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR\nREDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,\nINCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING\nOUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED\nTO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY\nYOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER\nPROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE\nPOSSIBILITY OF SUCH DAMAGES.\n\n\t\t     END OF TERMS AND CONDITIONS\n\f\n\t    How to Apply These Terms to Your New Programs\n\n  If you develop a new program, and you want it to be of the greatest\npossible use to the public, the best way to achieve this is to make it\nfree software which everyone can redistribute and change under these terms.\n\n  To do so, attach the following notices to the program.  
It is safest\nto attach them to the start of each source file to most effectively\nconvey the exclusion of warranty; and each file should have at least\nthe \"copyright\" line and a pointer to where the full notice is found.\n\n    <one line to give the program's name and a brief idea of what it does.>\n    Copyright (C) <year>  <name of author>\n\n    This program is free software; you can redistribute it and/or modify\n    it under the terms of the GNU General Public License as published by\n    the Free Software Foundation; either version 2 of the License, or\n    (at your option) any later version.\n\n    This program is distributed in the hope that it will be useful,\n    but WITHOUT ANY WARRANTY; without even the implied warranty of\n    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n    GNU General Public License for more details.\n\n    You should have received a copy of the GNU General Public License\n    along with this program; if not, write to the Free Software\n    Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA\n\n\nAlso add information on how to contact you by electronic and paper mail.\n\nIf the program is interactive, make it output a short notice like this\nwhen it starts in an interactive mode:\n\n    Gnomovision version 69, Copyright (C) year  name of author\n    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.\n    This is free software, and you are welcome to redistribute it\n    under certain conditions; type `show c' for details.\n\nThe hypothetical commands `show w' and `show c' should show the appropriate\nparts of the General Public License.  Of course, the commands you use may\nbe called something other than `show w' and `show c'; they could even be\nmouse-clicks or menu items--whatever suits your program.\n\nYou should also get your employer (if you work as a programmer) or your\nschool, if any, to sign a \"copyright disclaimer\" for the program, if\nnecessary.  
Here is a sample; alter the names:\n\n  Yoyodyne, Inc., hereby disclaims all copyright interest in the program\n  `Gnomovision' (which makes passes at compilers) written by James Hacker.\n\n  <signature of Ty Coon>, 1 April 1989\n  Ty Coon, President of Vice\n\nThis General Public License does not permit incorporating your program into\nproprietary programs.  If your program is a subroutine library, you may\nconsider it more useful to permit linking proprietary applications with the\nlibrary.  If this is what you want to do, use the GNU Library General\nPublic License instead of this License.\n"
  },
  {
    "path": "install/Makefile.am",
    "content": "ACLOCAL_AMFLAGS = -I m4\n\nAUTOMAKE_OPTIONS = \\\n\tdist-xz \\\n\tdist-bzip2 \\\n\tdist-zip\n\nAM_DISTCHECK_CONFIGURE_FLAGS ?= \\\n\t--disable-maintainer-mode \\\n\t--enable-extended-tests\n\nAM_CFLAGS = -g -Wall -O3\n\n## Library versioning (C:R:A == current:revision:age)\n## 0.999b 0:0:0\n## 0.999c 1:0:0\n## 0.999e 2:0:0\nLIBAPOPHENIA_LT_VERSION = 2:0:0\n\nSUBDIRS = transform model . cmd eg tests docs\n\n\ninclude_HEADERS = apop.h\n\npkgconfigdir = $(libdir)/pkgconfig\npkgconfig_DATA= apophenia.pc\n\nlib_LTLIBRARIES = libapophenia.la\n\nlibapophenia_la_LD_VERSION_SCRIPT=\nif HAVE_LD_VERSION_SCRIPT\nlibapophenia_la_LD_VERSION_SCRIPT+= -Wl,--version-script=$(top_srcdir)/apophenia.map\nendif\n\nSUBLIBS = \\\n\tlibapopkernel.la \\\n\ttransform/libapoptransform.la \\\n\tmodel/libapopmodel.la\n\nlibapophenia_la_SOURCES = \\\n\tasprintf.c\n\nnoinst_LTLIBRARIES = libapopkernel.la\n\nnoinst_HEADERS = apop_internal.h\n\nlibapopkernel_la_SOURCES = \\\n\tapop_arms.c \\\n\tapop_asst.c \\\n\tapop_bootstrap.c \\\n\tapop_conversions.c \\\n\tapop_data.c \\\n\tapop_db.c \\\n\tapop_fexact.c \\\n\tapop_hist.c \\\n\tapop_linear_algebra.c \\\n\tapop_linear_constraint.c \\\n\tapop_mapply.c \\\n\tapop_mcmc.c \\\n\tapop_missing_data.c \\\n\tapop_mle.c apop_model.c \\\n\tapop_name.c \\\n\tapop_output.c \\\n\tapop_rake.c \\\n\tapop_regression.c \\\n\tapop_settings.c \\\n\tapop_sort.c \\\n\tapop_stats.c \\\n\tapop_tests.c \\\n\tapop_update.c\t\\\n\tapop_vtables.c\n\napop_db_INCLUDES = \\\n\tapop_db_mysql.c \\\n\tapop_db_sqlite.c\n\napop_db.c: $(apop_db_INCLUDES)\n\nlibapopkernel_la_CFLAGS = \\\n\t$(PTHREAD_CFLAGS) \\\n\t$(OPENMP_CFLAGS) \\\n\t$(MYSQL_CFLAGS) \\\n\t$(SQLITE3_CFLAGS) \\\n\t$(GSL_CFLAGS)\n\nlibapophenia_la_LDFLAGS = \\\n\t-version-info $(LIBAPOPHENIA_LT_VERSION) \\\n\t$(libapophenia_la_LD_VERSION_SCRIPT)\n\nlibapophenia_la_LIBADD = \\\n\t$(SUBLIBS) \\\n\t$(MYSQL_LDFLAGS) \\\n\t$(SQLITE3_LDFLAGS) \\\n\t$(GSL_LIBS) \\\n\t$(PTHREAD_LIBS) 
\\\n\t$(LIBM)\n\nEXTRA_DIST = \\\n\trpm.spec \\\n\tapophenia.pc.in \\\n\tapophenia.map\n\nEXTRA_DIST += \\\n\t$(apop_db_INCLUDES)\n\n## compatibility\ndoc:\n\t-$(MAKE) -C docs doc\n"
  },
  {
    "path": "install/Readme-pkg",
    "content": "Apophenia is an open statistical library for working with data sets and statistical or simulation models. It provides functions on the same level as those of the typical stats package (such as OLS, probit, or singular value decomposition) but gives the user more flexibility to be creative in model-building. Being in C, it is often an order of magnitude faster when searching for optima or running MCMC chains. The core functions are written in C, but experience has shown them to be easy to bind to Python/Julia/Perl/Ruby/&c.\n\nhttp://apophenia.info/gentle.html provides an overview of the basics of using the library. If you want to know more about the package, see the web site, http://apophenia.info, or have a look at the textbook from Princeton University Press that coevolved with Apophenia, downloadable from http://modelingwithdata.org .\n\n\nInstallation summary:\n\n∙ The library depends on the GNU Scientific Library and SQLite3. If you are using a system with a package manager of some sort, there is certainly a package for them. Be sure to include both the main package and the lib-, -dev, or -devel package. Sample package manager calls:\n\n    sudo apt-get install make gcc libgsl0-dev libsqlite3-dev \nor\n    sudo yum install make gcc gsl-devel libsqlite3x-devel\nor\n    sudo pacman -S make gcc gsl sqlite \n\n\n∙ Got the dependencies? Great, then run:\n\n    ./configure\n    make \n    sudo make install\n\n\n∙ Find detailed setup instructions and some troubleshooting notes at\nhttp://apophenia.info/setup.html .\n\n\nThanks for your interest. I do hope that Apophenia helps you learn more from your data.\n\n--BK\n"
  },
  {
    "path": "install/Readme-pkg-debian",
    "content": "The Debian [1] package for the Apophenia Statistical C Library\nis currently maintained by the Debian Science Team [2,3]:\nthe git repository for the apophenia Debian package is at Alioth [4].\n\nTo install the Apophenia standard suite on Debian based systems:\n\tapt-get update\n\tapt-get install apophenia-bin apophenia-doc\n\nFor a complete and up-to-date list of Apophenia related packages,\nyou may search at the distribution webpage dedicated to Packages [5,6].\n\n[1] https://www.debian.org/\n[2] https://wiki.debian.org/DebianScience\n[3] https://qa.debian.org/developer.php?login=debian-science-maintainers@lists.alioth.debian.org\n[4] http://anonscm.debian.org/cgit/debian-science/packages/apophenia.git/\n[5] https://packages.debian.org/search?keywords=apophenia\n[6] http://packages.ubuntu.com/search?keywords=apophenia\n"
  },
  {
    "path": "install/acinclude.m4",
    "content": "# Part one decides whether to use -pthread or -lpthread\n# Part two is for __attribute__\n# Part three is gl_LD_VERSION_SCRIPT\n\n# ===========================================================================\n#           http://www.nongnu.org/autoconf-archive/acx_pthread.html\n# ===========================================================================\n#\n# SYNOPSIS\n#\n#   ACX_PTHREAD([ACTION-IF-FOUND[, ACTION-IF-NOT-FOUND]])\n#\n# DESCRIPTION\n#\n#   This macro figures out how to build C programs using POSIX threads. It\n#   sets the PTHREAD_LIBS output variable to the threads library and linker\n#   flags, and the PTHREAD_CFLAGS output variable to any special C compiler\n#   flags that are needed. (The user can also force certain compiler\n#   flags/libs to be tested by setting these environment variables.)\n#\n#   Also sets PTHREAD_CC to any special C compiler that is needed for\n#   multi-threaded programs (defaults to the value of CC otherwise). (This\n#   is necessary on AIX to use the special cc_r compiler alias.)\n#\n#   NOTE: You are assumed to not only compile your program with these flags,\n#   but also link it with them as well. e.g. you should link with\n#   $PTHREAD_CC $CFLAGS $PTHREAD_CFLAGS $LDFLAGS ... $PTHREAD_LIBS $LIBS\n#\n#   If you are only building threads programs, you may wish to use these\n#   variables in your default LIBS, CFLAGS, and CC:\n#\n#          LIBS=\"$PTHREAD_LIBS $LIBS\"\n#          CFLAGS=\"$CFLAGS $PTHREAD_CFLAGS\"\n#          CC=\"$PTHREAD_CC\"\n#\n#   In addition, if the PTHREAD_CREATE_JOINABLE thread-attribute constant\n#   has a nonstandard name, defines PTHREAD_CREATE_JOINABLE to that name\n#   (e.g. PTHREAD_CREATE_UNDETACHED on AIX).\n#\n#   ACTION-IF-FOUND is a list of shell commands to run if a threads library\n#   is found, and ACTION-IF-NOT-FOUND is a list of commands to run it if it\n#   is not found. 
If ACTION-IF-FOUND is not specified, the default action\n#   will define HAVE_PTHREAD.\n#\n#   Please let the authors know if this macro fails on any platform, or if\n#   you have any other suggestions or comments. This macro was based on work\n#   by SGJ on autoconf scripts for FFTW (http://www.fftw.org/) (with help\n#   from M. Frigo), as well as ac_pthread and hb_pthread macros posted by\n#   Alejandro Forero Cuervo to the autoconf macro repository. We are also\n#   grateful for the helpful feedback of numerous users.\n#\n# LICENSE\n#\n#   Copyright (c) 2008 Steven G. Johnson <stevenj@alum.mit.edu>\n#\n#   This program is free software: you can redistribute it and/or modify it\n#   under the terms of the GNU General Public License as published by the\n#   Free Software Foundation, either version 3 of the License, or (at your\n#   option) any later version.\n#\n#   This program is distributed in the hope that it will be useful, but\n#   WITHOUT ANY WARRANTY; without even the implied warranty of\n#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General\n#   Public License for more details.\n#\n#   You should have received a copy of the GNU General Public License along\n#   with this program. If not, see <http://www.gnu.org/licenses/>.\n#\n#   As a special exception, the respective Autoconf Macro's copyright owner\n#   gives unlimited permission to copy, distribute and modify the configure\n#   scripts that are the output of Autoconf when processing the Macro. You\n#   need not follow the terms of the GNU General Public License when using\n#   or distributing such scripts, even though portions of the text of the\n#   Macro appear in them. The GNU General Public License (GPL) does govern\n#   all other use of the material that constitutes the Autoconf Macro.\n#\n#   This special exception to the GPL applies to versions of the Autoconf\n#   Macro released by the Autoconf Archive. 
When you make and distribute a\n#   modified version of the Autoconf Macro, you may extend this special\n#   exception to the GPL to apply to your modified version as well.\n\nAC_DEFUN([ACX_PTHREAD], [\nAC_REQUIRE([AC_CANONICAL_HOST])\nAC_LANG_SAVE\nAC_LANG_C\nacx_pthread_ok=no\n\n# We used to check for pthread.h first, but this fails if pthread.h\n# requires special compiler flags (e.g. on True64 or Sequent).\n# It gets checked for in the link test anyway.\n\n# First of all, check if the user has set any of the PTHREAD_LIBS,\n# etcetera environment variables, and if threads linking works using\n# them:\nif test x\"$PTHREAD_LIBS$PTHREAD_CFLAGS\" != x; then\n        save_CFLAGS=\"$CFLAGS\"\n        CFLAGS=\"$CFLAGS $PTHREAD_CFLAGS\"\n        save_LIBS=\"$LIBS\"\n        LIBS=\"$PTHREAD_LIBS $LIBS\"\n        AC_MSG_CHECKING([for pthread_join in LIBS=$PTHREAD_LIBS with CFLAGS=$PTHREAD_CFLAGS])\n        AC_TRY_LINK_FUNC(pthread_join, acx_pthread_ok=yes)\n        AC_MSG_RESULT($acx_pthread_ok)\n        if test x\"$acx_pthread_ok\" = xno; then\n                PTHREAD_LIBS=\"\"\n                PTHREAD_CFLAGS=\"\"\n        fi\n        LIBS=\"$save_LIBS\"\n        CFLAGS=\"$save_CFLAGS\"\nfi\n\n# We must check for the threads library under a number of different\n# names; the ordering is very important because some systems\n# (e.g. DEC) have both -lpthread and -lpthreads, where one of the\n# libraries is broken (non-POSIX).\n\n# Create a list of thread flags to try.  Items starting with a \"-\" are\n# C compiler flags, and other items are library names, except for \"none\"\n# which indicates that we try without any flags at all, and \"pthread-config\"\n# which is a program returning the flags for the Pth emulation library.\n\nacx_pthread_flags=\"pthreads none -Kthread -kthread lthread -pthread -pthreads -mthreads pthread --thread-safe -mt pthread-config\"\n\n# The ordering *is* (sometimes) important.  
Some notes on the\n# individual items follow:\n\n# pthreads: AIX (must check this before -lpthread)\n# none: in case threads are in libc; should be tried before -Kthread and\n#       other compiler flags to prevent continual compiler warnings\n# -Kthread: Sequent (threads in libc, but -Kthread needed for pthread.h)\n# -kthread: FreeBSD kernel threads (preferred to -pthread since SMP-able)\n# lthread: LinuxThreads port on FreeBSD (also preferred to -pthread)\n# -pthread: Linux/gcc (kernel threads), BSD/gcc (userland threads)\n# -pthreads: Solaris/gcc\n# -mthreads: Mingw32/gcc, Lynx/gcc\n# -mt: Sun Workshop C (may only link SunOS threads [-lthread], but it\n#      doesn't hurt to check since this sometimes defines pthreads too;\n#      also defines -D_REENTRANT)\n#      ... -mt is also the pthreads flag for HP/aCC\n# pthread: Linux, etcetera\n# --thread-safe: KAI C++\n# pthread-config: use pthread-config program (for GNU Pth library)\n\ncase \"${host_cpu}-${host_os}\" in\n        *solaris*)\n\n        # On Solaris (at least, for some versions), libc contains stubbed\n        # (non-functional) versions of the pthreads routines, so link-based\n        # tests will erroneously succeed.  (We need to link with -pthreads/-mt/\n        # -lpthread.)  (The stubs are missing pthread_cleanup_push, or rather\n        # a function called by this macro, so we could check for that, but\n        # who knows whether they'll stub that too in a future libc.)  
So,\n        # we'll just look for -pthreads and -lpthread first:\n\n        acx_pthread_flags=\"-pthreads pthread -mt -pthread $acx_pthread_flags\"\n        ;;\nesac\n\nif test x\"$acx_pthread_ok\" = xno; then\nfor flag in $acx_pthread_flags; do\n\n        case $flag in\n                none)\n                AC_MSG_CHECKING([whether pthreads work without any flags])\n                ;;\n\n                -*)\n                AC_MSG_CHECKING([whether pthreads work with $flag])\n                PTHREAD_CFLAGS=\"$flag\"\n                ;;\n\n\t\tpthread-config)\n\t\tAC_CHECK_PROG(acx_pthread_config, pthread-config, yes, no)\n\t\tif test x\"$acx_pthread_config\" = xno; then continue; fi\n\t\tPTHREAD_CFLAGS=\"`pthread-config --cflags`\"\n\t\tPTHREAD_LIBS=\"`pthread-config --ldflags` `pthread-config --libs`\"\n\t\t;;\n\n                *)\n                AC_MSG_CHECKING([for the pthreads library -l$flag])\n                PTHREAD_LIBS=\"-l$flag\"\n                ;;\n        esac\n\n        save_LIBS=\"$LIBS\"\n        save_CFLAGS=\"$CFLAGS\"\n        LIBS=\"$PTHREAD_LIBS $LIBS\"\n        CFLAGS=\"$CFLAGS $PTHREAD_CFLAGS\"\n\n        # Check for various functions.  We must include pthread.h,\n        # since some functions may be macros.  (On the Sequent, we\n        # need a special flag -Kthread to make this header compile.)\n        # We check for pthread_join because it is in -lpthread on IRIX\n        # while pthread_create is in libc.  We check for pthread_attr_init\n        # due to DEC craziness with -lpthreads.  
We check for\n        # pthread_cleanup_push because it is one of the few pthread\n        # functions on Solaris that doesn't have a non-functional libc stub.\n        # We try pthread_create on general principles.\n        AC_TRY_LINK([#include <pthread.h>],\n                    [pthread_t th; pthread_join(th, 0);\n                     pthread_attr_init(0); pthread_cleanup_push(0, 0);\n                     pthread_create(0,0,0,0); pthread_cleanup_pop(0); ],\n                    [acx_pthread_ok=yes])\n\n        LIBS=\"$save_LIBS\"\n        CFLAGS=\"$save_CFLAGS\"\n\n        AC_MSG_RESULT($acx_pthread_ok)\n        if test \"x$acx_pthread_ok\" = xyes; then\n                break;\n        fi\n\n        PTHREAD_LIBS=\"\"\n        PTHREAD_CFLAGS=\"\"\ndone\nfi\n\n# Various other checks:\nif test \"x$acx_pthread_ok\" = xyes; then\n        save_LIBS=\"$LIBS\"\n        LIBS=\"$PTHREAD_LIBS $LIBS\"\n        save_CFLAGS=\"$CFLAGS\"\n        CFLAGS=\"$CFLAGS $PTHREAD_CFLAGS\"\n\n        # Detect AIX lossage: JOINABLE attribute is called UNDETACHED.\n\tAC_MSG_CHECKING([for joinable pthread attribute])\n\tattr_name=unknown\n\tfor attr in PTHREAD_CREATE_JOINABLE PTHREAD_CREATE_UNDETACHED; do\n\t    AC_TRY_LINK([#include <pthread.h>], [int attr=$attr; return attr;],\n                        [attr_name=$attr; break])\n\tdone\n        AC_MSG_RESULT($attr_name)\n        if test \"$attr_name\" != PTHREAD_CREATE_JOINABLE; then\n            AC_DEFINE_UNQUOTED(PTHREAD_CREATE_JOINABLE, $attr_name,\n                               [Define to necessary symbol if this constant\n                                uses a non-standard name on your system.])\n        fi\n\n        AC_MSG_CHECKING([if more special flags are required for pthreads])\n        flag=no\n        case \"${host_cpu}-${host_os}\" in\n            *-aix* | *-freebsd* | *-darwin*) flag=\"-D_THREAD_SAFE\";;\n            *solaris* | *-osf* | *-hpux*) flag=\"-D_REENTRANT\";;\n        esac\n        AC_MSG_RESULT(${flag})\n        
if test \"x$flag\" != xno; then\n            PTHREAD_CFLAGS=\"$flag $PTHREAD_CFLAGS\"\n        fi\n\n        LIBS=\"$save_LIBS\"\n        CFLAGS=\"$save_CFLAGS\"\n\n        # More AIX lossage: must compile with xlc_r or cc_r\n\tif test x\"$GCC\" != xyes; then\n          AC_CHECK_PROGS(PTHREAD_CC, xlc_r cc_r, ${CC})\n        else\n          PTHREAD_CC=$CC\n\tfi\nelse\n        PTHREAD_CC=\"$CC\"\nfi\n\nAC_SUBST(PTHREAD_LIBS)\nAC_SUBST(PTHREAD_CFLAGS)\nAC_SUBST(PTHREAD_CC)\n\n# Finally, execute ACTION-IF-FOUND/ACTION-IF-NOT-FOUND:\nif test x\"$acx_pthread_ok\" = xyes; then\n        ifelse([$1],,AC_DEFINE(HAVE_PTHREAD,1,[Define if you have POSIX threads libraries and header files.]),[$1])\n        :\nelse\n        acx_pthread_ok=no\n        $2\nfi\nAC_LANG_RESTORE\n])dnl ACX_PTHREAD\n# ===========================================================================\n#          http://www.nongnu.org/autoconf-archive/ax_lib_mysql.html\n# ===========================================================================\n#\n# SYNOPSIS\n#\n#   AX_LIB_MYSQL([MINIMUM-VERSION])\n#\n# DESCRIPTION\n#\n#   This macro provides tests of availability of MySQL client library of\n#   particular version or newer.\n#\n#   AX_LIB_MYSQL macro takes only one argument which is optional. 
If there\n#   is no required version passed, then macro does not run version test.\n#\n#   The --with-mysql option takes one of three possible values:\n#\n#   no - do not check for MySQL client library\n#\n#   yes - do check for MySQL library in standard locations (mysql_config\n#   should be in the PATH)\n#\n#   path - complete path to mysql_config utility, use this option if\n#   mysql_config can't be found in the PATH\n#\n#   This macro calls:\n#\n#     AC_SUBST(MYSQL_CFLAGS)\n#     AC_SUBST(MYSQL_LDFLAGS)\n#     AC_SUBST(MYSQL_VERSION)\n#\n#   And sets:\n#\n#     HAVE_MYSQL\n#\n# LICENSE\n#\n#   Copyright (c) 2008 Mateusz Loskot <mateusz@loskot.net>\n#\n#   Copying and distribution of this file, with or without modification, are\n#   permitted in any medium without royalty provided the copyright notice\n#   and this notice are preserved.\n\nAC_DEFUN([AX_LIB_MYSQL],\n[\nmysql_message=\"Compiling without mySQL/mariadb support. If desired, check that\nthe mysql-devel (or dev-mysql, mariadb-devel, ...)  
package is installed.\"\n\n    AC_ARG_WITH([mysql],\n        AC_HELP_STRING([--with-mysql=@<:@ARG@:>@],\n            [use MySQL client library @<:@default=yes@:>@, optionally specify path to mysql_config]\n        ),\n        [\n        if test \"$withval\" = \"no\"; then\n            want_mysql=\"no\"\n        elif test \"$withval\" = \"yes\"; then\n            want_mysql=\"yes\"\n        else\n            want_mysql=\"yes\"\n            MYSQL_CONFIG=\"$withval\"\n        fi\n        ],\n        [want_mysql=\"yes\"]\n    )\n\n    MYSQL_CFLAGS=\"\"\n    MYSQL_LDFLAGS=\"\"\n    MYSQL_VERSION=\"\"\n\n    dnl\n    dnl Check MySQL libraries (libpq)\n    dnl\n\n    if test \"$want_mysql\" = \"yes\"; then\n\n        if test -z \"$MYSQL_CONFIG\" -o test; then\n            AC_PATH_PROG([MYSQL_CONFIG], [mysql_config], [no])\n        fi\n\n        if test \"$MYSQL_CONFIG\" != \"no\"; then\n            AC_MSG_CHECKING([for MySQL libraries])\n\n            MYSQL_CFLAGS=\"`$MYSQL_CONFIG --cflags`\"\n            MYSQL_LDFLAGS=\"`$MYSQL_CONFIG --libs`\"\n\n            MYSQL_VERSION=`$MYSQL_CONFIG --version`\n\n            #BK hack: the above doesn't verify that my_global.h is present.\n            mysql_config_path=`mysql_config --include | sed 's/-I//'`\n            AC_CHECK_FILE([$mysql_config_path/my_global.h], [\n                    AC_DEFINE([HAVE_MYSQL], [1], [Define to 1 if MySQL libraries are available])\n\n                    mysql_message=\"Compiling with mySQL/mariadb support.\"\n                ], [\n                    found_mysql=\"yes\"\n                    AC_MSG_RESULT([yes])\n                ], [\n                    unset MYSQL_CFLAGS\n                    unset MYSQL_LDFLAGS\n                    unset MYSQL_VERSION\n\n                    found_mysql=\"no\"\n                    AC_MSG_RESULT([no])\n                ])\n        else\n            found_mysql=\"no\"\n            AC_MSG_RESULT([no])\n        fi\n    fi\n\n    dnl\n    dnl Check if required version of 
MySQL is available\n    dnl\n\n\n    mysql_version_req=ifelse([$1], [], [], [$1])\n\n    if test \"$found_mysql\" = \"yes\" -a -n \"$mysql_version_req\"; then\n\n        AC_MSG_CHECKING([if MySQL version is >= $mysql_version_req])\n\n        dnl Decompose required version string of MySQL\n        dnl and calculate its number representation\n        mysql_version_req_major=`expr $mysql_version_req : '\\([[0-9]]*\\)'`\n        mysql_version_req_minor=`expr $mysql_version_req : '[[0-9]]*\\.\\([[0-9]]*\\)'`\n        mysql_version_req_micro=`expr $mysql_version_req : '[[0-9]]*\\.[[0-9]]*\\.\\([[0-9]]*\\)'`\n        if test \"x$mysql_version_req_micro\" = \"x\"; then\n            mysql_version_req_micro=\"0\"\n        fi\n\n        mysql_version_req_number=`expr $mysql_version_req_major \\* 1000000 \\\n                                   \\+ $mysql_version_req_minor \\* 1000 \\\n                                   \\+ $mysql_version_req_micro`\n\n        dnl Decompose version string of installed MySQL\n        dnl and calculate its number representation\n        mysql_version_major=`expr $MYSQL_VERSION : '\\([[0-9]]*\\)'`\n        mysql_version_minor=`expr $MYSQL_VERSION : '[[0-9]]*\\.\\([[0-9]]*\\)'`\n        mysql_version_micro=`expr $MYSQL_VERSION : '[[0-9]]*\\.[[0-9]]*\\.\\([[0-9]]*\\)'`\n        if test \"x$mysql_version_micro\" = \"x\"; then\n            mysql_version_micro=\"0\"\n        fi\n\n        mysql_version_number=`expr $mysql_version_major \\* 1000000 \\\n                                   \\+ $mysql_version_minor \\* 1000 \\\n                                   \\+ $mysql_version_micro`\n\n        mysql_version_check=`expr $mysql_version_number \\>\\= $mysql_version_req_number`\n        if test \"$mysql_version_check\" = \"1\"; then\n            AC_MSG_RESULT([yes])\n        else\n            AC_MSG_RESULT([no])\n        fi\n    fi\n\n    AC_SUBST([MYSQL_VERSION])\n    AC_SUBST([MYSQL_CFLAGS])\n    AC_SUBST([MYSQL_LDFLAGS])\n])\n# 
===========================================================================\n#         http://www.nongnu.org/autoconf-archive/ax_lib_sqlite3.html\n# ===========================================================================\n#\n# SYNOPSIS\n#\n#   AX_LIB_SQLITE3([MINIMUM-VERSION])\n#\n# DESCRIPTION\n#\n#   Test for the SQLite 3 library of a particular version (or newer)\n#\n#   This macro takes only one optional argument, required version of SQLite\n#   3 library. If required version is not passed, 3.0.0 is used in the test\n#   of existence of SQLite 3.\n#\n#   If no installation prefix to the installed SQLite library is given the\n#   macro searches under /usr, /usr/local, and /opt.\n#\n#   This macro calls:\n#\n#     AC_SUBST(SQLITE3_CFLAGS)\n#     AC_SUBST(SQLITE3_LDFLAGS)\n#     AC_SUBST(SQLITE3_VERSION)\n#\n#   And sets:\n#\n#     HAVE_SQLITE3\n#\n# LICENSE\n#\n#   Copyright (c) 2008 Mateusz Loskot <mateusz@loskot.net>\n#\n#   Copying and distribution of this file, with or without modification, are\n#   permitted in any medium without royalty provided the copyright notice\n#   and this notice are preserved.\n\nAC_DEFUN([AX_LIB_SQLITE3],\n[\n    AC_ARG_WITH([sqlite3],\n        AC_HELP_STRING(\n            [--with-sqlite3=@<:@ARG@:>@],\n            [use SQLite 3 library @<:@default=yes@:>@, optionally specify the prefix for sqlite3 library]\n        ),\n        [\n        if test \"$withval\" = \"no\"; then\n            WANT_SQLITE3=\"no\"\n        elif test \"$withval\" = \"yes\"; then\n            WANT_SQLITE3=\"yes\"\n            ac_sqlite3_path=\"\"\n        else\n            WANT_SQLITE3=\"yes\"\n            ac_sqlite3_path=\"$withval\"\n        fi\n        ],\n        [WANT_SQLITE3=\"yes\"]\n    )\n\n    SQLITE3_CFLAGS=\"\"\n    SQLITE3_LDFLAGS=\"\"\n    SQLITE3_VERSION=\"\"\n\n    if test \"x$WANT_SQLITE3\" = \"xyes\"; then\n\n        ac_sqlite3_header=\"sqlite3.h\"\n\n        sqlite3_version_req=ifelse([$1], [], [3.0.0], [$1])\n        
sqlite3_version_req_shorten=`expr $sqlite3_version_req : '\\([[0-9]]*\\.[[0-9]]*\\)'`\n        sqlite3_version_req_major=`expr $sqlite3_version_req : '\\([[0-9]]*\\)'`\n        sqlite3_version_req_minor=`expr $sqlite3_version_req : '[[0-9]]*\\.\\([[0-9]]*\\)'`\n        sqlite3_version_req_micro=`expr $sqlite3_version_req : '[[0-9]]*\\.[[0-9]]*\\.\\([[0-9]]*\\)'`\n        if test \"x$sqlite3_version_req_micro\" = \"x\" ; then\n            sqlite3_version_req_micro=\"0\"\n        fi\n\n        sqlite3_version_req_number=`expr $sqlite3_version_req_major \\* 1000000 \\\n                                   \\+ $sqlite3_version_req_minor \\* 1000 \\\n                                   \\+ $sqlite3_version_req_micro`\n\n        AC_MSG_CHECKING([for SQLite3 library >= $sqlite3_version_req])\n\n        if test \"$ac_sqlite3_path\" != \"\"; then\n            ac_sqlite3_ldflags=\"-L$ac_sqlite3_path/lib\"\n            ac_sqlite3_cppflags=\"-I$ac_sqlite3_path/include\"\n        else\n            for ac_sqlite3_path_tmp in /usr /usr/local /opt ; do\n                if test -f \"$ac_sqlite3_path_tmp/include/$ac_sqlite3_header\" \\\n                    && test -r \"$ac_sqlite3_path_tmp/include/$ac_sqlite3_header\"; then\n                    ac_sqlite3_path=$ac_sqlite3_path_tmp\n                    ac_sqlite3_cppflags=\"-I$ac_sqlite3_path_tmp/include\"\n                    ac_sqlite3_ldflags=\"-L$ac_sqlite3_path_tmp/lib\"\n                    break;\n                fi\n            done\n        fi\n\n        ac_sqlite3_ldflags=\"$ac_sqlite3_ldflags -lsqlite3\"\n\n        saved_CPPFLAGS=\"$CPPFLAGS\"\n        CPPFLAGS=\"$CPPFLAGS $ac_sqlite3_cppflags\"\n\n#        AC_LANG_PUSH(C++)\n        AC_LANG_PUSH(C)\n        AC_COMPILE_IFELSE(\n            [\n            AC_LANG_PROGRAM([[@%:@include <sqlite3.h>]],\n                [[\n#if (SQLITE_VERSION_NUMBER >= $sqlite3_version_req_number)\n// Everything is okay\n#else\n#  error SQLite version is too old\n#endif\n                ]]\n      
      )\n            ],\n            [\n            AC_MSG_RESULT([yes])\n            success=\"yes\"\n            ],\n            [\n            AC_MSG_RESULT([not found])\n            success=\"no\"\n            ]\n        )\n#BK edit\n#       AC_LANG_POP([C++])\n       AC_LANG_POP([C])\n\n        CPPFLAGS=\"$saved_CPPFLAGS\"\n\n        if test \"$success\" = \"yes\"; then\n\n            SQLITE3_CFLAGS=\"$ac_sqlite3_cppflags\"\n            SQLITE3_LDFLAGS=\"$ac_sqlite3_ldflags\"\n\n            ac_sqlite3_header_path=\"$ac_sqlite3_path/include/$ac_sqlite3_header\"\n\n            dnl Retrieve SQLite release version\n            if test \"x$ac_sqlite3_header_path\" != \"x\"; then\n                ac_sqlite3_version=`cat $ac_sqlite3_header_path \\\n                    | grep '#define.*SQLITE_VERSION.*\\\"' | sed -e 's/.* \"//' \\\n                        | sed -e 's/\"//'`\n                if test \"x$ac_sqlite3_version\" != \"x\"; then\n                    SQLITE3_VERSION=$ac_sqlite3_version\n                else\n                    AC_MSG_WARN([Can not find SQLITE_VERSION macro in sqlite3.h header to retrieve SQLite version!])\n                fi\n            fi\n\n            AC_SUBST(SQLITE3_CFLAGS)\n            AC_SUBST(SQLITE3_LDFLAGS)\n            AC_SUBST(SQLITE3_VERSION)\n            AC_DEFINE([HAVE_SQLITE3], [], [Have the SQLITE3 library])\n        fi\n    fi\n])\n\n\n# ===========================================================================\n#    http://www.gnu.org/software/autoconf-archive/ax_c___attribute__.html\n# ===========================================================================\n#\n# SYNOPSIS\n#\n#   AX_C___ATTRIBUTE__\n#\n# DESCRIPTION\n#\n#   Provides a test for the compiler support of __attribute__ extensions.\n#   Defines HAVE___ATTRIBUTE__ if it is found.\n#\n# LICENSE\n#\n#   Copyright (c) 2008 Stepan Kasal <skasal@redhat.com>\n#   Copyright (c) 2008 Christian Haggstrom\n#   Copyright (c) 2008 Ryan McCabe <ryan@numb.org>\n#\n#   This 
program is free software; you can redistribute it and/or modify it\n#   under the terms of the GNU General Public License as published by the\n#   Free Software Foundation; either version 2 of the License, or (at your\n#   option) any later version.\n#\n#   This program is distributed in the hope that it will be useful, but\n#   WITHOUT ANY WARRANTY; without even the implied warranty of\n#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General\n#   Public License for more details.\n#\n#   You should have received a copy of the GNU General Public License along\n#   with this program. If not, see <http://www.gnu.org/licenses/>.\n#\n#   As a special exception, the respective Autoconf Macro's copyright owner\n#   gives unlimited permission to copy, distribute and modify the configure\n#   scripts that are the output of Autoconf when processing the Macro. You\n#   need not follow the terms of the GNU General Public License when using\n#   or distributing such scripts, even though portions of the text of the\n#   Macro appear in them. The GNU General Public License (GPL) does govern\n#   all other use of the material that constitutes the Autoconf Macro.\n#\n#   This special exception to the GPL applies to versions of the Autoconf\n#   Macro released by the Autoconf Archive. 
When you make and distribute a\n#   modified version of the Autoconf Macro, you may extend this special\n#   exception to the GPL to apply to your modified version as well.\n\n\nAC_DEFUN([AX_C___ATTRIBUTE__], [\n  AC_CACHE_CHECK([for __attribute__], [ax_cv___attribute__],\n    [AC_COMPILE_IFELSE(\n      [AC_LANG_PROGRAM(\n\t[[#include <stdlib.h>\n\t  static void foo(void) __attribute__ ((unused));\n\t  static void\n\t  foo(void) {\n\t      exit(1);\n\t  }\n        ]], [])],\n      [ax_cv___attribute__=yes],\n      [ax_cv___attribute__=no]\n    )\n  ])\n  if test \"$ax_cv___attribute__\" = \"yes\"; then\n    AC_DEFINE([HAVE___ATTRIBUTE__], 1, [define if your compiler has __attribute__])\n  fi\n])\n\n\n\n\n# ld-version-script.m4 serial 3\ndnl Copyright (C) 2008-2014 Free Software Foundation, Inc.\ndnl This file is free software; the Free Software Foundation\ndnl gives unlimited permission to copy and/or distribute it,\ndnl with or without modifications, as long as this notice is preserved.\n\ndnl From Simon Josefsson\n\n# FIXME: The test below returns a false positive for mingw\n# cross-compiles, 'local:' statements does not reduce number of\n# exported symbols in a DLL. 
Use --disable-ld-version-script to work\n# around the problem.\n\n# gl_LD_VERSION_SCRIPT\n# --------------------\n# Check if LD supports linker scripts, and define automake conditional\n# HAVE_LD_VERSION_SCRIPT if so.\nAC_DEFUN([gl_LD_VERSION_SCRIPT],\n[\n AC_ARG_ENABLE([ld-version-script],\n AS_HELP_STRING([--enable-ld-version-script],\n [enable linker version script (default is enabled when possible)]),\n [have_ld_version_script=$enableval], [])\n if test -z \"$have_ld_version_script\"; then\n AC_MSG_CHECKING([if LD -Wl,--version-script works])\n save_LDFLAGS=\"$LDFLAGS\"\n LDFLAGS=\"$LDFLAGS -Wl,--version-script=conftest.map\"\n cat > conftest.map <<EOF\nfoo\nEOF\n AC_LINK_IFELSE([AC_LANG_PROGRAM([], [])],\n [accepts_syntax_errors=yes], [accepts_syntax_errors=no])\n if test \"$accepts_syntax_errors\" = no; then\n cat > conftest.map <<EOF\nVERS_1 {\n global: sym;\n};\n\nVERS_2 {\n global: sym;\n} VERS_1;\nEOF\n AC_LINK_IFELSE([AC_LANG_PROGRAM([], [])],\n [have_ld_version_script=yes], [have_ld_version_script=no])\n else\n have_ld_version_script=no\n fi\n rm -f conftest.map\n LDFLAGS=\"$save_LDFLAGS\"\n AC_MSG_RESULT($have_ld_version_script)\n fi\n AM_CONDITIONAL(HAVE_LD_VERSION_SCRIPT, test \"$have_ld_version_script\" = \"yes\")\n])\n"
  },
  {
    "path": "install/apophenia.pc.in",
    "content": "prefix=@prefix@\nexec_prefix=@exec_prefix@\nbindir=@bindir@\nlibdir=@libdir@\nincludedir=@includedir@\n\nName: Apophenia\nDescription: The Apophenia library\nURL: http://apophenia.info/\nRequires: gsl\nRequires.private: sqlite3\nVersion: @VERSION@\nCflags: @MYSQL_CFLAGS@\nLibs: -L${libdir} -lapophenia\nLibs.private: @MYSQL_LDFLAGS@\n"
  },
  {
    "path": "install/configure.ac",
    "content": "# Process this file with autoconf to produce a configure script.\n\nm4_define([m4_apop_version], [m4_esyscmd_s(date +%Y%m%d)]) #will switch to this soon.\n\nAC_PREREQ(2.60)\nAC_INIT([apophenia], [1.0], [fluffmail@f-m.fm])\nAM_SILENT_RULES([yes])\nAC_CONFIG_SRCDIR([apop_arms.c])\nAC_CONFIG_AUX_DIR([build-aux])\nAC_CONFIG_HEADER([config.h])\nAC_CONFIG_MACRO_DIR([m4])\nAM_INIT_AUTOMAKE\nAM_MAINTAINER_MODE\n\n# The normal /usr/local default confused too many people\n##AC_PREFIX_DEFAULT([/usr])\n\n# libtool:\nLT_INIT\n\n# check linker script support\ngl_LD_VERSION_SCRIPT\n\n# Checks for programs.\nAC_PROG_CC\nAC_PROG_CC_C99\n\nACX_PTHREAD\nAC_OPENMP\n\n# Checks for libraries.\n## math library\nLT_LIB_M\n## GNU Scientific Library (GSL)\nAX_PATH_GSL([1.12.0],[],[AC_MSG_ERROR(could not find required version of GSL)])\n## DataBase system libraries\n#### MySQL library\nAX_LIB_MYSQL\n#### SQLite3 library\nAX_LIB_SQLITE3\n\n# Checks for header files.\nAC_FUNC_ALLOCA\nAC_HEADER_STDC\nAC_CHECK_HEADERS([float.h inttypes.h limits.h stddef.h stdint.h stdlib.h string.h unistd.h wchar.h])\n\n# Checks for typedefs, structures, and compiler characteristics.\nAC_C_CONST\nAC_C_INLINE\nAC_TYPE_SIZE_T\nAC_STRUCT_TM\nAC_CHECK_TYPES([ptrdiff_t])\nAX_C___ATTRIBUTE__\n\n#Some versions of GCC support atomics iff OpenMP is off.\nexport CFLAGS=\"$CFLAGS $OPENMP_CFLAGS\"\nAC_RUN_IFELSE(\n[AC_LANG_SOURCE([int main(){\n    _Atomic(int) i;\n }\n])]\n, [AC_SUBST([Autoconf_no_atomics], 0)]\n, [AC_SUBST([Autoconf_no_atomics], 1)]\n, [AC_SUBST([Autoconf_no_atomics], 1)]\n)\n\n# Checks for library functions.\nAC_FUNC_MALLOC\nAC_FUNC_REALLOC\nAC_FUNC_STRTOD\nAC_CHECK_FUNCS([floor memset pow regcomp sqrt strcasecmp asprintf])\n\n# Checks for tests tools\nAC_PATH_PROGS([BC],[bc],[/usr/bin/bc])\nAC_PATH_PROGS([SQLITE3],[sqlite3],[/usr/bin/sqlite3])\n\n# Run only the basic tests unless asked to run the full suite.\nAC_MSG_CHECKING([whether to run extended 
tests])\nAC_ARG_ENABLE([extended-tests],\n      [AS_HELP_STRING([--enable-extended-tests], [run numeric torture tests (may be slow)])],\n                  [enable_extended_tests=\"$enableval\"], [enable_extended_tests=\"no\"])\nAC_MSG_RESULT([$enable_extended_tests])\nAM_CONDITIONAL([EXTENDED_TESTS], [test \"X$enable_extended_tests\" != \"Xno\"])\n\nAC_CONFIG_FILES([\n\tapophenia.pc\n    apop.h\n\tMakefile\n\ttransform/Makefile\n\tmodel/Makefile\n\tcmd/Makefile\n\ttests/Makefile\n\tdocs/doxygen.conf\n\tdocs/Makefile\n\teg/Makefile\n\t])\n\nAC_CONFIG_FILES([\n\ttests/utilities_test\n\t],\n\t[\n\tchmod a+x tests/utilities_test\n\t])\nAC_OUTPUT\n\n##\n## eof\n"
  },
  {
    "path": "install/prep_variadics.m4",
    "content": "m4_divert(-1)\n\nThese are the macros to filter C files to produce the headers for the compound\nliteral-based variadic function headers. For usage, your best bet is to learn by \nexample and compare files in the base and generated source codes.  \n\nMost of the work is in splitting the input down into the type name, function name,\nfunction arguments, and function body. Then, the parts get reassembled as per the\ntemplate.\n\nSee also the technical notes at http://modelingwithdata.org/arch/00000022.htm\n\nm4_changequote(`<|', `|>')\nm4_changecom()\nm4_define(APOP_VAR_HEAD, <|Variadify(|>)\nm4_define(APOP_VAR_ENDHEAD, <|)|>)\nm4_define(APOP_VAR_END_HEAD, <|)|>)\nm4_define(cutm4, <|m4_define(<|stlenForCut|>, m4_regexp(<|$1|>, <|$2|>))m4_substr(<|$1|>, 0, stlenForCut)|>)\nm4_define(postCutm4, <|m4_define(<|stlenForCut|>, m4_regexp(<|$1|>, <|$2|>))m4_substr(<|$1|>, stlenForCut)|>)\nm4_define(Names_Only, <|m4_patsubst(m4_patsubst($1, <| *$|>,), <|.*[ *]|>,)<||>m4_ifelse(<|$#|>, <|0|>, ,<|$#|>, <|1|>, , <|, Names_Only(m4_shift($@))|>)|>)\n\nm4_define(Variadify, <|m4_dnl\nm4_define(<|PreParen|>, cutm4(<|$*|>, <| *(|>,))m4_dnl\nm4_define(<|Fn_Name|>, m4_patsubst(PreParen, <|^.*\\( \\|\\*\\)|>, ))m4_dnl\nm4_define(<|Type_Name|>, cutm4(cutm4(PreParen, Fn_Name), <| *$|>))m4_dnl\nm4_define(<|PreBrace|>, <|cutm4(<|$*|>,<| *{|>)|>)m4_dnl\nm4_define(<|PostBrace|>, <|postCutm4(<|$*|>,<| *{|>)|>)m4_dnl\nm4_define(<|Args_m4|>, postCutm4(PreBrace, <|(|>))m4_dnl\nm4_define(<|AArgs_m4|>, m4_patsubst(Args_m4,Output_declares,Output_vars))m4_dnl A favor to apop_output.c\n#ifdef APOP_NO_VARIADIC\nType_Name Fn_Name<||>Args_m4{\n#else\napop_varad_head(Type_Name, Fn_Name)PostBrace    m4_ifelse(Type_Name, <|void|>,,return) Fn_Name<||>_base(Names_Only(m4_patsubst(AArgs_m4,<| *[()]|>,)));\n}\n\n Type_Name Fn_Name<||>_base<||>Args_m4{\n#endif|>)\n\nm4_define(Apop_var_declare,<|m4_dnl\nm4_define(<|PreParen|>, cutm4(<|$*|>, <| *(|>,))m4_dnl\nm4_define(<|Fn_Name|>, 
m4_patsubst(PreParen, <|^.*\\( \\|\\*\\)|>, ))m4_dnl\nm4_define(<|Type_Name|>, cutm4(cutm4(PreParen, Fn_Name), <| *$|>))m4_dnl\nm4_define(<|Args_m4|>, postCutm4(<|$*|>, <|(|>))m4_dnl\nm4_define(<|Args_m4_semicolonized|>, m4_patsubst(m4_translit(Args_m4, <|,|>, ;), <|\\(^ *(\\|) *$\\)|>, ))m4_dnl\nm4_patsubst(m4_dnl\n#ifdef APOP_NO_VARIADIC\n $*;\n#else\n Type_Name Fn_Name<||>_base<||>Args_m4;\n apop_varad_declare(Type_Name, Fn_Name, Args_m4_semicolonized);\n#define Fn_Name<||>(...) apop_varad_link(Fn_Name, __VA_ARGS__)\n#endif,\n<|!|>, <|,|>)\n|>)\n\nm4_define_at_some_point(<|m4_apop_version|>, <|m4_syscmd(date +%Y%m%d)|>)\nm4_define(<|m4_apop_version|>, <|1.0|>)\nm4_divert(0)\n"
  },
  {
    "path": "install/push_pkg",
    "content": "version=1.0\nworkdir=apophenia-$version\ngit checkout -b pkg-`git log -1 | grep commit | cut -f2 -d' ' | head -c 8`\n./configure\ncp install/Readme-pkg .\nfor i in `git ls-files`; do git rm $i; done\n#rsync -aP $workdir/ .\nmv $workdir/apophenia-${version}.tar.gz .\nrm -r $workdir\ntar xz < apophenia-${version}.tar.gz && rm apophenia-${version}.tar.gz\nmv $workdir/* .\nmv $workdir/tests/* ./tests/\nmv $workdir/docs/* ./docs/\nmv $workdir/m4/* ./m4/\nrmdir $workdir/tests\nrmdir $workdir/docs\nrmdir $workdir/m4\nrmdir $workdir\nmv Readme-pkg README\ngit add .\ngit commit -a -m \"Rebuild package based on commit `git rev-parse pkg | head -c 8`\"\ngit merge -X ours remotes/origin/pkg -m \"Merge version based on commit `git rev-parse master | head -c 8`\"\ngit push origin `git rev-parse --abbrev-ref HEAD`:pkg\ngit checkout master\ngit branch -D `git branch| grep pkg-`\n"
  },
  {
    "path": "install/rpm.spec",
    "content": "Vendor:       Ben Klemens\nName:         apophenia\nVersion:      1.0\nRelease:      1\nLicense:      GPLv2 w/Affero-type addendum\nProvides:     apophenia, apophenia-devel\nRequires:     gsl, gsl-devel, sqlite3, sqlite3-devel\nPackager:     fluffmail@f-m.fm\nSummary:      A library of functions and models for scientific computing.\nSource:       PKGNAME\nBuildRoot:    %{buildroot}\nURL:          http://apophenia.info\n%description\nApophenia is a library of functions and models for scientific computing.\nIt is intended to make work easier---and even fun---when handling data sets, fitting\nmodels, and designing new models in C. Facilities for managing data include an easy\nlink to SQLite3 or mySQL databases.\n\n%prep\n%setup\n\n%build\n%configure \nmake\n\n%install\n%makeinstall\n#DESTDIR={%buildroot} %makeinstall\n\n%files\n%defattr(-,root,root)\n/usr/lib/libapophenia.so.0.0.0\n/usr/lib/libapophenia.so.0\n/usr/lib/libapophenia.so\n/usr/lib/libapophenia.la\n/usr/lib/libapophenia.a\n/usr/lib/pkgconfig/apophenia.pc\n/usr/bin/apop_text_to_db\n/usr/bin/apop_db_to_crosstab\n/usr/bin/apop_merge_dbs\n/usr/bin/apop_plot_query\n/usr/bin/apop_lookup\n/usr/include/apop.h\n/usr/include/apophenia\n/usr/include/apophenia/db.h\n/usr/include/apophenia/asst.h\n/usr/include/apophenia/stats.h\n/usr/include/apophenia/types.h\n/usr/include/apophenia/variadic.h\n/usr/include/apophenia/settings.h\n/usr/include/apophenia/deprecated.h\n"
  },
  {
    "path": "model/Makefile.am",
    "content": "\nnoinst_LTLIBRARIES = libapopmodel.la\n\nlibapopmodel_la_SOURCES = \\\n\tapop_bernoulli.c \\\n\tapop_beta.c \\\n\tapop_dirichlet.c \\\n\tapop_exponential.c \\\n\tapop_gamma.c \\\n\tapop_kerneld.c \\\n\tapop_loess.c \\\n\tapop_multinomial.c \\\n\tapop_multivariate_normal.c \\\n\tapop_normal.c \\\n\tapop_ols.c \\\n\tapop_pmf.c \\\n\tapop_poisson.c \\\n\tapop_probit.c \\\n\tapop_t.c \\\n\tapop_uniform.c \\\n\tapop_wishart.c \\\n\tapop_yule.c \\\n\tapop_zipf.c\n\nlibapopmodel_la_CFLAGS = \\\n\t-I $(top_srcdir) \\\n\t$(PTHREAD_CFLAGS) \\\n\t$(OPENMP_CFLAGS) \\\n\t$(SQLITE3_CFLAGS) \\\n\t$(GSL_CFLAGS)\n"
  },
  {
    "path": "model/apop_bernoulli.c",
    "content": "/* The Bernoulli distribution as an \\ref apop_model.\nCopyright (c) 2007--2009 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n/* \\amodel apop_bernoulli The Bernoulli model: A single random draw with probability \\f$p\\f$.\n\n\\adoc   Input_format\n  The Bernoulli parameter \\f$p\\f$ is the percentage of non-zero\n  values in the matrix. Its variance is \\f$p(1-p)\\f$.\n*/\n\n#include \"apop_internal.h\"\n\nstatic double bernie_ll(double x, void * pin){ \n    double *p = pin; \n    return x ? log(*p) : log(1-*p); \n}\n\nstatic long double bernoulli_log_likelihood(apop_data *d, apop_model *params){\n    Nullcheck_mpd(d, params, GSL_NAN);\n    double p = apop_data_get(params->parameters, 0, -1);\n\treturn apop_map_sum(d, .fn_dp = bernie_ll, .param=&p);\n}\n\nstatic double nonzero (double in) { return in !=0; }\n\n/* \\adoc estimated_parameters \\f$p\\f$ is the only element in the vector (e.g., get its\nvalue via <tt>double p = apop_data_get(outmodel->parameters);</tt>). 
A\n<tt>\\<Covariance\\></tt> page has the variance of \\f$p\\f$ in the (0,0)th element of the matrix.\n\\adoc estimated_info   Reports <tt>log likelihood</tt>.\n*/\nstatic void bernoulli_estimate(apop_data * data,  apop_model *est){\n    Nullcheck_mpd(data, est, ); Get_vmsizes(data); //tsize;\n    double n = tsize;\n    double p = apop_map_sum(data, nonzero)/n;\n    apop_name_add(est->parameters->names, \"p\", 'r');\n\tgsl_vector_set(est->parameters->vector, 0, p);\n    apop_data_add_named_elmt(est->info, \"log likelihood\", bernoulli_log_likelihood(data, est));\n    apop_data *cov = apop_data_add_page(est->parameters, apop_data_alloc(1,1), \"<Covariance>\");\n    apop_data_set(cov, 0,0, p*(1-p));\n}\n\nstatic long double bernoulli_constraint(apop_data *data, apop_model *inmodel){\n    //constraint is 0 < b and  1 > b\n    Staticdef(apop_data *, constraint, apop_data_falloc((2,2,1), 0., 1.,\n                                                                -1., -1.));\n    return apop_linear_constraint(inmodel->parameters->vector, constraint, 1e-3);\n}\n\n/* \\adoc    RNG Returns zero or one. */\nstatic int bernoulli_rng(double *out, gsl_rng *r, apop_model* eps){\n    *out = gsl_rng_uniform (r) < eps->parameters->vector->data[0]; \n    return 0;\n}\n\nstatic long double bernoulli_cdf(apop_data *d, apop_model *params){\n//One of those functions that just fills out the form.\n//CDF to zero = 1-p\n//CDF to one = 1\n    Nullcheck_mpd(d, params, GSL_NAN); Get_vmsizes(d)  //firstcol\n    double val = apop_data_get(d, .col=firstcol);\n    double p = *params->parameters->vector->data;\n    return isnan(val) ? GSL_NAN\n            : val < 0 ? 0\n            : val < 1 ? 
1 - p\n                      : 1;\n}\n\nstatic void bernie_print(apop_model *m, FILE *out){\n    fprintf(out, \"Bernoulli distribution with p = %g.\\n\", apop_data_get(m->parameters,0,-1));\n}\n\nstatic void bernie_prep(apop_data *data, apop_model *params){\n    apop_model_print_vtable_add(bernie_print, apop_bernoulli);\n    apop_model_clear(data, params);\n}\n\napop_model *apop_bernoulli = &(apop_model){\"Bernoulli distribution\", 1, .dsize=1, .prep=bernie_prep,\n\t.estimate = bernoulli_estimate, .log_likelihood = bernoulli_log_likelihood, \n   .constraint = bernoulli_constraint, .cdf = bernoulli_cdf, .draw = bernoulli_rng};\n"
  },
  {
    "path": "model/apop_beta.c",
    "content": "/* \\file apop_beta.c  The Beta distribution \nCopyright (c) 2006--2007, 2013 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  \n\n\\amodel apop_beta\n\nThe beta distribution has two parameters and is restricted to data between zero and one. You\nmay also find \\ref apop_beta_from_mean_var to be useful.\n\n\\adoc    Input_format  Any arrangement of scalar values. \n\\adoc    Parameter_format   A vector, v[0]=\\f$\\alpha\\f$; v[1]=\\f$\\beta\\f$    \n\\adoc    RNG  Produces a scalar \\f$\\in[0,1]\\f$. \n*/\n\n#include \"apop_internal.h\"\n\nstatic long double beta_log_likelihood(apop_data *d, apop_model *p);\n\n/* \\adoc estimated_info   Reports <tt>log likelihood</tt>. */\nstatic void beta_estimate(apop_data * data,  apop_model *est){\n    Nullcheck_mpd(data, est, );\n    Get_vmsizes(data) //vsize, msize1,...\n    double\t\tmmean=0, mvar=0, vmean=0, vvar=0, alpha, beta;\n    if (vsize){\n        vmean = apop_mean(data->vector);\n        vvar = apop_var(data->vector);\n    }\n    if (msize1)\n        apop_matrix_mean_and_var(data->matrix, &mmean, &mvar);\t\n    //Adding 0.0 forces floating-point division, so the weights are not truncated to zero.\n    double mean = mmean *(msize1*msize2/(tsize+0.0)) + vmean *(vsize/(tsize+0.0));\n    double var = mvar *(msize1*msize2/(tsize+0.0)) + vvar *(vsize/(tsize+0.0));\n    apop_data_add_names(est->parameters, 'r', \"α\", \"β\");\n    alpha   = gsl_pow_2(mean) * ((1-mean)/var - 1/mean);\n    beta    = alpha * (1-mean)/mean;\n\tgsl_vector_set(est->parameters->vector, 0, alpha);\n\tgsl_vector_set(est->parameters->vector, 1, beta);\n    apop_data_add_named_elmt(est->info, \"log likelihood\", beta_log_likelihood(data, est));\n    //apop_numerical_covariance_matrix(apop_beta, est, data);\n}\n\n/** \\cond doxy_ignore */\ntypedef struct{\n    double alpha, beta; \n} ab_type;\n/** \\endcond */ //End of Doxygen ignore.\n\nstatic double betamap(double x, void *abin) {\n    ab_type *ab = abin; \n    return (x < 0 || x > 1) ? 
0\n                : (ab->alpha-1) * log(x) + (ab->beta-1) *log(1-x); \n}\n\n#define Get_ab(p) \\\n    ab_type ab = { .alpha = apop_data_get(p->parameters,0,-1), \\\n                   .beta  = apop_data_get(p->parameters,1,-1) };\n\nstatic long double beta_log_likelihood(apop_data *d, apop_model *p){\n    Nullcheck_mpd(d, p, GSL_NAN); \n    Get_vmsizes(d) //tsize\n    Get_ab(p) //ab\n    Apop_stopif(isnan(ab.alpha+ab.beta), return GSL_NAN, 0, \"NaN α or β input.\");\n\treturn apop_map_sum(d, .fn_dp = betamap, .param=&ab) - gsl_sf_lnbeta(ab.alpha, ab.beta) * tsize;\n}\n\nstatic double dbeta_callback(double x){ return log(1-x); }\n\nstatic void beta_dlog_likelihood(apop_data *d, gsl_vector *gradient, apop_model *m){\n    Nullcheck_mpd(d, m, )\n    Get_vmsizes(d) //tsize\n    Get_ab(m) //ab\n    double lnsum = apop_map_sum(d, log);\n    double ln_x_minus_1_sum = apop_map_sum(d, dbeta_callback);\n\t//Psi is the derivative of the log gamma function.\n\tgsl_vector_set(gradient, 0, lnsum  + (gsl_sf_psi(ab.alpha + ab.beta) - gsl_sf_psi(ab.alpha))*tsize);\n\tgsl_vector_set(gradient, 1, ln_x_minus_1_sum  + (gsl_sf_psi(ab.alpha + ab.beta) - gsl_sf_psi(ab.beta))*tsize);\n}\n\nstatic long double beta_constraint(apop_data *data, apop_model *v){\n    //constraint is 0 < beta_1 and  0 < beta_2\n    return apop_linear_constraint(v->parameters->vector, .margin= 1e-4);\n}\n\nstatic long double beta_cdf(apop_data *d, apop_model *params){\n    Nullcheck_mpd(d, params, GSL_NAN)\n    Get_vmsizes(d)  //vsize\n    Get_ab(params)\n    double val = apop_data_get(d);\n    return gsl_cdf_beta_P(val, ab.alpha, ab.beta);\n}\n\nstatic int beta_rng(double *out, gsl_rng *r, apop_model* eps){\n    double ans = GSL_NAN;\n    Nullcheck_mp(eps, 1)\n    Get_ab(eps)\n    do {\n        ans = gsl_ran_beta(r, ab.alpha, ab.beta);\n    } while (!((0.0 < ans) && (GSL_DBL_EPSILON <= 1.0 - ans)));\n    *out = ans;\n    return 0;\n}\n\nstatic void beta_prep(apop_data *data, apop_model *params){\n    
apop_score_vtable_add(beta_dlog_likelihood, apop_beta);\n    apop_model_clear(data, params);\n}\n\napop_model *apop_beta = &(apop_model){\"Beta distribution\", 2,0,0, .dsize=1, .estimate = beta_estimate, \n    .log_likelihood = beta_log_likelihood, .prep = beta_prep, \n    .constraint = beta_constraint, .draw = beta_rng, .cdf = beta_cdf};\n"
  },
  {
    "path": "model/apop_dirichlet.c",
    "content": "/* The Dirichlet distribution \nCopyright (c) 2009 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  \n\n\\amodel apop_dirichlet A multivariate generalization of the \\ref apop_beta \"Beta distribution\".\n\n\\adoc    Input_format      Each row of your data matrix is a single observation.  \n\\adoc    Parameter_format   The estimated parameters are in the output model's <tt>parameters->vector</tt>. The size of the model is determined by the width of your input data set, so later RNG draws, \\&c will match in size.\n\\adoc    settings   MLE-type: \\ref apop_mle_settings, \\ref apop_parts_wanted_settings   \n*/\n\n#include \"apop_internal.h\"\n\nstatic double dirichletlnmap(gsl_vector *v, void *pin) {\n    //we used gsl_matrix_row to get here==>guaranteed that v->stride==1.\n    gsl_vector *params = pin;\n    return gsl_ran_dirichlet_lnpdf (params->size, params->data, v->data);\n}\n\nstatic long double dirichlet_log_likelihood(apop_data *d, apop_model *p){\n    Nullcheck_mpd(d, p, GSL_NAN);\n    Apop_stopif(!p->parameters->vector, return GSL_NAN, 0, \"parameters should be in inmodel->parameters->vector.\");\n    double paramsum = apop_sum(p->parameters->vector);\n    Apop_stopif(fabs(paramsum)<1e-5, return GSL_NAN, 0, \"Parameter total is too close to zero.\");\n    Apop_stopif(isnan(paramsum), return GSL_NAN, 0, \"NaN parameter.\");\n\treturn apop_map_sum(d, .fn_vp = dirichletlnmap, .param=p->parameters->vector, .part='r');\n}\n\nstatic void dirichlet_dlog_likelihood(apop_data *d, gsl_vector *gradient, apop_model *m){\n    Nullcheck_mpd(d, m, );\n    double param_sum = apop_sum(m->parameters->vector);\n    int n = d->matrix->size1;\n    for(size_t i=0; i < m->parameters->vector->size; i ++){\n        double thisparam = gsl_vector_get(m->parameters->vector, i);\n        gsl_vector *onecol = Apop_cv(d, i);\n        gsl_vector_set(gradient, i,  //Psi is the derivative of the log gamma function.\n                apop_vector_map_sum(onecol, log) + 
n*gsl_sf_psi(param_sum) - n*gsl_sf_psi(thisparam));\n    }\n}\n\nstatic long double dirichlet_constraint(apop_data *data, apop_model *v){\n    //all elements are > 0.\n    return apop_linear_constraint(v->parameters->vector, .margin= 1e-4);\n}\n\n/*\\adoc    RNG  A call to \\c gsl_ran_dirichlet. Output format is identical to the input data format.*/\nstatic int dirichlet_rng(double *out, gsl_rng *r, apop_model* eps){\n    gsl_ran_dirichlet(r, eps->parameters->vector->size, eps->parameters->vector->data, out);\n    return 0;\n}\n\nstatic void dirichlet_prep(apop_data *data, apop_model *params){\n    apop_score_vtable_add(dirichlet_dlog_likelihood, apop_dirichlet);\n    apop_model_clear(data, params);\n}\n\napop_model *apop_dirichlet = &(apop_model){\"Dirichlet distribution\", -1,0,0, .dsize=-1,\n    .log_likelihood = dirichlet_log_likelihood, .prep = dirichlet_prep,\n    .constraint = dirichlet_constraint, .draw = dirichlet_rng};\n"
  },
  {
    "path": "model/apop_exponential.c",
    "content": "/* The Exponential distribution.\n Copyright (c) 2005--2009 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  \n\n \\amodel apop_exponential The Exponential distribution.\n\n\\f$Z(\\mu,k) \t= \\sum_k 1/\\mu e^{-k/\\mu} \t\t\t\\f$ <br>\n\\f$ln Z(\\mu,k) \t= \\sum_k -\\ln(\\mu) - k/\\mu\t\t\t\\f$ <br>\n\\f$dln Z(\\mu,k)/d\\mu \t= \\sum_k -1/\\mu + k/(\\mu^2)\t\t\t\\f$ <br>\n\nSome write the function as:\n\\f$Z(C,k) = \\ln C C^{-k}. \\f$\nIf you prefer this form, just convert your parameter via \\f$\\mu = {1\\over \\ln C}\\f$\n(and convert back from the parameters this function gives you via \\f$C=\\exp(1/\\mu)\\f$).\n\n\\adoc    Input_format  \nOne scalar observation per row (in the \\c matrix or \\c vector).  \nSee also \\ref apop_data_rank_compress for means of dealing with one more input data format.\n                    \n\\adoc    Parameter_format   \\f$\\mu\\f$ is in the zeroth element of the vector.   \n\\adoc    CDF  Returns a scalar draw.\n\\adoc    settings   None.  */\n\n#include \"apop_internal.h\"\n\nstatic long double beta_greater_than_x_constraint(apop_data *data, apop_model *v){\n    //constraint is 0 < beta_1\n    return apop_linear_constraint(v->parameters->vector, .margin = 1e-3);\n}\n\nstatic long double exponential_log_likelihood(apop_data *d, apop_model *p){\n    Nullcheck_mpd(d, p, GSL_NAN);\n    Get_vmsizes(d) //tsize\n    double mu = gsl_vector_get(p->parameters->vector, 0);\n    double llikelihood = -((d->matrix ? apop_matrix_sum(d->matrix):0) + (d->vector ? apop_sum(d->vector) : 0))/ mu;\n\tllikelihood\t-= tsize * log(mu);\n\treturn llikelihood;\n}\n\nstatic void exponential_dlog_likelihood(apop_data *d, gsl_vector *gradient, apop_model *p){\n    Nullcheck_mpd(d, p, );\n    Get_vmsizes(d) //tsize\n    double mu = gsl_vector_get(p->parameters->vector, 0);\n    double d_likelihood = (d->matrix ? apop_matrix_sum(d->matrix):0) + (d->vector ? 
apop_sum(d->vector) : 0);\n\td_likelihood /= gsl_pow_2(mu);\n\td_likelihood -= tsize /mu;\n\tgsl_vector_set(gradient,0, d_likelihood);\n}\n\nstatic void exponential_estimate(apop_data * data,  apop_model *est){\n    apop_score_vtable_add(exponential_dlog_likelihood, apop_exponential);\n    apop_name_add(est->parameters->names, \"μ\", 'r');\n    Get_vmsizes(data); //msize1, msize2, vsize, tsize\n    //Parenthesize each ?: term; otherwise the matrix term is dropped whenever vsize is nonzero.\n    double mu =  ((vsize  ? vsize * apop_vector_mean(data->vector) : 0)\n                + (msize1 ? msize1*msize2 * apop_matrix_mean(data->matrix) : 0))/tsize;\n\tgsl_vector_set(est->parameters->vector, 0, mu);\n    apop_data_add_named_elmt(est->info, \"log likelihood\", exponential_log_likelihood(data, est));\n}\n\nstatic long double expo_cdf(apop_data *d, apop_model *params){\n    Nullcheck_mpd(d, params, GSL_NAN);\n    Get_vmsizes(d)  //vsize\n    double val = apop_data_get(d, 0, vsize ? -1 : 0);\n    double lambda = gsl_vector_get(params->parameters->vector, 0);\n    return gsl_cdf_exponential_P(val, lambda);\n}\n\n/* \\adoc RNG Just a wrapper for \\c gsl_ran_exponential.  */\nstatic int exponential_rng(double *out, gsl_rng* r, apop_model *p){\n\t*out = gsl_ran_exponential(r, p->parameters->vector->data[0]);\n    return 0;\n}\n\nstatic void exponential_prep(apop_data *data, apop_model *params){\n    apop_score_vtable_add(exponential_dlog_likelihood, apop_exponential);\n    apop_model_clear(data, params);\n}\n\napop_model *apop_exponential = &(apop_model){\"Exponential distribution\", 1,0,0,.dsize=1,\n\t .estimate = exponential_estimate, .log_likelihood = exponential_log_likelihood, \n     .prep = exponential_prep, .constraint = beta_greater_than_x_constraint, \n     .draw = exponential_rng, .cdf = expo_cdf};\n"
  },
  {
    "path": "model/apop_gamma.c",
    "content": "/* The gamma distribution.\nCopyright (c) 2005--2007, 2009 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  \n\n\\amodel apop_gamma \n\n\\f$G(x, a, b)     = {1\\over (\\Gamma(a) b^a)}  x^{a-1} e^{-x/b}\\f$\n\n\\f$ln G(x, a, b)= -ln \\Gamma(a) - a ln b + (a-1)ln(x) - x/b\\f$\n\n\\f$d ln G/ da    =  -\\psi(a) - ln b + ln(x) \\f$    (also, \\f$d ln \\Gamma = \\psi\\f$)\n\n\\f$d ln G/ db    =  -a/b + x/(b^2) \\f$\n\n\\adoc    Input_format A scalar, in the \\c vector or \\c matrix elements of the input \\ref apop_data set.\n\nSee also \\ref apop_data_rank_compress for means of dealing with one more input data format.\n\\adoc    Parameter_format   First two elements of the vector are \\f$a\\f$ and \\f$b\\f$.\n\\adoc    settings    MLE-type: \\ref apop_mle_settings, \\ref apop_parts_wanted_settings  \n  */\n\n#include \"apop_internal.h\"\n\nstatic long double gamma_constraint(apop_data *data, apop_model *v){\n    //constraint is 0 < beta_1 and 0 < beta_2\n    return apop_linear_constraint(v->parameters->vector, .margin= 1e-5);\n}\n\n/** \\cond doxy_ignore */\ntypedef struct {double a, b, ln_ga_plus_a_ln_b;} abstruct;\n/** \\endcond */ //End of Doxygen ignore.\n\nstatic double apply_for_gamma(double x, void *abin) { \n    abstruct *ab = abin;\n    return x ? 
((ab->a-1)*log(x) - x/ab->b - ab->ln_ga_plus_a_ln_b) : 0; \n}\n\nstatic long double gamma_log_likelihood(apop_data *d, apop_model *p){\n    Nullcheck_mpd(d, p, GSL_NAN) \n    Get_vmsizes(d)\n    abstruct ab = {.a = gsl_vector_get(p->parameters->vector, 0),\n                   .b = gsl_vector_get(p->parameters->vector, 1) };\n    double llikelihood  = 0,\n        ln_ga  = gsl_sf_lngamma(ab.a),\n        ln_b   = log(ab.b),\n        a_ln_b = ab.a * ln_b;\n    ab.ln_ga_plus_a_ln_b = ln_ga + a_ln_b;\n    llikelihood = apop_map_sum(d, .fn_dp = apply_for_gamma, .param = &ab);\n    return llikelihood;\n}\n\nstatic double a_callback(double x, void *ab){ return log(x)- *(double*)ab; }\n\nstatic double b_callback(double x, void *abv){ \n    double *ab = abv;\n    return x/gsl_pow_2(ab[0]) - ab[1]; \n}\n\nstatic void gamma_dlog_likelihood(apop_data *d, gsl_vector *gradient, apop_model *p){\n    Nullcheck_mp(p, ) \n    double  a = gsl_vector_get(p->parameters->vector, 0),\n        \tb = gsl_vector_get(p->parameters->vector, 1);\n    double psi_a_ln_b  = gsl_sf_psi(a) + log(b);\n    double b_and_ab[2] = {b, a/b};\n    gsl_vector_set(gradient, 0, apop_map_sum(d, .fn_dp = a_callback, .param=&psi_a_ln_b));\n    gsl_vector_set(gradient, 1, apop_map_sum(d, .fn_dp = b_callback, .param=&b_and_ab));\n}\n\n/* \\adoc RNG A wrapper for \\c gsl_ran_gamma, which returns a scalar.\n\nSee the notes for \\ref apop_exponential on a popular alternate form.  */\nstatic int gamma_rng( double *out, gsl_rng* r, apop_model *p){\n    *out    = gsl_ran_gamma(r, gsl_vector_get(p->parameters->vector, 0), gsl_vector_get(p->parameters->vector, 1));\n    return 0;\n}\n\nstatic long double gamma_cdf(apop_data *d, apop_model *params){\n    Nullcheck_mpd(d, params, GSL_NAN)\n    Get_vmsizes(d)  //vsize\n    double val = apop_data_get(d, 0, vsize ? 
-1 : 0);\n    double alpha = gsl_vector_get(params->parameters->vector, 0);\n    double beta = gsl_vector_get(params->parameters->vector, 1);\n    return gsl_cdf_gamma_P(val, alpha, beta);\n}\n\nstatic void gamma_prep(apop_data *data, apop_model *params){\n    apop_score_vtable_add(gamma_dlog_likelihood, apop_gamma);\n    apop_model_clear(data, params);\n}\n\n/* via method of moments.\n   E(data) = ab\n   var(data) = a b^2\n   so a = E^2/var\n      b = var/E\n*/\nstatic void gamma_est(apop_data *data, apop_model *m){\n    Get_vmsizes(data)\n    double mmean=0, mvar=0, vmean=0, vvar=0;\n    if (vsize){\n        vmean = apop_mean(data->vector);\n        vvar = apop_var(data->vector);\n    }\n    if (msize1) apop_matrix_mean_and_var(data->matrix, &mmean, &mvar);\n    double mean = mmean *(msize1*msize2/(tsize+0.0)) + vmean *(vsize/(tsize+0.0));\n    double var = mvar *(msize1*msize2/(tsize+0.0)) + vvar *(vsize/(tsize+0.0));\n    apop_data_set(m->parameters, 0, .val=gsl_pow_2(mean)/var);\n    apop_data_set(m->parameters, 1, .val=var/mean);\n}\n\napop_model *apop_gamma = &(apop_model){\"Gamma distribution\", 2,0,0, .dsize=1, \n    .estimate = gamma_est, .log_likelihood = gamma_log_likelihood, .prep = gamma_prep, \n    .constraint = gamma_constraint, .cdf = gamma_cdf, .draw = gamma_rng};\n"
  },
  {
    "path": "model/apop_kerneld.c",
    "content": "/** \\file */\n/* The kernel density estimate (meta-)model.\n\nCopyright (c) 2007, 2010, 2013 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  \n*/\n\n#include \"apop_internal.h\"\n#include <gsl/gsl_math.h>\n\n\n/*\\amodel apop_kernel_density The kernel density smoothing of a PMF or histogram.\n\nAt each point along the histogram, put a distribution (default: Normal(0,1)) on top\nof the point. Sum all of these distributions to form the output distribution.\n\nSetting up a kernel density consists of setting up a model with the base data and the\ninformation about the kernel model around each point. This can be done using the \\ref\napop_model_set_settings function to get a copy of the base \\ref apop_kernel_density model\nand add a \\ref apop_kernel_density_settings group with the appropriate information;\nsee the \\c main function of the example below.\n\n\\adoc Input_format  One observation per line. Each row in turn will be passed through to the elements of <tt>kernelbase</tt> and optional <tt>set_params</tt> function, so follow the format of the base model.\n\\adoc Parameter_format  None\n\\adoc Estimated_parameters None\n\\adoc Estimated_settings  The estimate method basically just runs\n                          <tt>apop_model_add_group(your_data, apop_kernel_density);</tt>\n\\adoc Settings  \\ref apop_kernel_density_settings, including:\n\n\\li \\c data a data set, which, if  not \\c NULL and \\c base_pmf is \\c NULL, will be converted to an \\ref apop_pmf model.\n\\li \\c base_pmf This is the preferred format for input data. It is the histogram to be smoothed.\n\\li \\c kernelbase The kernel to use for smoothing, with all parameters set and a \\c p method. Popular favorites are \\ref apop_normal and \\ref apop_uniform.\n\\li \\c set_params A function that takes in a single number and the model, and sets\nthe parameters accordingly. The function will call this for every point in the data\nset. 
Here is the default, which is used if this is \\c NULL. It simply sets the first\nelement of the model's parameter vector to the input number; this is appropriate for a\nNormal distribution, where we want to center the distribution on each data point in turn.\n\n\\code\nstatic void apop_set_first_param(apop_data *in, apop_model *m){\n    apop_data_set(m->parameters, .val= apop_data_get(in));\n}\n\\endcode\n\nSee the sample code for a Uniform[0,1] recentered around the first element of the PMF matrix.\n\n\\adoc Examples\nThis example sets up and uses KDEs based on Normal and Uniform distributions.\n\n\\include kernel.c\n*/\n\nstatic void apop_set_first_param(apop_data *in, apop_model *m){\n    apop_data_set(m->parameters, .val= apop_data_get(in));\n}\n\nApop_settings_init(apop_kernel_density, \n    //If there's a PMF associated with the model, run with it.\n    //else, generate one from the data.\n    Apop_varad_set(base_pmf, apop_estimate(in.base_data, apop_pmf));\n    Apop_varad_set(kernel, apop_model_set_parameters(apop_normal, 0, 1));\n    Apop_varad_set(set_fn, apop_set_first_param);\n    out->own_pmf = !in.base_pmf;\n    out->own_kernel = !in.kernel;\n    if (!out->kernel->parameters) apop_prep(out->base_data, out->kernel);\n)\n\nApop_settings_copy(apop_kernel_density,\n    out->own_pmf    =\n    out->own_kernel = 0;\n)\n\nApop_settings_free(apop_kernel_density,\n    if (in->own_pmf)    apop_model_free(in->base_pmf);\n    if (in->own_kernel) apop_model_free(in->kernel);\n)\n\nstatic void apop_kernel_estimate(apop_data *d, apop_model *m){\n    Nullcheck_d(d, );\n    if (!apop_settings_get_group(m, apop_kernel_density))\n        Apop_settings_add_group(m, apop_kernel_density, .base_data=d);\n}\n\n/* \\adoc    CDF Sums the CDF to the given point of all the sub-distributions.*/\nstatic long double kernel_cdf(apop_data *d, apop_model *m){\n    Nullcheck_m(m, GSL_NAN);\n    long double total = 0;\n    apop_kernel_density_settings *ks = apop_settings_get_group(m, 
apop_kernel_density);\n    apop_data *pmf_data = apop_settings_get(m, apop_kernel_density, base_pmf)->data;\n    Get_vmsizes(pmf_data); //maxsize\n    for (size_t k = 0; k < maxsize; k++){\n        apop_data *r = Apop_r(pmf_data, k);\n        double wt = r->weights ? *r->weights->data : 1;\n        OMP_critical(kernel_p_cdf)\n        {\n        (ks->set_fn)(r, ks->kernel);\n        total += apop_cdf(d, ks->kernel)*wt;\n        }\n    }\n    long double weight = pmf_data->weights ? apop_sum(pmf_data->weights) : maxsize;\n    total /= weight;\n    return total;\n}\n\nstatic long double kernel_ll(apop_data *d, apop_model *m){\n    Nullcheck_m(m, GSL_NAN);\n    size_t datasize;\n    {Get_vmsizes(d); datasize=maxsize;}\n    apop_kernel_density_settings *ks = apop_settings_get_group(m, apop_kernel_density);\n    apop_data *pmf_data = apop_settings_get(m, apop_kernel_density, base_pmf)->data;\n    Get_vmsizes(pmf_data); //maxsize\n    long double ll = 0;\n    OMP_for_reduce(+:ll,    int i=0; i< datasize; i++){\n        long double lls[maxsize];\n        apop_data *datapt = Apop_r(d, i);\n        for(int k=0; k< maxsize; k++){\n            apop_data *r = Apop_r(pmf_data, k);\n            OMP_critical(kernel_p_cdf)\n            {\n            (ks->set_fn)(r, ks->kernel);\n            lls[k] = apop_log_likelihood(datapt, ks->kernel);\n            }\n        }\n\n        //let p_m w_m be the largest value among the p_i w_is. Then\n        //log (Σp_i w_i) = log(p_m w_m) + log(Σ(p_i w_i/p_m w_m).\n        //This gives us a little more numeric accuracy.\n        double max_ll = -INFINITY;\n        double total = 0;\n        #define getwt(i) (pmf_data->weights ? 
gsl_vector_get(pmf_data->weights, i) : 1)\n        for (int i=0; i< maxsize; i++) if (lls[i]>max_ll) max_ll = lls[i];\n        if (max_ll==-INFINITY) {ll=-INFINITY; continue;}\n        for (int i=0; i< maxsize; i++) lls[i]-=max_ll;\n        for (int i=0; i< maxsize; i++) lls[i]= exp(lls[i]) * getwt(i);\n        for (int i=0; i< maxsize; i++) total += lls[i];\n        ll += max_ll + log(total);\n    }\n    ll -= datasize * log(pmf_data->weights ? apop_sum(pmf_data->weights) : maxsize);\n    return ll;\n}\n\n/* \\adoc    RNG  Randomly selects a data point, then randomly draws from that sub-distribution.\n Returns 0 on success, 1 if unable to pick a sub-distribution (meaning the weights over the distributions are somehow broken), and 2 if unable to draw from the sub-distribution.\n */\nstatic int kernel_draw(double *d, gsl_rng *r, apop_model *m){\n    //randomly select a point, using the weights.\n    apop_kernel_density_settings *ks = apop_settings_get_group(m, apop_kernel_density);\n    apop_model *pmf = apop_settings_get(m, apop_kernel_density, base_pmf);\n    apop_data *point = apop_data_alloc(1, pmf->dsize);\n    //On either failure below, free the scratch point before returning so it does not leak.\n    Apop_stopif(apop_draw(Apop_rv(point, 0)->data, r, pmf), apop_data_free(point); return 1, 0, \"Unable to use the PMF over kernels to select a kernel from which to draw.\");\n    (ks->set_fn)(point, ks->kernel);\n    //Now draw from the distribution around that point.\n    Apop_stopif(apop_draw(d, r, ks->kernel), apop_data_free(point); return 2, 0, \"unable to draw from a single selected kernel.\");\n    apop_data_free(point);\n    return 0;\n}\n\napop_model *apop_kernel_density = &(apop_model){\"kernel density estimate\", .dsize=1,\n    .estimate = apop_kernel_estimate, .log_likelihood = kernel_ll,\n\t.cdf=kernel_cdf, .draw=kernel_draw};\n"
  },
  {
    "path": "model/apop_loess.c",
    "content": "/* Functions to calculate loess regressions. \n\nProvenance:\n   * Originally written in FORTRAN `77 by Cleveland, Devlin, Grosse, and Shyu, 1988\n   * Some C wrapper written by Cleveland, Grosse, and Shyu, 1992\n   * Punched through f2c, munged into one file, and heavily edited by Klemens, 2009, 2011\n   * Legal & ref.s: Documentation from CG&S state that their code is\n            public domain. See http://netlib.org/a/cloess.pdf (esp. if\n            you hope to modify the below). You can still get the `92\n            version from http://netlib.org/a/dloess (a shell script\n            that unpacks into everything you need). Most BK edits (c)\n            2009, licensed under the GPLv2; see COPYING.\n            Those BK edits made during time working as a gov't\n            employee are public domain.\n\n    By the way, search the code for execnt: many functions will let you\n    query how many times they have been hit, which you might find to be useful.\n\n\\amodel apop_loess Regression via loess smoothing\n\nThis uses a somewhat black-box routine, first written by Chamberlain, Devlin, Grosse,\nand Shyu in 1988, to fit a smoothed series of quadratic curves to the input data,\nthus producing a curve more closely fitting than a simple regression would.\n\nThe curve is basically impossible to describe using a short list of parameters, so the\nrepresentation is in the form of the \\c predicted vector of the \\c expected data set;\nsee below.\n\nFrom the 1992 manual for the package:\n``The method we will use to fit local regression models is called <em>loess</em>, which\nis short for local regression, and was chosen as the name since a loess is a deposit\nof fine clay or silt along a river valley, and thus is a surface of sorts. 
The word\ncomes from the German löss, and is pronounced löíss.''\n\n\\adoc    Input_format  \nThe data is basically OLS-like:                     \nthe first column of the data is the dependent variable to be explained; subsequent\nvariables are the independent explanatory variables.  Thus, your input data can either\nhave a dependent vector plus explanatory matrix, or a matrix where the first column\nis the dependent variable.\n\nUnlike with OLS, I won't move your original data, and I won't add a <b>1</b>, because\nthat's not really the loess custom. You can of course set up your data that way if\nyou like.\n\nIf your data set has a weights vector, I'll use it.\n\nIn any case, all data is copied into the model's \\ref apop_loess_settings. The code\nis primarily FORTRAN code from 1988 converted to C; the data thus has to be converted\ninto a relatively obsolete internal format.\n\n\n\\adoc    Parameter_format  Unused. \n\\adoc    estimated_parameters None.  \n\\adoc    Postestimate_settings The \\ref apop_loess_settings is filled with results (and internal processing cruft). The\n\\c out_model->info data set has a table giving the actual, \\c predicted, and \\c residual\ncolumns, which is amenable to plotting.  Try:\n        \\code\n        apop_data_show(apop_data_get_page(output_model->info, \"<Predicted>\"));\n        \\endcode\n        \n\\adoc    Predict \nFills in the zeroth column (ignoring and overwriting any data there), and adds an additional page to the input \\ref\napop_data set named \"<Confidence>\" with a lower and upper CI for each point.\n\n\\adoc    settings \\ref apop_loess_settings */\n\n#include \"apop_internal.h\"\n\n////////////a few lines from f2c.h\n#define TRUE_ (1)\n#define FALSE_ (0)\n#define abs(x) ((x) >= 0 ? (x) : -(x))\n#define min(a,b) ((a) <= (b) ? (a) : (b))\n#define max(a,b) ((a) >= (b) ? 
(a) : (b))\n\ntypedef long int logical;\ntypedef long int integer;\n\n#define Calloc(n,t)\t(t *)calloc((unsigned)(n),sizeof(t))   // From #include \"S.h\"\n#define Warning(msg) Apop_assert_c(0, , 0 , \"%s\", msg);\n\nstatic integer c__0 = 0;\nstatic integer c__1 = 1;\nstatic integer c__15 = 15;\nstatic integer c__2 = 2;\nstatic integer c__21 = 21;\ndouble doublepluszero = 0;\n\n//I'm using the GSL's blas system. These are pass-through functions that\n//save me the trouble of slogging through the code and making substitutions.\nvoid dswap(const integer N, double *x, double *y){ cblas_dswap(N, x, 1, y, 1);}\n\ndouble dnrm2(const integer *N, const double *x){ return cblas_dnrm2(*N, x, 1); }\n\ndouble ddot_(const integer *N, const double *x, const integer *incx, const double *y, const integer *incy){\n\t\treturn cblas_ddot(*N, x, *incx, y, *incy);}\n\nvoid daxpy_(integer *N, const double *alpha, const double *x, integer *incx, double *y, integer *incy){\n\t\tcblas_daxpy(*N, *alpha, x, *incx, y, *incy);}\n\nvoid dcopy_(const integer *N, const double *x, const integer incx, double *y, const integer incy){\n\t\tcblas_dcopy(*N, x, incx, y, incy); }\n\nvoid dscal(integer *N, const double alpha, double *x){ cblas_dscal(*N, alpha, x, 1);}\n\nstatic void drotg_(double *a,double *b, double *c, double *s){ cblas_drotg(a,b,c,s);}\n\nvoid drot_(const integer *N, double *x, const integer *incx, double *y, const integer *incy, const double *c, const double *s){\n\t\tcblas_drot(*N, x, *incx, y, *incy, *c, *s);}\n\n//Procedures from libf2c.\nstatic integer pow_ii(integer x, integer n) { return gsl_pow_int(x, n); }\n\nstatic double d_sign(double *a, double *b) {\n    double x = (*a >= 0 ? *a : - *a);\n    return( *b >= 0 ? x : -x);\n}\n\n/*//// dqrsl.f -- translated by f2c (version 20061008).\n\n     dqrsl applies the output of dqrdc to compute coordinate transformations,\n     projections, and least squares solutions.  for k .le. 
min(n,p), let xk be the matrix\n\n            xk = (x(jpvt(1)),x(jpvt(2)), ... ,x(jpvt(k))) \n\n     formed from columns jpvt(1), ... ,jpvt(k) of the original n x p matrix x that was\n     input to dqrdc (if no pivoting was done, xk consists of the first k columns of x\n     in their original order).  dqrdc produces a factored orthogonal matrix q and an\n     upper triangular matrix r such that\n\n              xk = q * (r)\n                         (0)             \n\n     this information is contained in coded form in the arrays x and qraux. \n\n     on entry \n\n        x      double precision(ldx,p). \n               x contains the output of dqrdc. \n\n        ldx    integer. \n               ldx is the leading dimension of the array x. \n\n        n      integer. \n               n is the number of rows of the matrix xk.  it must have the same value\n               as n in dqrdc.\n\n        k      integer. \n               k is the number of columns of the matrix xk.  k must not be greater\n               than min(n,p), where p is the same as in the calling sequence to dqrdc.\n\n        qraux  double precision(p). \n               qraux contains the auxiliary output from dqrdc. \n\n        y      double precision(n) \n               y contains an n-vector that is to be manipulated by dqrsl. \n\n        job    integer. \n               job specifies what is to be computed.  job has the decimal expansion\n               abcde, with the following meaning.\n\n                    if a.ne.0, compute qy.\n                      if b,c,d, or e .ne. 0, compute qty.\n                      if c.ne.0, compute b.\n                      if d.ne.0, compute rsd.\n                      if e.ne.0, compute xb. \n\n               note that a request to compute b, rsd, or xb automatically triggers\n               the computation of qty, for which an array must be provided in the\n               calling sequence.\n\n     on return \n\n        qy     double precision(n). 
\n               qy contains q*y, if its computation has been requested. \n\n        qty    double precision(n). \n               qty contains trans(q)*y, if its computation has been requested.\n               here trans(q) is the transpose of the matrix q.\n\n        b      double precision(k) \n               b contains the solution of the least squares problem \n\n                    minimize norm2(y - xk*b), \n\n               if its computation has been requested.  (note that if pivoting was\n               requested in dqrdc, the j-th component of b will be associated with\n               column jpvt(j) of the original matrix x that was input into dqrdc.)\n\n        rsd    double precision(n). \n               rsd contains the least squares residual y - xk*b, if its computation\n               has been requested.  rsd is also the orthogonal projection of y onto\n               the orthogonal complement of the column space of xk.\n\n        xb     double precision(n). \n               xb contains the least squares approximation xk*b, if its computation\n               has been requested.  xb is also the orthogonal projection of y onto\n               the column space of x.\n\n        info   integer. \n               info is zero unless the computation of b has been requested and r is\n               exactly singular.  in this case, info is the index of the first zero\n               diagonal element of r and b is left unaltered.\n\n     The parameters qy, qty, b, rsd, and xb are not referenced if their computation\n     is not requested and in this case can be replaced by dummy variables in the\n     calling program.  to save storage, the user may in some cases use the same array\n     for different parameters in the calling sequence.  
a frequently occurring example\n     is when one wishes to compute any of b, rsd, or xb and does not need y or qty.\n     in this case one may identify y, qty, and one of b, rsd, or xb, while providing\n     separate arrays for anything else that is to be computed.  thus the calling sequence\n\n          call dqrsl(x,ldx,n,k,qraux,y,dum,y,b,y,dum,110,info) \n\n     will result in the computation of b and rsd, with rsd overwriting y.  More\n     generally, each item in the following list contains groups of permissible\n     identifications for a single calling sequence.\n\n          1. (y,qty,b) (rsd) (xb) (qy)\n            2. (y,qty,rsd) (b) (xb) (qy)\n            3. (y,qty,xb) (b) (rsd) (qy)\n            4. (y,qy) (qty,b) (rsd) (xb)\n            5. (y,qy) (qty,rsd) (b) (xb)\n            6. (y,qy) (qty,xb) (b) (rsd) \n\n     in any group the value returned in the array allocated to the group corresponds\n     to the last member of the group.\n\n     linpack. this version dated 08/14/78 . \n     g.w. stewart, university of maryland, argonne national lab. */\nstatic void dqrsl_(double *x, integer *ldx, integer *n, integer * k, double *qraux, double *y, \n        double *qy, double *qty, double *b, double *rsd, double *xb, integer job, integer *info) {\n\n    integer x_dim1, i__1, i__2;\n    static integer i__, j;\n    static double t, temp;\n    static integer jj, ju, kp1;\n    static logical cb, cr, cxb, cqy, cqty;\n\n    x_dim1 = *ldx;\n    x -= 1 + x_dim1;\n    --qraux;\n    --y;\n    --qy;\n    --qty;\n    --b;\n    --rsd;\n    --xb;\n\n    *info = 0; /*     set info flag. */\n\n    /*     determine what is to be computed. */\n    cqy = job / 10000 != 0;\n    cqty = job % 10000 != 0;\n    cb = job % 1000 / 100 != 0;\n    cr = job % 100 / 10 != 0;\n    cxb = job % 10 != 0;\n    ju = min(*k,*n - 1);\n\n    /*     special action when n=1. 
*/\n    if (ju != 0) goto L40;\n    if (cqy)     qy[1] = y[1];\n    if (cqty)    qty[1] = y[1];\n    if (cxb)     xb[1] = y[1];\n    if (!cb)     goto L30;\n    if (x[x_dim1 + 1] != 0.) goto L10;\n    *info = 1;\n    goto L20;\nL10:\n    b[1] = y[1] / x[x_dim1 + 1];\nL20:\nL30:\n    if (cr) rsd[1] = 0.;\n    return;\nL40:\n    /*        set up to compute qy or qty. */\n    if (cqy)  dcopy_(n, &y[1], 1, &qy[1], 1);\n    if (cqty) dcopy_(n, &y[1], 1, &qty[1], 1);\n    if (!cqy) goto L70;\n\n    /*           compute qy. */\n    for (jj = 1; jj <= ju; ++jj) {\n        j = ju - jj + 1;\n        if (qraux[j] == 0.)\n            continue;\n        temp = x[j + j * x_dim1];\n        x[j + j * x_dim1] = qraux[j];\n        i__2 = *n - j + 1;\n        t = -ddot_(&i__2, &x[j + j * x_dim1], &c__1, &qy[j], &c__1) / x[j + j * x_dim1];\n        i__2 = *n - j + 1;\n        daxpy_(&i__2, &t, &x[j + j * x_dim1], &c__1, &qy[j], &c__1); x[j + j * x_dim1] = temp;\n    }\nL70:\n    if (cqty) /*           compute trans(q)*y. */\n        for (j = 1; j <= ju; ++j) {\n            if (qraux[j] == 0.)\n                continue;\n            temp = x[j + j * x_dim1];\n            x[j + j * x_dim1] = qraux[j];\n            i__2 = *n - j + 1;\n            t = -ddot_(&i__2, &x[j + j * x_dim1], &c__1, &qty[j], &c__1) / x[j + j * x_dim1];\n            i__2 = *n - j + 1;\n            daxpy_(&i__2, &t, &x[j + j * x_dim1], &c__1, &qty[j], &c__1);\n            x[j + j * x_dim1] = temp;\n        }\n\n    /*        set up to compute b, rsd, or xb. */\n    if (cb) dcopy_(k, &qty[1], 1, &b[1], 1);\n    kp1 = *k + 1;\n    if (cxb) dcopy_(k, &qty[1], 1, &xb[1], 1);\n    if (cr && *k < *n) {\n        i__1 = *n - *k;\n        dcopy_(&i__1, &qty[kp1], 1, &rsd[kp1], 1);\n    }\n    if (! cxb || kp1 > *n)\n        goto L120;\n    for (i__ = kp1; i__ <= *n; ++i__)\n        xb[i__] = 0.;\nL120:\n    if (! cr)\n        goto L140;\n    for (i__ = 1; i__ <= *k; ++i__)\n        rsd[i__] = 0.;\nL140:\n    if (! 
cb)\n        goto L190;\n\n    /*           compute b. */\n    for (jj = 1; jj <= *k; ++jj) {\n        j = *k - jj + 1;\n        if (x[j + j * x_dim1] != 0.)\n            goto L150;\n        *info = j;\n        break;\n    L150:\n        b[j] /= x[j + j * x_dim1];\n        if (j == 1)\n            continue;\n        t = -b[j];\n        i__2 = j - 1;\n        daxpy_(&i__2, &t, &x[j * x_dim1 + 1], &c__1, &b[1], &c__1);\n    }\nL190:\n    if (! cr && ! cxb)\n        return;\n    /*           compute rsd or xb as required. */\n    for (jj = 1; jj <= ju; ++jj) {\n        j = ju - jj + 1;\n        if (qraux[j] == 0.)\n            continue;\n        temp = x[j + j * x_dim1];\n        x[j + j * x_dim1] = qraux[j];\n        if (cr) {\n            i__2 = *n - j + 1;\n            t = -ddot_(&i__2, &x[j + j * x_dim1], &c__1, &rsd[j], &c__1) / x[j + j * x_dim1];\n            i__2 = *n - j + 1;\n            daxpy_(&i__2, &t, &x[j + j * x_dim1], &c__1, &rsd[j], &c__1);\n        }\n        if (cxb) {\n            i__2 = *n - j + 1;\n            t = -ddot_(&i__2, &x[j + j * x_dim1], &c__1, &xb[j], &c__1) / x[j + j * x_dim1];\n            i__2 = *n - j + 1;\n            daxpy_(&i__2, &t, &x[j + j * x_dim1], &c__1, &xb[j], &c__1);\n        }\n        x[j + j * x_dim1] = temp;\n    }\n} /* dqrsl_ */\n\n/*//// dsvdc.f -- translated by f2c (version 20061008).  \n\n     dsvdc is a subroutine to reduce a double precision nxp matrix x by orthogonal\n     transformations u and v to diagonal form.  the diagonal elements s(i) are the\n     singular values of x.  the columns of u are the corresponding left singular vectors,\n     and the columns of v the right singular vectors.\n\n     on entry \n\n         x         double precision(ldx,p), where ldx.ge.n. \n                   x contains the matrix whose singular value decomposition is to\n                   be computed.  x is destroyed by dsvdc.\n\n         ldx       integer. \n                   ldx is the leading dimension of the array x. 
\n\n         n         integer. \n                   n is the number of rows of the matrix x. \n\n         p         integer. \n                   p is the number of columns of the matrix x. \n\n         ldu       integer. \n                   ldu is the leading dimension of the array u.  (see below).\n\n         ldv       integer. \n                   ldv is the leading dimension of the array v.  (see below).\n\n         work      double precision(n). \n                   work is a scratch array. \n\n         job       integer. \n                   job controls the computation of the singular vectors.  It has the\n                   decimal expansion ab with the following meaning\n\n                    a.eq.0    do not compute the left singular vectors.\n                      a.eq.1    return the n left singular vectors in u.\n                      a.ge.2    return the first min(n,p) singular vectors in u. \n                      b.eq.0    do not compute the right singular vectors. \n                      b.eq.1    return the right singular vectors in v. \n\n     on return \n\n         s         double precision(mm), where mm=min(n+1,p).\n                   the first min(n,p) entries of s contain the singular values of x\n                   arranged in descending order of magnitude.\n\n         e         double precision(p), \n                   e ordinarily contains zeros.  however see the discussion of info\n                   for exceptions.\n\n         u         double precision(ldu,k), where ldu.ge.n.  if \n                                   joba.eq.1 then k.eq.n, if joba.ge.2 \n                                   then k.eq.min(n,p). \n                   u contains the matrix of left singular vectors.  u is not referenced\n                   if joba.eq.0.  if n.le.p or if joba.eq.2, then u may be identified\n                   with x in the subroutine call.\n\n         v         double precision(ldv,p), where ldv.ge.p. 
\n                   v contains the matrix of right singular vectors.  v is not referenced\n                   if job.eq.0.  if p.le.n, then v may be identified with x in the\n                   subroutine call.\n\n         info      integer. \n                   the singular values (and their corresponding singular vectors)\n                   s(info+1),s(info+2),...,s(m) are correct (here m=min(n,p)).  thus if\n                   info.eq.0, all the singular values and their vectors are correct.\n                   in any event, the matrix b = trans(u)*x*v is the bidiagonal matrix\n                   with the elements of s on its diagonal and the elements of e on its\n                   super-diagonal (trans(u) is the transpose of u).  thus the singular\n                   values of x and b are the same.\n\n     linpack. this version dated 08/14/78 . \n              correction made to shift 2/84. \n     g.w. stewart, university of maryland, argonne national lab. \n\n     Modified 2000-12-28 to use a relative convergence test, \n     as this was infinite-looping on ix86. \n\n     dsvdc uses the following functions and subprograms. 
*/\n\nstatic void dsvdc_(double *x, integer *ldx, integer *n, integer * p, double *s,\n        double *e, double *u, integer *ldu,\n        double *v, integer *ldv, double *work, integer *job, integer * info) {\n    integer x_dim1, u_dim1, v_dim1, i__2, i__3;\n    double d__1;\n    static double b, c__, f, g, t, t1, el, cs, sl, sm, sn, acc, emm1, smm1;\n    static double test, scale, shift, ztest;\n    static integer i__, j, k, l, m, kk, ll, mm, ls, lu, lm1, mm1, lp1, mp1, nct, ncu, lls, nrt;\n    static integer kase, jobu, iter, nctp1, nrtp1, maxit;\n    static logical wantu, wantv;\n\n    x_dim1 = *ldx;\n    x -= 1 + x_dim1;\n    --s;\n    --e;\n    u_dim1 = *ldu;\n    u -= 1 + u_dim1;\n    v_dim1 = *ldv;\n    v -= 1 + v_dim1;\n    --work;\n    l = 0;\n    ls = 0;\n    maxit = 30; // set the maximum number of iterations.\n\n    /*     determine what is to be computed. */\n    wantu = FALSE_;\n    wantv = FALSE_;\n    jobu = *job % 100 / 10;\n    ncu = *n;\n    if (jobu > 1)\n        ncu = min(*n,*p);\n    if (jobu != 0)\n        wantu = TRUE_;\n    if (*job % 10 != 0)\n        wantv = TRUE_;\n\n    /*     reduce x to bidiagonal form, storing the diagonal elements */\n    /*     in s and the super-diagonal elements in e. */\n    *info = 0;\n    nct = min(*n - 1,*p);\n    nrt = max(0,min(*p - 2,*n));\n    lu = max(nct,nrt);\n    if (lu < 1)\n        goto L170;\n    for (l = 1; l <= lu; ++l) {\n        lp1 = l + 1;\n        if (l <= nct) {\n            /*           compute the transformation for the l-th column and */\n            /*           place the l-th diagonal in s(l). */\n            i__2 = *n - l + 1;\n            s[l] = dnrm2(&i__2, &x[l + l * x_dim1]);\n            if (s[l] != 0.) {\n                if (x[l + l * x_dim1] != 0.) 
{\n                    s[l] = d_sign(&s[l], &x[l + l * x_dim1]);\n                }\n                i__2 = *n - l + 1;\n                dscal(&i__2, 1./s[l], &x[l + l * x_dim1]);\n                x[l + l * x_dim1] += 1.;\n            }\n            s[l] = -s[l];\n        }\n        if (*p >= lp1)\n            for (j = lp1; j <= *p; ++j) {\n                if ((l <= nct) && (s[l] != 0.)) {\n                    /*              apply the transformation. */\n                    i__3 = *n - l + 1;\n                    t = -ddot_(&i__3, &x[l + l * x_dim1], &c__1, &x[l + j * x_dim1], &c__1) / x[l + l * x_dim1];\n                    i__3 = *n - l + 1;\n                    daxpy_(&i__3, &t, &x[l + l * x_dim1], &c__1, &x[l + j * x_dim1], &c__1);\n                }\n                /*           place the l-th row of x into  e for the */\n                /*           subsequent calculation of the row transformation. */\n                e[j] = x[l + j * x_dim1];\n            }\n        if (!(! wantu || l > nct)) /* place the transformation in u for subsequent back multiplication. */\n            for (i__ = l; i__ <= *n; ++i__)\n                u[i__ + l * u_dim1] = x[i__ + l * x_dim1];\n        if (l > nrt)\n            continue;\n\n        /*           compute the l-th row transformation and place the */\n        /*           l-th super-diagonal in e(l). */\n        i__2 = *p - l;\n        e[l] = dnrm2(&i__2, &e[lp1]);\n        if (e[l] != 0.) {\n            if (e[lp1] != 0.)\n                e[l] = d_sign(&e[l], &e[lp1]);\n            i__2 = *p - l;\n            dscal(&i__2, 1./e[l], &e[lp1]);\n            e[lp1] += 1.;\n        }\n        e[l] = -e[l];\n        if (!(lp1 > *n || e[l] == 0.)) {\n            /*              apply the transformation. 
*/\n            for (i__ = lp1; i__ <= *n; ++i__)\n                work[i__] = 0.;\n            for (j = lp1; j <= *p; ++j) {\n                i__3 = *n - l;\n                daxpy_(&i__3, &e[j], &x[lp1 + j * x_dim1], &c__1, &work[lp1], &c__1);\n            }\n            for (j = lp1; j <= *p; ++j) {\n                i__3 = *n - l;\n                d__1 = -e[j] / e[lp1];\n                daxpy_(&i__3, &d__1, &work[lp1], &c__1, &x[lp1 + j * x_dim1], &c__1);\n            }\n        }\n        if (! wantv)\n            continue;\n\n        /*  place the transformation in v for subsequent back multiplication. */\n        for (i__ = lp1; i__ <= *p; ++i__)\n            v[i__ + l * v_dim1] = e[i__];\n    }\nL170:\n\n    /*     set up the final bidiagonal matrix of order m. */\n    m = GSL_MIN(*p, *n + 1);\n    nctp1 = nct + 1;\n    nrtp1 = nrt + 1;\n    if (nct < *p)\n        s[nctp1] = x[nctp1 + nctp1 * x_dim1];\n    if (*n < m)\n        s[m] = 0.;\n    if (nrtp1 < m)\n        e[nrtp1] = x[nrtp1 + m * x_dim1];\n    e[m] = 0.;\n\n    /*     if required, generate u. */\n    if (wantu) {\n        if (ncu >= nctp1)\n            for (j = nctp1; j <= ncu; ++j) {\n                for (i__ = 1; i__ <= *n; ++i__)\n                    u[i__ + j * u_dim1] = 0.;\n                u[j + j * u_dim1] = 1.;\n            }\n        if (nct >= 1)\n            for (ll = 1; ll <= nct; ++ll) {\n                l = nct - ll + 1;\n                if (s[l] != 0.) 
{\n                    lp1 = l + 1;\n                    if (ncu >= lp1)\n                        for (j = lp1; j <= ncu; ++j) {\n                            i__3 = *n - l + 1;\n                            t = -ddot_(&i__3, &u[l + l * u_dim1], &c__1, &u[l + j * u_dim1], & c__1) / u[l + l * u_dim1];\n                            i__3 = *n - l + 1;\n                            daxpy_(&i__3, &t, &u[l + l * u_dim1], &c__1, &u[l + j * u_dim1], & c__1);\n                        }\n                    i__2 = *n - l + 1;\n                    dscal(&i__2, -1., &u[l + l * u_dim1]);\n                    u[l + l * u_dim1] += 1.;\n                    lm1 = l - 1;\n                    if (lm1 >= 1)\n                        for (i__ = 1; i__ <= lm1; ++i__)\n                            u[i__ + l * u_dim1] = 0.;\n                    continue;\n                }\n                for (i__ = 1; i__ <= *n; ++i__)\n                    u[i__ + l * u_dim1] = 0.;\n                u[l + l * u_dim1] = 1.;\n            }\n    }\n\n    /*     if it is required, generate v. */\n    if (wantv)\n        for (ll = 1; ll <= *p; ++ll) {\n            l = *p - ll + 1;\n            lp1 = l + 1;\n            if ((l <= nrt) && (e[l] != 0.))\n                for (j = lp1; j <= *p; ++j) {\n                    i__3 = *p - l;\n                    t = -ddot_(&i__3, &v[lp1 + l * v_dim1], &c__1, &v[lp1 + j * v_dim1], &c__1) / v[lp1 + l * v_dim1];\n                    i__3 = *p - l;\n                    daxpy_(&i__3, &t, &v[lp1 + l * v_dim1], &c__1, &v[lp1 + j * v_dim1], &c__1);\n                }\n            for (i__ = 1; i__ <= *p; ++i__)\n                v[i__ + l * v_dim1] = 0.;\n            v[l + l * v_dim1] = 1.;\n        }\n\n    /*     main iteration loop for the singular values. 
*/\n    mm = m;\n    iter = 0;\nL360:\n    // quit if all the singular values have been found.\n    if (m == 0)\n        return;\n    if (iter < maxit) // if too many iterations have been performed, set  flag and return.\n        goto L370;\n    *info = m;\n    return;\nL370:\n\n/*      This section of the program inspects for negligible elements in the s and\n        e arrays.  On completion the variables kase and l are set as follows.\n\n           kase = 1     if s(m) and e(l-1) are negligible and l.lt.m \n           kase = 2     if s(l) is negligible and l.lt.m \n           kase = 3     if e(l-1) is negligible, l.lt.m, and \n                        s(l), ..., s(m) are not negligible (qr step). \n           kase = 4     if e(m-1) is negligible (convergence). */\n\n    for (ll = 1; ll <= m; ++ll) {\n        l = m - ll;\n        if (l == 0)\n            break;\n        test = abs(s[l]) + abs(s[l + 1]);\n        ztest = test + abs(e[l]);\n        acc = abs(test - ztest) / (test + 1e-100);\n        if (acc > 1e-15)\n            continue;\n    /*            if (ztest .ne. test) go to 380 */\n        e[l] = 0.;\n        break;\n    }\n    if (l != m - 1)\n        goto L410;\n    kase = 4;\n    goto L480;\nL410:\n    lp1 = l + 1;\n    mp1 = m + 1;\n    for (lls = lp1; lls <= mp1; ++lls) {\n        ls = m - lls + lp1;\n        if (ls == l)\n            break;\n        test = 0.;\n        if (ls != m)\n            test += abs(e[ls]);\n        if (ls != l + 1)\n            test += abs(e[ls - 1]);\n        ztest = test + abs(s[ls]);\n        /* 1.0d-100 is to guard against a zero matrix, hence zero test */\n        acc = abs(test - ztest) / (test + 1e-100);\n        if (acc > 1e-15)\n            continue;\n        /*               if (ztest .ne. 
test) go to 420 */\n        s[ls] = 0.;\n        break;\n    }\n    if (ls != l)\n        goto L450;\n    kase = 3;\n    goto L470;\nL450:\n    if (ls != m)\n        goto L460;\n    kase = 1;\n    goto L470;\nL460:\n    kase = 2;\n    l = ls;\nL470:\nL480:\n    ++l;\n\n    switch (kase) { /*        perform the task indicated by kase. */\n        case 1:  goto L490;\n        case 2:  goto L520;\n        case 3:  goto L540;\n        case 4:  goto L570;\n    }\n\n/*        deflate negligible s(m). */\nL490:\n    mm1 = m - 1;\n    f = e[m - 1];\n    e[m - 1] = 0.;\n    for (kk = l; kk <= mm1; ++kk) {\n        k = mm1 - kk + l;\n        t1 = s[k];\n        drotg_(&t1, &f, &cs, &sn);\n        s[k] = t1;\n        if (k != l) {\n            f = -sn * e[k - 1];\n            e[k - 1] = cs * e[k - 1];\n        }\n        if (wantv)\n            drot_(p, &v[k * v_dim1 + 1], &c__1, &v[m * v_dim1 + 1], &c__1, & cs, &sn);\n    }\n    goto L610;\n\n/*        split at negligible s(l). */\nL520:\n    f = e[l - 1];\n    e[l - 1] = 0.;\n    for (k = l; k <= m; ++k) {\n        t1 = s[k];\n        drotg_(&t1, &f, &cs, &sn);\n        s[k] = t1;\n        f = -sn * e[k];\n        e[k] = cs * e[k];\n        if (wantu)\n            drot_(n, &u[k * u_dim1 + 1], &c__1, &u[(l - 1) * u_dim1 + 1], & c__1, &cs, &sn);\n    }\n    goto L610;\n\n/*        perform one qr step. */\nL540:\n\n    /*           calculate the shift. */\n    scale = GSL_MAX(abs(s[m]),  //chain binary maxes to find largest in the group.\n                GSL_MAX(abs(s[m - 1]),\n                GSL_MAX(abs(e[m - 1]), \n                GSL_MAX(abs(s[l]), \n                GSL_MAX(abs(e[l]), abs(s[m - 1]))))));\n    sm = s[m] / scale;\n    smm1 = s[m - 1] / scale;\n    emm1 = e[m - 1] / scale;\n    sl = s[l] / scale;\n    el = e[l] / scale;\n    b = ((smm1 + sm) * (smm1 - sm) + emm1 * emm1) / 2.;\n    c__ = gsl_pow_2( sm * emm1);\n    shift = 0.;\n    if (b == 0. 
&& c__ == 0.)\n        goto L550;\n    shift = sqrt(b * b + c__);\n    if (b < 0.)\n        shift = -shift;\n    shift = c__ / (b + shift);\nL550:\n    f = (sl + sm) * (sl - sm) + shift;\n    g = sl * el;\n\n    /*           chase zeros. */\n    mm1 = m - 1;\n    for (k = l; k <= mm1; ++k) {\n        drotg_(&f, &g, &cs, &sn);\n        if (k != l)\n            e[k - 1] = f;\n        f = cs * s[k] + sn * e[k];\n        e[k] = cs * e[k] - sn * s[k];\n        g = sn * s[k + 1];\n        s[k + 1] = cs * s[k + 1];\n        if (wantv)\n            drot_(p, &v[k * v_dim1 + 1], &c__1, &v[(k + 1) * v_dim1 + 1], & c__1, &cs, &sn);\n        drotg_(&f, &g, &cs, &sn);\n        s[k] = f;\n        f = cs * e[k] + sn * s[k + 1];\n        s[k + 1] = -sn * e[k] + cs * s[k + 1];\n        g = sn * e[k + 1];\n        e[k + 1] = cs * e[k + 1];\n        if (wantu && k < *n)\n            drot_(n, &u[k * u_dim1 + 1], &c__1, &u[(k + 1) * u_dim1 + 1], & c__1, &cs, &sn);\n    }\n    e[m - 1] = f;\n    ++iter;\n    goto L610;\n\n/*        convergence. */\nL570:\n\n/*           make the singular value  positive. */\n\n    if (s[l] < 0.) {\n        s[l] = -s[l];\n        if (wantv)\n            dscal(p, -1., &v[l * v_dim1 + 1]);\n    }\n/*           order the singular value. */\n\n    while ((l != mm) && (s[l] < s[l + 1])) {\n        t = s[l];\n        s[l] = s[l + 1];\n        s[l + 1] = t;\n        if (wantv && l < *p)\n            dswap(*p, &v[l * v_dim1 + 1], &v[(l + 1) * v_dim1 + 1]);\n        if (wantu && l < *n)\n            dswap(*n, &u[l * u_dim1 + 1], &u[(l + 1) * u_dim1 + 1]);\n        ++l;\n    }\n    iter = 0;\n    --m;\nL610:\n    goto L360;\n} /* dsvdc_ */\n\n////// loessc.c\n#define\tGAUSSIAN\t1\n#define SYMMETRIC\t0\n\nstatic long\t*iv, liv, lv, tau;\nstatic double *v;\n\n/* begin ehg's FORTRAN-callable C-codes */\n\nstatic void loess_error(int i){ //used to be ehg182.\n  char *mess, mess2[50];\n    switch(i){\ncase 101: mess=\"d>dMAX in ehg131.  
Need to recompile with increased dimensions.\"; break;\ncase 102: mess=\"liv too small.   (Discovered by lowesd)\"; break;\ncase 103: mess=\"lv too small.    (Discovered by lowesd)\"; break;\ncase 104: mess=\"span too small.  fewer data values than degrees of freedom.\"; break;\ncase 105: mess=\"k>d2MAX in ehg136.  Need to recompile with increased dimensions.\"; break;\ncase 106: mess=\"lwork too small\"; break;\ncase 110: mess=\"not enough extra workspace for robustness calculation\"; break;\ncase 120: mess=\"zero-width neighborhood. make span bigger\"; break;\ncase 121: mess=\"all data on boundary of neighborhood. make span bigger\"; break;\ncase 123: mess=\"ihat=1 (diag L) in l2fit only makes sense if z=x (eval=data).\"; break;\ncase 171: mess=\"lowesd must be called first.\"; break;\ncase 172: mess=\"lowesf must not come between lowesb and lowese, lowesr, or lowesl.\"; break;\ncase 173: mess=\"lowesb must come before lowese, lowesr, or lowesl.\"; break;\ncase 174: mess=\"lowesb need not be called twice.\"; break;\ncase 175: mess=\"need setLf=.true. 
for lowesl.\"; break;\ncase 180: mess=\"nv>nvmax in cpvert.\"; break;\ncase 182: mess=\"svddc failed in l2fit.\"; break;\ncase 185: mess=\"trouble descending to leaf in vleaf.\"; break;\ncase 186: mess=\"insufficient workspace for lowesf.\"; break;\ncase 187: mess=\"insufficient stack space\"; break;\ncase 193: mess=\"workspace in loread appears to be corrupted\"; break;\ncase 194: mess=\"trouble in l2fit/l2tr\"; break;\ncase 195: mess=\"only constant, linear, or quadratic local models allowed\"; break;\ncase 196: mess=\"degree must be at least 1 for vertex influence matrix\"; break;\ndefault: sprintf(mess=mess2,\"Assert failed; error code %d\\n\",i); break;\n    }\n    Apop_assert_n(0, \"%s\", mess);\n}\n\nstatic void ehg183_(char *s, integer *i, integer n, integer inc) {\n  char mess[4000], num[20];\n  strcpy(mess,s);\n  for (int j=0; j<n; j++) {\n    sprintf(num,\" %ld\",i[j * inc]);\n    strcat(mess,num);\n  }\n  Warning(mess);\n}\n\nstatic void ehg184_(char *s, double *x, integer n, integer inc) {\n  char mess[4000], num[30];\n  strcpy(mess,s);\n  for (int j=0; j< n; j++) {\n    sprintf(num,\" %.5g\",x[j * inc]);\n    strcat(mess,num);\n  }\n  Warning(mess);\n}\n\n////// loessf.f -- translated by f2c (version 20061008). 
\n\nstatic void dqrdc_(double *x, integer *ldx, integer *n, integer *p, double *qraux, \n        integer *jpvt, double *work, integer job) ;\nstatic double ehg128_(double *, integer *, integer *, integer *, integer *, \n        double *, integer *, integer *, integer *, double *, integer *, double *);\nstatic void ehg129_(integer *, integer *, integer *, double *, integer *, integer, double *),\nehg139_(double *, integer *, integer *, integer *, integer *, integer *, double *, double *,\n        integer *, integer *, double *, double *, double *, integer *, integer *,\n\t    double *, double *, double *, double *, integer *, double *, double *,\n        double *, integer *, integer *, integer *, double *, integer *, integer *, integer *,\n        integer *, double *, integer *, integer *, integer *, integer *, integer *, double *,\n        logical *, double *),\nehg197(integer deg, integer d__, double f, integer *dk, double *trl);\nstatic double ehg176_(double *);\n\nstatic integer ifloor(double x) {\n    integer ret_val = x;\n    if ((double) ret_val > x)\n        --ret_val;\n    return ret_val;\n}\n\nstatic void ehg126_(integer *d__, integer *n, integer *vc, double *x, double *v, integer *nvmax) {\n    integer v_dim1, x_dim1;\n    static integer execnt = 0, i__, j, k;\n    static double t, mu, beta, alpha, machin;\n\n    x_dim1 = *n;\n    x -= 1 + x_dim1;\n    v_dim1 = *nvmax;\n    v -= 1 + v_dim1;\n\n    ++execnt;\n    if (execnt == 1)\n        machin = DBL_MAX;\n/*     fill in vertices for bounding box of $x$ */\n/*     lower left, upper right */\n    for (k = 1; k <= *d__; ++k) {\n        alpha = machin;\n        beta = -machin;\n        for (i__ = 1; i__ <= *n; ++i__) {\n            t = x[i__ + k * x_dim1];\n            alpha = min(alpha,t);\n            beta = max(beta,t);\n        }\n    /*        expand the box a little */\n        mu = .005 * GSL_MAX(beta - alpha,\n                            GSL_MAX(abs(alpha), abs(beta)) * 1e-10 + 1e-30);\n        alpha 
-= mu;\n        beta += mu;\n        v[k * v_dim1 + 1] = alpha;\n        v[*vc + k * v_dim1] = beta;\n    }\n/*     remaining vertices */\n    for (i__ = 2; i__ <= *vc - 1; ++i__) {\n        j = i__ - 1;\n        for (k = 1; k <= *d__; ++k) {\n            v[i__ + k * v_dim1] = v[j % 2 * (*vc - 1) + 1 + k * v_dim1];\n            j = (integer) ((double) j / 2.);\n        }\n    }\n} /* ehg126_ */\n\nstatic void ehg125_(integer *p, integer *nv, double *v, integer *vhit, integer nvmax, integer d__,\n       integer k, double *t, integer *r__, integer *s, integer *f, integer *l, integer *u) {\n\n    integer f_dim1, l_dim1, u_dim1, v_dim1;\n    static integer h__, i__, j, m, i3, mm, execnt = 0;\n    static logical match;\n\n    --vhit;\n    v_dim1 = nvmax;\n    v -= 1 + v_dim1;\n    u_dim1 = *r__;\n    u -= 1 + (u_dim1 << 1);\n    l_dim1 = *r__;\n    l -= 1 + (l_dim1 << 1);\n    f_dim1 = *r__;\n    f -= 1 + (f_dim1 << 1);\n\n    ++execnt;\n    h__ = *nv;\n    for (i__ = 1; i__ <= *r__; ++i__)\n        for (j = 1; j <= *s; ++j) {\n            ++h__;\n            for (i3 = 1; i3 <= d__; ++i3)\n                v[h__ + i3 * v_dim1] = v[f[i__ + (j << 1) * f_dim1] + i3 * v_dim1];\n            v[h__ + k * v_dim1] = *t;\n    /*           check for redundant vertex */\n            match = FALSE_;\n            m = 1;\n            while (!match && (m <=*nv)){\n                match = v[m + v_dim1] == v[h__ + v_dim1];\n                for (mm = 2 ;match && ( mm <= d__); mm++)\n                    match = v[m + mm * v_dim1] == v[h__ + mm * v_dim1];\n                ++m;\n                }\n            --m;\n            if (match)\n                --h__;\n            else {\n                m = h__;\n                if (vhit[1] >= 0)\n                    vhit[m] = *p;\n            }\n            l[i__ + (j << 1) * l_dim1] = f[i__ + (j << 1) * f_dim1];\n            l[i__ + ((j << 1) + 1) * l_dim1] = m;\n            u[i__ + (j << 1) * u_dim1] = m;\n            u[i__ + ((j << 1) + 1) * 
u_dim1] = f[i__ + ((j << 1) + 1) * f_dim1];\n        }\n    *nv = h__;\n    if (! (*nv <= nvmax))\n        loess_error(180);\n} /* ehg125_ */\n\nstatic void find_kth_smallest(integer il, integer ir, integer k, integer nk, double *p, integer *pi) {\n    //Formerly ehg106\n    integer p_dim1;\n    static integer execnt = 0, i__, j, l, r, ii;\n    static double t;\n\n    --pi;\n    p_dim1 = nk;\n    p -= 1 + p_dim1;\n\n    ++execnt;\n    /*     find the $k$-th smallest of $n$ elements */\n    /*     Floyd+Rivest, CACM Mar '75, Algorithm 489 */\n    l = il;\n    r = ir;\n    while (l < r) {\n        /*  to avoid recursion, sophisticated partition deleted */\n        /*  partition $x sub {l..r}$ about $t$ */\n        t = p[pi[k] * p_dim1 + 1];\n        i__ = l;\n        j = r;\n        ii = pi[l];\n        pi[l] = pi[k];\n        pi[k] = ii;\n        if (t < p[pi[r] * p_dim1 + 1]) {\n            ii = pi[l];\n            pi[l] = pi[r];\n            pi[r] = ii;\n        }\n        while (i__ < j) {\n            ii = pi[i__];\n            pi[i__] = pi[j];\n            pi[j] = ii;\n            ++i__;\n            --j;\n            while (p[pi[i__] * p_dim1 + 1] < t)\n                ++i__;\n            while (t < p[pi[j] * p_dim1 + 1])\n                --j;\n        }\n        if (p[pi[l] * p_dim1 + 1] == t) {\n            ii = pi[l];\n            pi[l] = pi[j];\n            pi[j] = ii;\n        } else {\n            ++j;\n            ii = pi[r];\n            pi[r] = pi[j];\n            pi[j] = ii;\n        }\n        if (j <= k)\n            l = j + 1;\n        if (k <= j)\n            r = j - 1;\n    }\n} /* find_kth_smallest */\n\nstatic integer idamax(integer n, double *dx, integer incx) {\n    /*     Finds the index of element having max. absolute value. */\n    /*     jack dongarra, linpack, 3/11/78. 
*/\n    int ret_val = 1;\n    static integer i__, ix;\n    static double dmax__;\n    --dx;\n\n    if (n < 1)\n        return 0;\n    if (n == 1)\n        return 1;\n    if (incx != 1) { // code for increment not equal to 1\n        ix = 1;\n        dmax__ = abs(dx[1]);\n        ix += incx;\n        for (i__ = 2; i__ <= n; ++i__, ix += incx)\n            if ( abs(dx[ix]) > dmax__){\n                ret_val = i__;\n                dmax__ = abs(dx[ix]);\n            }\n    } else { // code for increment equal to 1\n        dmax__ = abs(dx[1]);\n        for (i__ = 2; i__ <= n; ++i__)\n            if ( abs(dx[i__]) > dmax__){\n                ret_val = i__;\n                dmax__ = abs(dx[i__]);\n            }\n    }\n    return ret_val;\n} /* idamax */\n\nstatic void ehg124_(integer *ll, integer *uu, integer d__, integer n, integer *nv, integer *nc,\n        integer *ncmax, integer *vc, double * x, integer *pi, integer *a, double *xi,\n        integer *lo, integer *hi, integer *c__, double *v, integer *vhit, integer nvmax, integer *\n        fc, double *fd, integer *dd) {\n    integer c_dim1, v_dim1, v_offset, x_dim1, x_offset, i__1, i__3;\n    static integer execnt = 0, k, l, m, p, u, i4, check, lower, upper, inorm2, offset;\n    static logical i1, i2, leaf;\n    static double diag[8], diam, sigma[8];\n\n    --pi; --hi; --lo; --xi; --a; --vhit;\n    x_dim1 = n;\n    x -= x_offset = 1 + x_dim1;\n    c_dim1 = *vc;\n    c__ -= 1 + c_dim1;\n    v_dim1 = nvmax;\n    v -= v_offset = 1 + v_dim1;\n\n    ++execnt;\n    p = 1;\n    l = *ll;\n    u = *uu;\n    lo[p] = l;\n    hi[p] = u;\n    while (p <= *nc){\n        for (i4 = 1; i4 <= *dd; ++i4)\n            diag[i4 - 1] = v[c__[*vc + p * c_dim1] + i4 * v_dim1] - v[c__[p * c_dim1 + 1] + i4 * v_dim1];\n        diam = 0.;\n        for (inorm2 = 1; inorm2 <= *dd; ++inorm2)\n            diam += gsl_pow_2(diag[inorm2 - 1]);\n        diam = sqrt(diam);\n        i1 = (u - l + 1 <= *fc)\n             ? 
TRUE_\n             : (diam <= *fd);\n        if (i1)\n            leaf = TRUE_;\n        else {\n            if (*ncmax < *nc + 2)\n                i2 = TRUE_;\n            else\n                i2 = (double) (nvmax) < *nv + (double) (*vc) / 2.;\n            leaf = i2;\n        }\n        if (! leaf) {\n            ehg129_(&l, &u, dd, &x[x_offset], &pi[1], n, sigma);\n            k = idamax(*dd, sigma, 1);\n            m = (integer) ((double) (l + u) / 2.);\n            find_kth_smallest(l, u, m, 1, &x[k * x_dim1 + 1], &pi[1]);\n            /* bug fix from btyner@gmail.com 2006-07-20 */\n            offset = 0;\n            while (! (m + offset >= u || m + offset < l)) {\n                if (offset < 0) {\n                    lower = l;\n                    check = m + offset;\n                    upper = check;\n                } else {\n                    lower = m + offset + 1;\n                    check = lower;\n                    upper = u;\n                }\n                find_kth_smallest(lower, upper, check, 1, &x[k * x_dim1 + 1], &pi[1]);\n                if (x[pi[m + offset] + k * x_dim1] == x[pi[m + offset + 1] + k * x_dim1]) {\n                    offset = -offset;\n                    if (offset >= 0)\n                        ++offset;\n                } else {\n                    m += offset;\n                    break;\n                }\n            }\n            if (v[c__[p * c_dim1 + 1] + k * v_dim1] == x[pi[m] + k * x_dim1])\n                leaf = TRUE_;\n            else\n                leaf = v[c__[*vc + p * c_dim1] + k * v_dim1] == x[pi[m] + k * x_dim1];\n        }\n        if (leaf)\n            a[p] = 0;\n        else {\n            a[p] = k;\n            xi[p] = x[pi[m] + k * x_dim1];\n            /*           left son */\n            ++(*nc);\n            lo[p] = *nc;\n            lo[*nc] = l;\n            hi[*nc] = m;\n            /*           right son */\n            ++(*nc);\n            hi[p] = *nc;\n            lo[*nc] = m 
+ 1;\n            hi[*nc] = u;\n            i__1 = pow_ii(2, k - 1);\n            i__3 = pow_ii(2, d__ - k);\n            ehg125_(&p, nv, &v[v_offset], &vhit[1], nvmax, d__, k, &xi[p], &i__1,\n                 &i__3, &c__[p * c_dim1 + 1], &c__[lo[p] * c_dim1 + 1], &c__[hi[p] * c_dim1 + 1]);\n        }\n        ++p;\n        l = lo[p];\n        u = hi[p];\n    }\n} /* ehg124_ */\n\nstatic void ehg127_(double *q, integer *n, integer *d__, integer *nf, double *f, double *x,\n        integer *psi, double *y, double *rw, integer *kernel, integer *k, double *dist,\n        double *eta, double *b, integer *od, double *w, double *rcond, integer *sing,\n        double *sigma, double *u, double *e, double *dgamma, double *qraux,\n        double * work, double *tol, integer *dd, integer *tdeg, integer *cdeg, double *s) {\n\n    integer b_dim1, x_dim1, b_offset;\n    double d__1;\n    static integer execnt = 0, i__, j, i3, i9, jj, info, jpvt, inorm2, column;\n    static double g[15], i2, rho, scal, machep, colnor[15];\n\n    --rw; --y; --psi;\n    x_dim1 = *n;\n    x -= 1 + x_dim1;\n    --q;\n    --w;\n    --eta;\n    b_dim1 = *nf;\n    b -= b_offset = 1+b_dim1;\n    --sigma;\n    u -= 16;\n    e -= 16;\n    --dgamma; --qraux; --work; --cdeg;\n\n    ++execnt;\n    if (execnt == 1)\n        machep = DBL_EPSILON;\n    /*     sort by distance */\n    for (i3 = 1; i3 <= *n; ++i3)\n        dist[i3] = 0.;\n    for (j = 1; j <= *dd; ++j)\n        for (i3 = 1; i3 <= *n; ++i3)\n            dist[i3] += gsl_pow_2(x[i3 + j * x_dim1] - q[j]);\n    find_kth_smallest(1, *n, *nf, 1, &dist[1], &psi[1]);\n    rho = dist[psi[*nf]] * max(1.,*f);\n    if (! (0. < rho))\n        loess_error(120);\n    /*     compute neighborhood weights */\n    if (*kernel == 2) {\n        for (i__ = 1; i__ <= *nf; ++i__)\n            w[i__] = (dist[psi[i__]] < rho)\n                    ? 
sqrt(rw[psi[i__]])\n                    : 0.;\n    } else {\n        for (i3 = 1; i3 <= *nf; ++i3)\n            w[i3] = sqrt(dist[psi[i3]] / rho);\n        for (i3 = 1; i3 <= *nf; ++i3)\n            w[i3] = sqrt(rw[psi[i3]] * gsl_pow_3(1 - gsl_pow_3(w[i3])) );\n    }\n    if (abs(w[idamax(*nf, &w[1], 1)]) == 0.) { //why |x|==0, and not just x == 0 ?\n        ehg184_(\"at \", &q[1], *dd, 1);\n        ehg184_(\"radius \", &rho, 1,1);\n        loess_error(121);\n    }\n    /*     fill design matrix */\n    column = 1;\n    for (i3 = 1; i3 <= *nf; ++i3)\n        b[i3 + column * b_dim1] = w[i3];\n    if (*tdeg >= 1)\n        for (j = 1; j <= *d__; ++j)\n            if (cdeg[j] >= 1) {\n                ++column;\n                for (i3 = 1; i3 <= *nf; ++i3)\n                    b[i3 + column * b_dim1] = w[i3] * (x[psi[i3] + j * x_dim1] - q[j]);\n            }\n    if (*tdeg >= 2) {\n        for (j = 1; j <= *d__; ++j)\n            if (cdeg[j] >= 1) {\n                if (cdeg[j] >= 2) {\n                    ++column;\n                    for (i3 = 1; i3 <= *nf; ++i3)\n                        b[i3 + column * b_dim1] = w[i3] * gsl_pow_2(x[psi[i3] + j * x_dim1] - q[j]);\n                }\n                for (jj = j + 1; jj <= *d__; ++jj)\n                    if (cdeg[jj] >= 1) {\n                        ++column;\n                        for (i3 = 1; i3 <= *nf; ++i3)\n                            b[i3 + column * b_dim1] = w[i3] * (x[psi[i3] + j * x_dim1] - q[j])\n                                                        * (x[psi[i3] + jj * x_dim1] - q[jj]);\n                    }\n            }\n        *k = column;\n    }\n    for (i3 = 1; i3 <= *nf; ++i3)\n        eta[i3] = w[i3] * y[psi[i3]];\n    /*     equilibrate columns */\n    for (j = 1; j <= *k; ++j) {\n        scal = 0.;\n        for (inorm2 = 1; inorm2 <= *nf; ++inorm2)\n            scal += gsl_pow_2(b[inorm2 + j * b_dim1]);\n        scal = sqrt(scal);\n        if (0. 
< scal) {\n            for (i3 = 1; i3 <= *nf; ++i3)\n                b[i3 + j * b_dim1] /= scal;\n            colnor[j - 1] = scal;\n        } else\n            colnor[j - 1] = 1.;\n    }\n/*     singular value decomposition */\n    dqrdc_(&b[b_offset], nf, nf, k, &qraux[1], &jpvt, &work[1], 0);\n    dqrsl_(&b[b_offset], nf, nf, k, &qraux[1], &eta[1], &work[1], &eta[1], & eta[1], &work[1], &work[1], 1000, &info);\n    for (i9 = 1; i9 <= *k; ++i9)\n        for (i3 = 1; i3 <= *k; ++i3)\n            u[i3 + i9 * 15] = 0.;\n    for (i__ = 1; i__ <= *k; ++i__)\n        for (j = i__; j <= *k; ++j)\n            u[i__ + j * 15] = b[i__ + j * b_dim1];\n    dsvdc_(&u[16], &c__15, k, k, &sigma[1], g, &u[16], &c__15, &e[16], &c__15, &work[1], &c__21, &info);\n    if (info != 0)\n        loess_error(182);\n    *tol = sigma[1] * (machep * 100);\n    *rcond = GSL_MIN(*rcond, sigma[*k]/sigma[1]);\n    if (sigma[*k] <= *tol) {\n        ++(*sing);\n        if (*sing == 1) {\n            ehg184_(\"Warning. pseudoinverse used at\", &q[1], *d__, 1);\n            d__1 = sqrt(rho);\n            ehg184_(\"neighborhood radius\", &d__1, 1, 1);\n            ehg184_(\"reciprocal condition number \", rcond, 1, 1);\n        } else if (*sing == 2)\n            ehg184_(\"There are other near singularities as well.\", &rho, 1, 1);\n    }\n/*     compensate for equilibration */\n    for (j = 1; j <= *k; ++j) \n        for (i3 = 1; i3 <= *k; ++i3)\n            e[j + i3 * 15] /= colnor[j - 1];\n/*     solve least squares problem */\n    for (j = 1; j <= *k; ++j) {\n        if (*tol < sigma[j])\n            i2 = ddot_(k, &u[j * 15 + 1], &c__1, &eta[1], &c__1) / sigma[j];\n        else\n            i2 = 0.;\n        dgamma[j] = i2;\n    }\n    for (j = 0; j <= *od; ++j)  //bug fix 2006-07-04 for k=1, od>1.   
(thanks btyner@gmail.com)\n        if (j < *k)\n            s[j] = ddot_(k, &e[j + 16], &c__15, &dgamma[1], &c__1);\n        else\n            s[j] = 0.;\n} /* ehg127_ */\n\nstatic void ehg129_(integer *l, integer *u, integer *d__, double *x, integer *pi, integer n, double *sigma) {\n    integer x_dim1;\n    static integer execnt = 0;\n    static double t, beta, alpha, machin;\n    --sigma;\n    --pi;\n    x_dim1 = n;\n    x -= 1 + x_dim1;\n    ++execnt;\n    if (execnt == 1)\n        machin = DBL_MAX;\n    for (integer k = 1; k <= *d__; ++k) {\n        alpha = machin;\n        beta = -machin;\n        for (integer i__ = *l; i__ <= *u; ++i__) {\n            t = x[pi[i__] + k * x_dim1];\n            alpha = GSL_MIN(alpha, t);\n            beta = max(beta,t);\n        }\n        sigma[k] = beta - alpha;\n    }\n} /* ehg129_ */\n\nstatic void ehg131_(double *x, double *y, double *rw, double *trl, \n        double *diagl, integer *kernel, integer *k, integer *n, integer *d__, \n        integer *nc, integer *ncmax, integer *vc, integer *nv, integer *nvmax, \n        integer *nf, double *f, integer *a, integer *c__, integer *hi, integer *lo, \n        integer *pi, integer *psi, double *v, integer *vhit, double *vval, double *xi,\n        double *dist, double *eta, double *b, integer *ntol, double *fd, \n        double *w, double *vval2, double *rcond, integer *sing, integer *dd, \n        integer *tdeg, integer *cdeg, integer *lq, double *lf, logical *setlf) {\n\n    integer lq_dim1, lq_offset, c_dim1, c_offset, lf_dim1, lf_dim2, lf_offset,\n\t     v_dim1, v_offset, vval_dim1, vval_offset, vval2_dim1, vval2_offset, x_dim1, x_offset;\n\n    static integer execnt = 0, j, i1, i2;\n    static double delta[8];\n    static integer identi;\n\n    --psi; --pi; \n    x_dim1 = *n;\n    x -= x_offset = 1 + x_dim1;\n    --xi;\n    --lo;\n    --hi;\n    --a;\n    c_dim1 = *vc;\n    c__ -= c_offset = 1 + c_dim1;\n    vval2_dim1 = *d__ - 0 + 1;\n    vval2 -= 
vval2_offset = 0 + vval2_dim1;\n    vval_dim1 = *d__ - 0 + 1;\n    vval -= vval_offset = 0 + vval_dim1;\n    --vhit;\n    v_dim1 = *nvmax;\n    v -= v_offset = 1 + v_dim1;\n    lf_dim1 = *d__ - 0 + 1;\n    lf_dim2 = *nvmax;\n    lf -= lf_offset = 0 + lf_dim1 * (1 + lf_dim2);\n    lq_dim1 = *nvmax;\n    lq -= lq_offset = 1 + lq_dim1;\n    --w; --eta; --b; --cdeg;\n\n    ++execnt;\n    if (! (*d__ <= 8))\n        loess_error(101);\n/*     build $k$-d tree */\n    ehg126_(d__, n, vc, &x[x_offset], &v[v_offset], nvmax);\n    *nv = *vc;\n    *nc = 1;\n    for (j = 1; j <= *vc; ++j) {\n        c__[j + *nc * c_dim1] = j;\n        vhit[j] = 0;\n    }\n    for (i1 = 1; i1 <= *d__; ++i1)\n        delta[i1 - 1] = v[*vc + i1 * v_dim1] - v[i1 * v_dim1 + 1];\n    *fd *= dnrm2(d__, delta);\n    for (identi = 1; identi <= *n; ++identi)\n        pi[identi] = identi;\n    ehg124_(&c__1, n, *d__, *n, nv, nc, ncmax, vc, &x[x_offset], &pi[1], &a[1],\n\t    &xi[1], &lo[1], &hi[1], &c__[c_offset], &v[v_offset], &vhit[1], *nvmax, ntol, fd, dd);\n    // Smooth\n    if (*trl != 0.)\n        for (i2 = 1; i2 <= *nv; ++i2)\n            for (i1 = 0; i1 <= *d__; ++i1)\n                vval2[i1 + i2 * vval2_dim1] = 0.;\n    ehg139_(&v[v_offset], nvmax, nv, n, d__, nf, f, &x[x_offset], &pi[1], &psi[1], y,\n            rw, trl, kernel, k, dist, dist, &eta[1], &b[1], d__, &w[1], diagl,\n            &vval2[vval2_offset], nc, vc, &a[1], &xi[1], &lo[1], &hi[1], &c__[c_offset], &vhit[1], rcond, sing,\n            dd, tdeg, &cdeg[1], &lq[lq_offset], &lf[lf_offset], setlf, &vval[vval_offset]);\n} /* ehg131_ */\n\nstatic void ehg133_(integer *n, integer *d__, integer *vc, integer *nvmax, integer *nc, \n        integer *ncmax, integer *a, integer *c__, integer *hi, integer *lo, double *v, \n        double *vval, double *xi, integer m, double *z__, double *s) {\n    integer c_dim1, c_offset, v_dim1, v_offset, vval_dim1, vval_offset, z_dim1, z_offset;\n\n    static integer execnt = 0, i__, i1;\n    static 
double delta[8];\n\n    vval_dim1 = *d__ - 0 + 1;\n    vval -= vval_offset = 0 + vval_dim1;\n    --s;\n    v_dim1 = *nvmax;\n    v -= v_offset = 1 + v_dim1;\n    c_dim1 = *vc;\n    c__ -= c_offset = 1 + c_dim1;\n    z_dim1 = m;\n    z__ -= z_offset = 1 + z_dim1;\n\n    ++execnt;\n    for (i__ = 1; i__ <= m; ++i__) {\n        for (i1 = 1; i1 <= *d__; ++i1)\n            delta[i1 - 1] = z__[i__ + i1 * z_dim1];\n        s[i__] = ehg128_(delta, d__, ncmax, vc, a, xi, lo, hi,\n             &c__[c_offset], &v[v_offset], nvmax, &vval[vval_offset]);\n    }\n}\n\nstatic void set_cs(integer d, integer i, double *c1, double *c2, double *c3){\n    static double c[48] = { .297162,.380266,.5886043,.4263766,.3346498,\n\t    .6271053,.5241198,.3484836,.6687687,.6338795,.4076457,.7207693,\n\t    .1611761,.3091323,.4401023,.2939609,.3580278,.5555741,.397239,\n\t    .4171278,.6293196,.4675173,.469907,.6674802,.2848308,.2254512,\n\t    .2914126,.5393624,.251723,.389897,.7603231,.2969113,.474013,\n\t    .9664956,.3629838,.5348889,.207567,.2822574,.2369957,.3911566,\n\t    .2981154,.3623232,.5508869,.3501989,.4371032,.7002667,.4291632,\n\t    .493037 };\n    if (d <= 4) {\n        *c1 = c[i - 1];\n        *c2 = c[i];\n        *c3 = c[i + 1];\n    } else {\n        *c1 = c[i - 1] + (d - 4) * (c[i - 1] - c[i - 4]);\n        *c2 = c[i]     + (d - 4) * (c[i]     - c[i - 3]);\n        *c3 = c[i + 1] + (d - 4) * (c[i + 1] - c[i - 2]);\n    }\n}\n\nstatic void ehg141_(double *trl, integer *n, integer *deg, integer *k, integer *d,\n        integer *nsing, integer *dk, double * delta1, double *delta2) {\n\n    static integer i;\n    static double z, c1, c2, c3, c4, corx;\n\n/*     coef, d, deg, del */\n    if (*deg == 0)\n        *dk = 1;\n    if (*deg == 1)\n        *dk = *d + 1;\n    if (*deg == 2)\n        *dk = ((double) ((*d + 2) * (*d + 1)) / 2.);\n    corx = sqrt(*k / (double) (*n));\n    z = (sqrt(*k / *trl) - corx) / (1 - corx);\n    if ((*nsing == 0 && 1. < z) ||(z < 0.) 
)\n        ehg184_(\"Chernobyl! trL<k\", trl, 1, 1);\n    z = GSL_MIN(1.,  GSL_MAX(0.,z));\n    c4 = exp(ehg176_(&z));\n    i = (min(*d,4) - 1 + ((*deg - 1) << 2)) * 3 + 1;\n    set_cs(*d, i, &c1, &c2, &c3);\n    *delta1 = *n - *trl * exp(c1 * pow(z, c2) * pow(1-z, c3) * c4);\n    i += 24;\n    set_cs(*d, i, &c1, &c2, &c3);\n    *delta2 = *n - *trl * exp(c1 * pow(z, c2) * pow(1-z, c3) * c4);\n} /* ehg141_ */\n\nstatic void lowesc_(integer *n, double *l, double *ll, double *trl, double *delta1, double *delta2) {\n    static integer execnt = 0, i__, j;\n    integer l_dim1, ll_dim1;\n\n    ll_dim1 = *n;\n    ll -= 1 + ll_dim1;\n    l_dim1 = *n;\n    l -= 1 + l_dim1;\n\n    ++execnt;\n/*     compute $LL~=~(I-L)(I-L)'$ */\n    for (i__ = 1; i__ <= *n; ++i__)\n        --l[i__ + i__ * l_dim1];\n    for (i__ = 1; i__ <= *n; ++i__)\n        for (j = 1; j <= i__; ++j)\n            ll[i__ + j * ll_dim1] = ddot_(n, &l[i__ + l_dim1], n, &l[j + l_dim1], n);\n    for (i__ = 1; i__ <= *n; ++i__)\n        for (j = i__ + 1; j <= *n; ++j)\n            ll[i__ + j * ll_dim1] = ll[j + i__ * ll_dim1];\n    for (i__ = 1; i__ <= *n; ++i__)\n        ++l[i__ + i__ * l_dim1];\n/*     accumulate first two traces */\n    *trl = 0.;\n    *delta1 = 0.;\n    for (i__ = 1; i__ <= *n; ++i__) {\n        *trl += l[i__ + i__ * l_dim1];\n        *delta1 += ll[i__ + i__ * ll_dim1];\n    }\n/*     $delta sub 2 = \"tr\" LL sup 2$ */\n    *delta2 = 0.;\n    for (i__ = 1; i__ <= *n; ++i__)\n        *delta2 += ddot_(n, &ll[i__ + ll_dim1], n, &ll[i__ * ll_dim1 + 1], & c__1);\n} /* lowesc_ */\n\nstatic void ehg169_(integer d__, integer *vc, integer *nc, integer *ncmax, integer *nv, \n        integer nvmax, double *v, integer *a, double *xi, integer *c__, integer *hi, integer *lo) {\n    integer c_dim1, v_dim1, v_offset, i__1, i__3;\n    static integer execnt = 0, i__, j, k, p, mc, mv, novhit[1];\n\n    --lo;\n    --hi;\n    c_dim1 = *vc;\n    c__ -= 1 + c_dim1;\n    --xi;\n    --a;\n    v_dim1 = nvmax;\n    v 
-= v_offset = 1 + v_dim1;\n\n    ++execnt;\n    /*     as in bbox */\n    /*     remaining vertices */\n    for (i__ = 2; i__ <= *vc - 1; ++i__) {\n        j = i__ - 1;\n        for (k = 1; k <= d__; ++k) {\n            v[i__ + k * v_dim1] = v[j % 2 * (*vc - 1) + 1 + k * v_dim1];\n            j = ifloor((double)j / 2.);\n        }\n    }\n    /*     as in ehg131 */\n    mc = 1;\n    mv = *vc;\n    novhit[0] = -1;\n    for (j = 1; j <= *vc; ++j)\n        c__[j + mc * c_dim1] = j;\n    /*     as in rbuild */\n    for (p=1; p <= *nc; p++)\n        if (a[p] != 0) {\n            k = a[p];\n            // left son\n            ++mc;\n            lo[p] = mc;\n            // right son\n            ++mc;\n            hi[p] = mc;\n            i__1 = pow_ii(2, k-1);\n            i__3 = pow_ii(2, d__ - k);\n            ehg125_(&p, &mv, &v[v_offset], novhit, nvmax, d__, k, &xi[p], &i__1,\n                &i__3, &c__[p * c_dim1 + 1], &c__[lo[p] * c_dim1 + 1], &c__[hi[p] * c_dim1 + 1]);\n        }\n    if (! (mc == *nc) || ! 
(mv == *nv))\n        loess_error(193);\n}\n\nstatic double ehg176_(double *z) {\n    static integer d__ = 1;\n    static integer vc = 2;\n    static integer nv = 10;\n    static integer nc = 17;\n    static integer a[17] = { 1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,0,0 };\n    static struct {\n        integer e_1[7];\n        integer fill_2[7];\n        integer e_3;\n        integer fill_4[2];\n\t} equiv_94 = { {3, 5, 7, 9, 11, 13, 15}, {0}, 17 };//seven numbers, seven zeros, a number.\n\n#define hi ((integer *)&equiv_94)\n\n    static struct {\n        integer e_1[7];\n        integer fill_2[7];\n        integer e_3;\n        integer fill_4[2];\n\t} equiv_95 = { {2, 4, 6, 8, 10, 12, 14}, {0}, 16 };\n\n#define lo ((integer *)&equiv_95)\n\n    static struct {\n        double e_1[7];\n        double fill_2[7];\n        double e_3;\n        double fill_4[2];\n\t} equiv_96 = { {.3705, .2017, .5591, .1204, .2815, .4536, .7132}, {0}, .8751 };\n\n#define xi ((double *)&equiv_96)\n\n    static integer c__[34]\t/* was [2][17] */ = { 1,2,1,3,3,2,1,4,4,3,3,5,\n\t    5,2,1,6,6,4,4,7,7,3,3,8,8,5,5,9,9,2,9,10,10,2 };\n    static double vval[20]\t/* was [2][10] */ = { -.090572,4.4844,\n\t    -.010856,-.7736,-.053718,-.3495,.026152,-.7286,-.058387,.1611,\n\t    .095807,-.7978,-.031926,-.4457,-.06417,.032813,-.020636,.335,\n\t    .040172,-.041032 };\n    static double v[10]\t/* was [10][1] */ = { -.005,1.005,.3705,.2017,\n\t    .5591,.1204,.2815,.4536,.7132,.8751 };\n\n    return ehg128_(z, &d__, &nc, &vc, a, xi, lo, hi, c__, v, &nv, vval);\n}\n\n#undef xi\n#undef lo\n#undef hi\n\nstatic void lowesa_(double *trl, integer *n, integer *d__,\n            integer *tau, integer *nsing, double *delta1, double *delta2) {\n    static integer execnt = 0, dka, dkb;\n    static double d1a, d1b, d2a, d2b, alpha;\n\n    ++execnt;\n    ehg141_(trl, n, &c__1, tau, d__, nsing, &dka, &d1a, &d2a);\n    ehg141_(trl, n, &c__2, tau, d__, nsing, &dkb, &d1b, &d2b);\n    alpha = (double) (*tau - dka) / (double) 
(dkb - dka);\n    *delta1 = (1 - alpha) * d1a + alpha * d1b;\n    *delta2 = (1 - alpha) * d2a + alpha * d2b;\n} /* lowesa_ */\n\nstatic void ehg191_(integer *m, double *z__, double *l, integer *d__, integer *n, integer *nf,\n        integer *nv, integer *ncmax, integer *vc, integer *a, double *xi, integer *lo, integer *hi,\n        integer *c__, double *v, integer *nvmax, double *vval2, double *lf, integer *lq) {\n\n    integer lq_dim1, c_offset, l_dim1, lf_dim1, lf_dim2, v_offset, vval2_dim1, vval2_offset, z_dim1;\n\n    static integer execnt = 0, i__, j, p, i1, i2, lq1;\n    static double zi[8];\n    z_dim1 = *m;\n    z__ -= 1 + z_dim1;\n    l_dim1 = *m;\n    l -=  1 + l_dim1;\n    --hi;\n    --lo;\n    --xi;\n    --a;\n    c__ -= c_offset = 1 + *vc;\n    lq_dim1 = *nvmax;\n    lq -= 1 + lq_dim1;\n    lf_dim1 = *d__ - 0 + 1;\n    lf_dim2 = *nvmax;\n    lf -= 0 + lf_dim1 * (1 + lf_dim2);\n    vval2_dim1 = *d__ - 0 + 1;\n    vval2 -= vval2_offset = 0 + vval2_dim1;\n    v -= v_offset = 1 + *nvmax;\n\n    ++execnt;\n    for (j = 1; j <= *n; ++j) {\n        for (i2 = 1; i2 <= *nv; ++i2)\n            for (i1 = 0; i1 <= *d__; ++i1)\n                vval2[i1 + i2 * vval2_dim1] = 0.;\n        for (i__ = 1; i__ <= *nv; ++i__) { // linear search for i in Lq\n            lq1 = lq[i__ + lq_dim1];\n            lq[i__ + lq_dim1] = j;\n            p = *nf;\n            while (lq[i__ + p * lq_dim1] != j)\n                --p;\n            lq[i__ + lq_dim1] = lq1;\n            if (lq[i__ + p * lq_dim1] == j) //BK: doesn't this always hit?  
\n                for (i1 = 0; i1 <= *d__; ++i1)\n                    vval2[i1 + i__ * vval2_dim1] = lf[i1 + (i__ + p * lf_dim2) * lf_dim1];\n        }\n        for (i__ = 1; i__ <= *m; ++i__) {\n            for (i1 = 1; i1 <= *d__; ++i1)\n                zi[i1 - 1] = z__[i__ + i1 * z_dim1];\n            l[i__ + j * l_dim1] = ehg128_(zi, d__, ncmax, vc, &a[1], &xi[1], &\n                lo[1], &hi[1], &c__[c_offset], &v[v_offset], nvmax, & vval2[vval2_offset]);\n        }\n    }\n} /* ehg191_ */\n\nstatic void ehg196_(integer tau, integer d__, double f, double *trl) {\n    static integer execnt = 0, dka, dkb;\n    static double trla, trlb, alpha;\n\n    ++execnt;\n    ehg197(1, d__, f, &dka, &trla);\n    ehg197(2, d__, f, &dkb, &trlb);\n    alpha = (double) (tau - dka) / (double) (dkb - dka);\n    *trl = (1 - alpha) * trla + alpha * trlb;\n}\n\nstatic void ehg197(integer deg, integer d__, double f, integer *dk, double *trl) {\n    *dk = 0;\n    if (deg == 1)\n        *dk = d__ + 1;\n    if (deg == 2)\n        *dk = (integer) ((double) ((d__ + 2) * (d__ + 1)) / 2.);\n    float g1 = (d__ * -.08125 + .13) * d__ + 1.05;\n    *trl = *dk * (GSL_MAX( 0., (g1 - f) / f) + 1);\n}\n\nvoid hermite_prep(double h, double *phi0, double *phi1, double *psi0, double *psi1){\n    *phi0 = (1-h)*(1-h)* (h * 2 + 1);\n    *phi1 = h *h * (3 - h * 2);\n    *psi0 = h * (1-h)*(1-h);\n    *psi1 = h * h * (h - 1);\n}\n\nint xibar_search(const integer *a, const integer t[], const double* xi, const double xibar, integer nt){\n    int m = nt -1;\n    int done;\n    while (1){\n        if (m == 0)\n            done = TRUE_;\n        else {\n            if (a[t[m - 1]] == 2)\n                done = xi[t[m - 1]] == xibar;\n            else\n                done = FALSE_;\n        }\n        if (done)\n            return m;\n        --m;\n    }\n}\n\nstatic double ehg128_(double *z__, integer *d__, integer *ncmax, integer *vc,\n        integer *a, double *xi, integer *lo, integer *hi, integer *c__,\n 
       double *v, integer *nvmax, double *vval) {\n    integer c_dim1, v_dim1, vval_dim1;\n    static double g[2304]\t/* was [9][256] */, h__;\n    static logical i2;\n    static integer execnt = 0, t[20], i__, j, m, i1, i11, i12, ig, ii, lg, ll, nt, ur;\n    static double g0[9], g1[9], s, v0, v1, ge, gn, gs, gw;\n    static double gpe, gpn, gps, gpw, sew, sns, phi0, phi1, psi0, psi1, xibar;\n\n    --z__; --hi; --lo; --xi; --a;\n    c_dim1 = *vc;\n    c__ -= 1 + c_dim1;\n    vval_dim1 = *d__ - 0 + 1;\n    vval -= 0 + vval_dim1;\n    v_dim1 = *nvmax;\n    v -= 1 + v_dim1;\n\n    ++execnt;\n    /*     locate enclosing cell */\n    nt = 1;\n    t[nt - 1] = 1;\n    j = 1;\n    while (a[j] != 0) {\n        ++nt;\n        /*     bug fix 2006-07-18 (thanks, btyner@gmail.com) */\n        if (z__[a[j]] <= xi[j])\n            i1 = lo[j];\n        else\n            i1 = hi[j];\n        t[nt - 1] = i1;\n        Apop_assert(nt < 20, \"nt>=20 in eval.\");\n        j = t[nt - 1];\n    }\n    /*     tensor */\n    for (i12 = 1; i12 <= *vc; ++i12)\n        for (i11 = 0; i11 <= *d__; ++i11)\n                g[i11 + i12 * 9 - 9] = vval[i11 + c__[i12 + j * c_dim1] * vval_dim1];\n    lg = *vc;\n    ll = c__[j * c_dim1 + 1];\n    ur = c__[*vc + j * c_dim1];\n    for (i__ = *d__; i__ >= 1; --i__) {\n        h__ = (z__[i__] - v[ll + i__ * v_dim1]) / (v[ur + i__ * v_dim1] - v[ ll + i__ * v_dim1]);\n        if (h__ < -.001) {\n            ehg184_(\"eval \", &z__[1], *d__, 1);\n            ehg184_(\"lowerlimit \", &v[ll + v_dim1], *d__, *nvmax);\n        } else if (1.001 < h__) {\n                ehg184_(\"eval \", &z__[1], *d__, 1);\n                ehg184_(\"upperlimit \", &v[ur + v_dim1], *d__, *nvmax);\n            }\n        if (-.001 <= h__)\n            i2 = h__ <= 1.001;\n        else\n            i2 = FALSE_;\n        Apop_assert(i2, \"extrapolation not allowed with blending.\");\n        lg = (integer) ((double) lg / 2.);\n        for (ig = 1; ig <= lg; ++ig) {\n            // 
Hermite basis\n            hermite_prep(h__, &phi0, &phi1, &psi0, &psi1);\n            g[ig * 9 - 9] = phi0 * g[ig * 9 - 9] + phi1 * g[(ig + lg) * 9 - 9]\n                 + (psi0 * g[i__ + ig * 9 - 9] + psi1 * g[i__ + (ig + lg)\n                * 9 - 9]) * (v[ur + i__ * v_dim1] - v[ll + i__ * v_dim1]);\n            for (ii = 1; ii <= i__ - 1; ++ii)\n                g[ii + ig * 9 - 9] = phi0 * g[ii + ig * 9 - 9] + phi1 * g[ii + (ig + lg) * 9 - 9];\n        }\n    }\n    s = g[0];\n/*     blending */\n    if (*d__ == 2) {\n    /*        ----- North ----- */\n        v0 = v[ll + v_dim1];\n        v1 = v[ur + v_dim1];\n        for (i11 = 0; i11 <= *d__; ++i11) \n            g0[i11] = vval[i11 + c__[j * c_dim1 + 3] * vval_dim1];\n        for (i11 = 0; i11 <= *d__; ++i11) \n            g1[i11] = vval[i11 + c__[j * c_dim1 + 4] * vval_dim1];\n        xibar = v[ur + (v_dim1 << 1)];\n        m= xibar_search(a, t, xi, xibar, nt);\n        if (m >= 1) {\n            m = hi[t[m - 1]];\n            while (a[m] != 0)\n                if (z__[a[m]] <= xi[m])\n                    m = lo[m];\n                else\n                    m = hi[m];\n            if (v0 < v[c__[m * c_dim1 + 1] + v_dim1]) {\n                v0 = v[c__[m * c_dim1 + 1] + v_dim1];\n                for (i11 = 0; i11 <= *d__; ++i11)\n                    g0[i11] = vval[i11 + c__[m * c_dim1 + 1] * vval_dim1];\n            }\n            if (v[c__[m * c_dim1 + 2] + v_dim1] < v1) {\n                v1 = v[c__[m * c_dim1 + 2] + v_dim1];\n                for (i11 = 0; i11 <= *d__; ++i11)\n                    g1[i11] = vval[i11 + c__[m * c_dim1 + 2] * vval_dim1];\n            }\n        }\n        h__ = (z__[1] - v0) / (v1 - v0);\n    /*        Hermite basis */\n        hermite_prep(h__, &phi0, &phi1, &psi0, &psi1);\n        gn = phi0 * g0[0] + phi1 * g1[0] + (psi0 * g0[1] + psi1 * g1[1]) * (v1 - v0);\n        gpn = phi0 * g0[2] + phi1 * g1[2];\n    /*        ----- South ----- */\n        v0 = v[ll + v_dim1];\n       
 v1 = v[ur + v_dim1];\n        for (i11 = 0; i11 <= *d__; ++i11)\n            g0[i11] = vval[i11 + c__[j * c_dim1 + 1] * vval_dim1];\n        for (i11 = 0; i11 <= *d__; ++i11)\n            g1[i11] = vval[i11 + c__[j * c_dim1 + 2] * vval_dim1];\n        xibar = v[ll + (v_dim1 << 1)];\n        m= xibar_search(a, t,  xi, xibar, nt);\n        if (m >= 1) {\n            m = lo[t[m - 1]];\n            while  (a[m] != 0)\n                if (z__[a[m]] <= xi[m])\n                    m = lo[m];\n                else\n                    m = hi[m];\n            if (v0 < v[c__[m * c_dim1 + 3] + v_dim1]) {\n                v0 = v[c__[m * c_dim1 + 3] + v_dim1];\n                for (i11 = 0; i11 <= *d__; ++i11)\n                    g0[i11] = vval[i11 + c__[m * c_dim1 + 3] * vval_dim1];\n            }\n            if (v[c__[m * c_dim1 + 4] + v_dim1] < v1) {\n                v1 = v[c__[m * c_dim1 + 4] + v_dim1];\n                for (i11 = 0; i11 <= *d__; ++i11)\n                    g1[i11] = vval[i11 + c__[m * c_dim1 + 4] * vval_dim1];\n            }\n        }\n        h__ = (z__[1] - v0) / (v1 - v0);\n    /*        Hermite basis */\n        hermite_prep(h__, &phi0, &phi1, &psi0, &psi1);\n        gs = phi0 * g0[0] + phi1 * g1[0] + (psi0 * g0[1] + psi1 * g1[1]) * (v1 - v0);\n        gps = phi0 * g0[2] + phi1 * g1[2];\n    /*        ----- East ----- */\n        v0 = v[ll + (v_dim1 << 1)];\n        v1 = v[ur + (v_dim1 << 1)];\n        for (i11 = 0; i11 <= *d__; ++i11)\n            g0[i11] = vval[i11 + c__[j * c_dim1 + 2] * vval_dim1];\n        for (i11 = 0; i11 <= *d__; ++i11)\n            g1[i11] = vval[i11 + c__[j * c_dim1 + 4] * vval_dim1];\n        xibar = v[ur + v_dim1];\n        m= xibar_search(a, t,  xi, xibar, nt);\n        if (m >= 1) {\n            m = hi[t[m - 1]];\n            while (a[m] != 0)\n                if (z__[a[m]] <= xi[m])\n                    m = lo[m];\n                else\n                    m = hi[m];\n            if (v0 < v[c__[m * c_dim1 + 1] + 
(v_dim1 << 1)]) {\n                v0 = v[c__[m * c_dim1 + 1] + (v_dim1 << 1)];\n                for (i11 = 0; i11 <= *d__; ++i11)\n                    g0[i11] = vval[i11 + c__[m * c_dim1 + 1] * vval_dim1];\n            }\n            if (v[c__[m * c_dim1 + 3] + (v_dim1 << 1)] < v1) {\n                v1 = v[c__[m * c_dim1 + 3] + (v_dim1 << 1)];\n                for (i11 = 0; i11 <= *d__; ++i11)\n                    g1[i11] = vval[i11 + c__[m * c_dim1 + 3] * vval_dim1];\n            }\n        }\n        h__ = (z__[2] - v0) / (v1 - v0);\n    /*        Hermite basis */\n        hermite_prep(h__, &phi0, &phi1, &psi0, &psi1);\n        ge = phi0 * g0[0] + phi1 * g1[0] + (psi0 * g0[2] + psi1 * g1[2]) * ( v1 - v0);\n        gpe = phi0 * g0[1] + phi1 * g1[1];\n    /*        ----- West ----- */\n        v0 = v[ll + (v_dim1 << 1)];\n        v1 = v[ur + (v_dim1 << 1)];\n        for (i11 = 0; i11 <= *d__; ++i11)\n            g0[i11] = vval[i11 + c__[j * c_dim1 + 1] * vval_dim1];\n        for (i11 = 0; i11 <= *d__; ++i11)\n            g1[i11] = vval[i11 + c__[j * c_dim1 + 3] * vval_dim1];\n        xibar = v[ll + v_dim1];\n        m = xibar_search(a, t,  xi, xibar, nt);\n        if (m >= 1) {\n            m = lo[t[m - 1]];\n            while (a[m] != 0)\n                if (z__[a[m]] <= xi[m])\n                    m = lo[m];\n                else\n                    m = hi[m];\n            if (v0 < v[c__[m * c_dim1 + 2] + (v_dim1 << 1)]) {\n                v0 = v[c__[m * c_dim1 + 2] + (v_dim1 << 1)];\n                for (i11 = 0; i11 <= *d__; ++i11)\n                    g0[i11] = vval[i11 + c__[m * c_dim1 + 2] * vval_dim1];\n            }\n            if (v[c__[m * c_dim1 + 4] + (v_dim1 << 1)] < v1) {\n                v1 = v[c__[m * c_dim1 + 4] + (v_dim1 << 1)];\n                for (i11 = 0; i11 <= *d__; ++i11)\n                    g1[i11] = vval[i11 + c__[m * c_dim1 + 4] * vval_dim1];\n            }\n        }\n        h__ = (z__[2] - v0) / (v1 - v0);\n    /*        Hermite 
basis */\n        hermite_prep(h__, &phi0, &phi1, &psi0, &psi1);\n        gw = phi0 * g0[0] + phi1 * g1[0] + (psi0 * g0[2] + psi1 * g1[2]) * (\n            v1 - v0);\n        gpw = phi0 * g0[1] + phi1 * g1[1];\n    /*        NS */\n        h__ = (z__[2] - v[ll + (v_dim1 << 1)]) / (v[ur + (v_dim1 << 1)] - v[\n            ll + (v_dim1 << 1)]);\n    /*        Hermite basis */\n        hermite_prep(h__, &phi0, &phi1, &psi0, &psi1);\n        sns = phi0 * gs + phi1 * gn + (psi0 * gps + psi1 * gpn) * (v[ur + (v_dim1 << 1)] - v[ll + (v_dim1 << 1)]);\n    /*        EW */\n        h__ = (z__[1] - v[ll + v_dim1]) / (v[ur + v_dim1] - v[ll + v_dim1]);\n    /*        Hermite basis */\n        hermite_prep(h__, &phi0, &phi1, &psi0, &psi1);\n        sew = phi0 * gw + phi1 * ge + (psi0 * gpw + psi1 * gpe) * (v[ur + v_dim1] - v[ll + v_dim1]);\n        s = sns + sew - s;\n    }\n    return s;\n}\n\nstatic void ehg136_(double *u, integer *lm, integer *m, integer *n, integer *d__, integer *nf, \n        double *f, double *x, integer *psi, double *y, double *rw, integer *kernel, integer *k, \n        double *dist, double *eta, double *b, integer *od, double *o, integer *ihat, double *w, \n        double *rcond, integer *sing, integer *dd, integer *tdeg, integer *cdeg, double * s) {\n\n    static integer execnt = 0;\n    integer o_dim1, b_dim1, b_offset, s_dim1, u_dim1, x_dim1, x_offset;\n    static integer i__, j, l, i1, info, identi;\n    static double q[8], tol, work[15], scale, sigma[15], qraux[15], dgamma[15];\n    static double e[225]\t/* was [15][15] */, g[225]\t/* was [15][15] */;\n\n    o_dim1 = *m;\n    o -= 1 + o_dim1;\n    --dist;\n    --rw;\n    --psi;\n    x_dim1 = *n;\n    x -= x_offset = 1 + x_dim1;\n    u_dim1 = *lm;\n    u -= 1 + u_dim1;\n    --w;\n    --eta;\n    b_dim1 = *nf;\n    b -= b_offset = 1 + b_dim1;\n    s_dim1 = *od - 0 + 1;\n    s -= 0 + s_dim1;\n    --cdeg;\n\n    ++execnt;\n    if (! (*k <= *nf - 1))\n        loess_error(104);\n    if (! 
(*k <= 15))\n        loess_error(105);\n    for (identi = 1; identi <= *n; ++identi)\n        psi[identi] = identi;\n    for (l = 1; l <= *m; ++l) {\n        for (i1 = 1; i1 <= *d__; ++i1)\n            q[i1 - 1] = u[l + i1 * u_dim1];\n        ehg127_(q, n, d__, nf, f, &x[x_offset], &psi[1], y, &rw[1],\n            kernel, k, &dist[1], &eta[1], &b[b_offset], od, &w[1], rcond,\n            sing, sigma, e, g, dgamma, qraux, work, &tol, dd, tdeg, &cdeg[1], &s[l * s_dim1]);\n        if (*ihat == 1) {\n    /*           $L sub {l,l} = */\n    /*           V sub {1,:} SIGMA sup {+} U sup T */\n    /*           (Q sup T W e sub i )$ */\n            if (! (*m == *n))\n                loess_error(123);\n    /*           find $i$ such that $l = psi sub i$ */\n            i__ = 1;\n            while  (l != psi[i__]) {\n                ++i__;\n                if (! (i__ < *nf))\n                    loess_error(123);\n            }\n            for (i1 = 1; i1 <= *nf; ++i1)\n                eta[i1] = 0.;\n            eta[i__] = w[i__];\n    /*           $eta = Q sup T W e sub i$ */\n            dqrsl_(&b[b_offset], nf, nf, k, qraux, &eta[1], &eta[1], &eta[1],\n                &eta[1], &eta[1], &eta[1], 1000, &info);\n    /*           $gamma = U sup T eta sub {1:k}$ */\n            for (i1 = 1; i1 <= *k; ++i1)\n                dgamma[i1 - 1] = 0.;\n            for (j = 1; j <= *k; ++j)\n                for (i1 = 1; i1 <= *k; ++i1)\n                        dgamma[i1 - 1] += eta[j] * e[j + i1 * 15 - 16];\n    /*           $gamma = SIGMA sup {+} gamma$ */\n            for (j = 1; j <= *k; ++j)\n                if (tol < sigma[j - 1])\n                         dgamma[j - 1] /= sigma[j - 1];\n                else\n                        dgamma[j - 1] = 0.;\n            o[l + o_dim1] = ddot_(k, g, &c__15, dgamma, &c__1);\n        } else if (*ihat == 2) {\n            /*     $L sub {l,:} = */\n            /*     V sub {1,:} SIGMA sup {+} */\n            /*     ( U sup T Q sup T ) W $ 
*/\n            for (i1 = 1; i1 <= *n; ++i1)\n                    o[l + i1 * o_dim1] = 0.;\n            for (j = 1; j <= *k; ++j) {\n                for (i1 = 1; i1 <=  *nf; ++i1)\n                    eta[i1] = 0.;\n                for (i1 = 1; i1 <=  *k; ++i1)\n                    eta[i1] = e[i1 + j * 15 - 16];\n                dqrsl_(&b[b_offset], nf, nf, k, qraux, &eta[1], &eta[1],\n                    work, work, work, work, 10000, &info);\n                if (tol < sigma[j - 1])\n                    scale = 1. / sigma[j - 1];\n                else\n                    scale = 0.;\n                for (i1 = 1; i1 <=  *nf; ++i1)\n                    eta[i1] *= scale * w[i1];\n                for (i__ = 1; i__ <= *nf; ++i__)\n                    o[l + psi[i__] * o_dim1] += g[j * 15 - 15] * eta[i__];\n            }\n        }\n    }\n} /* ehg136_ */\n\nstatic void ehg137_(double *z__, integer *kappa, integer *leaf, integer *nleaf, integer *d__, \n        integer *nv, integer *nvmax, integer * ncmax, integer *a, double *xi, integer *lo, integer *hi) {\n    static integer execnt = 0, p, pstack[20], stackt;\n\n    --leaf;\n    --z__;\n    --hi;\n    --lo;\n    --xi;\n    --a;\n    /*     stacktop -> stackt */\n    ++execnt;\n    /*     find leaf cells affected by $z$ */\n    stackt = 0;\n    p = 1;\n    *nleaf = 0;\n    while (0 < p) {\n        if (a[p] == 0) {\n            // leaf\n            ++(*nleaf);\n            leaf[*nleaf] = p;\n            // Pop\n            if (stackt >= 1)\n                p = pstack[stackt - 1];\n            else\n                p = 0;\n            stackt = GSL_MAX(0, stackt-1);\n        } else {\n            if (z__[a[p]] == xi[p]) {\n                // Push\n                ++stackt;\n                if (! (stackt <= 20))\n                    loess_error(187);\n                pstack[stackt - 1] = hi[p];\n                p = lo[p];\n            } else {\n                p = (z__[a[p]] <= xi[p])\n                        ? 
lo[p]\n                        : hi[p];\n            }\n        }\n    }\n    if (! (*nleaf <= 256))\n        loess_error(185);\n} /* ehg137_ */\n\n\n//BK: Changed phi (the 17th input) from integer to double, because this fn is only called\n//once, and there, dist==phi.\nstatic void ehg139_(double *v, integer *nvmax, integer *nv, integer *n, integer *d__, \n        integer *nf, double *f, double *x, integer *pi, integer *psi, double *y, \n        double *rw, double *trl, integer *kernel, integer *k, double *dist, double *phi,\n        double *eta, double *b, integer *od, double *w, double *diagl, double *vval2, \n        integer *ncmax, integer *vc, integer *a, double *xi, integer *lo, integer *hi, integer *c__,\n        integer *vhit, double *rcond, integer *sing, integer *dd, integer\n        *tdeg, integer *cdeg, integer *lq, double *lf, logical *setlf, double *s) {\n\n    integer lq_dim1, c_dim1, c_offset, lf_dim1, lf_dim2, b_dim1, b_offset, \n            s_dim1, v_dim1, v_offset, vval2_dim1, vval2_offset, x_dim1, x_offset, i__1, i__3;\n\n    static integer execnt = 0;\n    static double e[225]\t/* was [15][15] */;\n    static double q[8], u[225]\t/* was [15][15] */, z__[8], i4, i7, tol;\n    static integer i__, j, l, i5, i6, ii, leaf[256], info, ileaf, nleaf, identi;\n    static double term, work[15], scale, sigma[15], qraux[15], dgamma[15];\n\n    --vhit; --diagl; --phi; --dist; --rw; --y;\n    --psi; --pi; --w; --eta; --hi; --lo; --xi; --cdeg;\n    vval2_dim1 = *d__ - 0 + 1;\n    vval2 -= vval2_offset = 0 + vval2_dim1;\n    x_dim1 = *n;\n    x -= x_offset = 1 + x_dim1;\n    v_dim1 = *nvmax;\n    v -= v_offset = 1 + v_dim1;\n    lf_dim1 = *d__ - 0 + 1;\n    lf_dim2 = *nvmax;\n    lf -=  0 + lf_dim1 * (1 + lf_dim2);\n    lq_dim1 = *nvmax;\n    lq -= 1 + lq_dim1;\n    b_dim1 = *nf;\n    b -= b_offset = 1 + b_dim1;\n    s_dim1 = *od - 0 + 1;\n    s -= 0 + s_dim1;\n    c_dim1 = *vc;\n    c__ -= c_offset = 1 + c_dim1;\n\n    ++execnt;\n    /*     l2fit with 
trace(L) */\n    if (! (*k <= *nf - 1))\n        loess_error(104);\n    if (! (*k <= 15))\n        loess_error(105);\n    if (*trl != 0.) {\n        for (i5 = 1; i5 <= *n; ++i5)\n            diagl[i5] = 0.;\n        for (i6 = 1; i6 <= *nv; ++i6)\n            for (i5 = 0; i5 <= *d__; ++i5)\n                vval2[i5 + i6 * vval2_dim1] = 0.;\n    }\n    for (identi = 1; identi <= *n; ++identi)\n        psi[identi] = identi;\n    i__1 = *nv;\n    for (l = 1; l <= i__1; ++l) {\n        for (i5 = 1; i5 <= *d__; ++i5)\n            q[i5 - 1] = v[l + i5 * v_dim1];\n        ehg127_(q, n, d__, nf, f, &x[x_offset], &psi[1], &y[1], &rw[1],\n            kernel, k, dist, &eta[1], &b[b_offset], od, &w[1], rcond,\n            sing, sigma, u, e, dgamma, qraux, work, &tol, dd, tdeg, &cdeg[1], &s[l * s_dim1]);\n        if (*trl != 0.) { // invert $psi$\n            for (i5 = 1; i5 <= *n; ++i5)\n                phi[i5] = 0;\n            for (i__ = 1; i__ <= *nf; ++i__)\n                phi[psi[i__]] = i__;\n            for (i5 = 1; i5 <= *d__; ++i5)\n                z__[i5 - 1] = v[l + i5 * v_dim1];\n            ehg137_(z__, &vhit[l], leaf, &nleaf, d__, nv, nvmax, ncmax, a, &xi[1], &lo[1], &hi[1]);\n            for (ileaf = 1; ileaf <= nleaf; ++ileaf) {\n                i__3 = hi[leaf[ileaf - 1]];\n                for (ii = lo[leaf[ileaf - 1]]; ii <= i__3; ++ii) {\n                    i__ = phi[pi[ii]];\n                    if (i__ != 0) {\n                        if (! 
(psi[i__] == pi[ii]))\n                            loess_error(194);\n                        for (i5 = 1; i5 <= *nf; ++i5)\n                            eta[i5] = 0.;\n                        eta[i__] = w[i__];\n                        /*                    $eta = Q sup T W e sub i$ */\n                        dqrsl_(&b[b_offset], nf, nf, k, qraux, &eta[1], work,\n                            &eta[1], &eta[1], work, work, 1000, &info);\n                        for (j = 1; j <= *k; ++j) {\n                            i4 = (tol < sigma[j - 1])\n                                ? ddot_(k, &u[j * 15 - 15], &c__1, &eta[1], &c__1) / sigma[j - 1]\n                                : 0.;\n                            dgamma[j - 1] = i4;\n                        }\n                        for (j = 1; j <= *d__ + 1; ++j) // bug fix 2006-07-15 for k=1, od>1.   (thanks btyner@gmail.com)\n                            vval2[j - 1 + l * vval2_dim1] = (j <= *k)\n                                        ? ddot_(k, &e[j - 1], &c__15, dgamma, &c__1)\n                                        : 0.;\n                        for (i5 = 1; i5 <= *d__; ++i5)\n                            z__[i5 - 1] = x[pi[ii] + i5 * x_dim1];\n                        term = ehg128_(z__, d__, ncmax, vc, a, &xi[1], &\n                            lo[1], &hi[1], &c__[c_offset], &v[v_offset],\n                            nvmax, &vval2[vval2_offset]);\n                        diagl[pi[ii]] += term;\n                        for (i5 = 0; i5 <= *d__; ++i5)\n                            vval2[i5 + l * vval2_dim1] = 0.;\n                    }\n                }\n            }\n        }\n        if (*setlf) {\n            /*           $Lf sub {:,l,:} = V SIGMA sup {+} U sup T Q sup T W$ */\n            if (! 
(*k >= *d__ + 1))\n                loess_error(196);\n            for (i5 = 1; i5 <= *nf; ++i5)\n                lq[l + i5 * lq_dim1] = psi[i5];\n            for (i6 = 1; i6 <= *nf; ++i6)\n                for (i5 = 0; i5 <= *d__; ++i5)\n                    lf[i5 + (l + i6 * lf_dim2) * lf_dim1] = 0.;\n            for (j = 1; j <= *k; ++j) {\n                for (i5 = 1; i5 <= *nf; ++i5)\n                    eta[i5] = 0.;\n                for (i5 = 1; i5 <= *k; ++i5)\n                    eta[i5] = u[i5 + j * 15 - 16];\n                dqrsl_(&b[b_offset], nf, nf, k, qraux, &eta[1], &eta[1], work, work, work, work, 10000, &info);\n                scale = (tol < sigma[j - 1])\n                        ? 1. / sigma[j - 1]\n                        : 0.;\n                for (i5 = 1; i5 <= *nf; ++i5)\n                    eta[i5] *= scale * w[i5];\n                for (i__ = 1; i__ <= *nf; ++i__) {\n                    i7 = eta[i__];\n                    for (i5 = 0; i5 <= *d__; ++i5)\n                        if (i5 < *k)\n                            lf[i5 + (l + i__ * lf_dim2) * lf_dim1] += e[i5 + 1 + j * 15 - 16] * i7;\n                        else\n                            lf[i5 + (l + i__ * lf_dim2) * lf_dim1] = 0.;\n                }\n            }\n        }\n    }\n    if (*trl != 0.) {\n        if (*n <= 0)\n            *trl = 0.;\n        else {\n            *trl = diagl[*n];\n            for (int i2 = *n - 1; i2 >= 1; --i2)\n                *trl += diagl[i2];\n        }\n    }\n}\n\nstatic void dqrdc_(double *x, integer *ldx, integer *n, integer *p, double *qraux, \n        integer *jpvt, double *work, integer job) {\n    integer x_dim1, i__2, i__3;\n    double d__2;\n\n    static integer j, l, jj, jp, pl, pu, lp1, lup, maxj;\n    static logical negj, swapj;\n    static double t, tt, nrmxl, maxnrm;\n\n/*     dqrdc uses householder transformations to compute the qr \n     factorization of an n by p matrix x.  
column pivoting \n     based on the 2-norms of the reduced columns may be \n     performed at the users option. \n\n     on entry \n\n        x       double precision(ldx,p), where ldx .ge. n. \n                x contains the matrix whose decomposition is to be \n                computed. \n\n        ldx     integer. \n                ldx is the leading dimension of the array x. \n\n        n       integer. \n                n is the number of rows of the matrix x. \n\n        p       integer. \n                p is the number of columns of the matrix x. \n\n        jpvt    integer(p). \n                jpvt contains integers that control the selection \n                of the pivot columns.  the k-th column x(k) of x \n                is placed in one of three classes according to the \n                value of jpvt(k). \n\n                   if jpvt(k) .gt. 0, then x(k) is an initial \n                                      column. \n\n                   if jpvt(k) .eq. 0, then x(k) is a free column. \n\n                   if jpvt(k) .lt. 0, then x(k) is a final column. \n\n                before the decomposition is computed, initial columns \n                are moved to the beginning of the array x and final \n                columns to the end.  both initial and final columns \n                are frozen in place during the computation and only \n                free columns are moved.  at the k-th stage of the \n                reduction, if x(k) is occupied by a free column \n                it is interchanged with the free column of largest \n                reduced norm.  jpvt is not referenced if \n                job .eq. 0. \n\n        work    double precision(p). \n                work is a work array.  work is not referenced if \n                job .eq. 0. \n\n        job     integer. \n                job is an integer that initiates column pivoting. \n                if job .eq. 0, no pivoting is done. \n                if job .ne. 
0, pivoting is done. \n\n     on return \n\n        x       x contains in its upper triangle the upper \n                triangular matrix r of the qr factorization. \n                below its diagonal x contains information from \n                which the orthogonal part of the decomposition \n                can be recovered.  note that if pivoting has \n                been requested, the decomposition is not that \n                of the original matrix x but that of x \n                with its columns permuted as described by jpvt. \n\n        qraux   double precision(p). \n                qraux contains further information required to recover \n                the orthogonal part of the decomposition. \n\n        jpvt    jpvt(k) contains the index of the column of the \n                original matrix that has been interchanged into \n                the k-th column, if pivoting was requested. \n\n     linpack. this version dated 08/14/78 . \n     g.w. stewart, university of maryland, argonne national lab. \n\n     dqrdc uses the following functions and subprograms. \n\n     blas daxpy,ddot,dscal,dswap,dnrm2 \n     fortran dabs,dmax1,min0,dsqrt */\n\n/*     internal variables */\n\n    x_dim1 = *ldx;\n    x -= 1 + x_dim1;\n    --qraux;\n    --jpvt;\n    --work;\n\n    pl = 1;\n    pu = 0;\n    if (job == 0)\n        goto L60;\n\n    /*        pivoting has been requested.  rearrange the columns */\n    /*        according to jpvt. */\n    for (j = 1; j <= *p; ++j) {\n        swapj = jpvt[j] > 0;\n        negj = jpvt[j] < 0;\n        jpvt[j] = j;\n        if (negj)\n            jpvt[j] = -j;\n        if (! 
swapj)\n            goto L10;\n        if (j != pl)\n            dswap(*n, &x[pl * x_dim1 + 1], &x[j * x_dim1 + 1]);\n        jpvt[j] = jpvt[pl];\n        jpvt[pl] = j;\n        ++pl;\n    L10:\n        ;\n    }\n    pu = *p;\n    for (jj = 1; jj <= *p; ++jj) {\n        j = *p - jj + 1;\n        if (jpvt[j] >= 0)\n            continue;\n        jpvt[j] = -jpvt[j];\n        if (j == pu)\n            goto L30;\n        dswap(*n, &x[pu * x_dim1 + 1], &x[j * x_dim1 + 1]);\n        jp = jpvt[pu];\n        jpvt[pu] = jpvt[j];\n        jpvt[j] = jp;\n    L30:\n        --pu;\n    }\nL60:\n\n    /*     compute the norms of the free columns. */\n    if (!(pu < pl) )\n        for (j = pl; j <= pu; ++j) {\n            qraux[j] = dnrm2(n, &x[j * x_dim1 + 1]);\n            work[j] = qraux[j];\n        }\n\n    /*     perform the householder reduction of x. */\n    lup = min(*n,*p);\n    for (l = 1; l <= lup; ++l) {\n        if (l < pl || l >= pu)\n            goto L120;\n\n        /*           locate the column of largest norm and bring it */\n        /*           into the pivot position. */\n        maxnrm = 0.;\n        maxj = l;\n        for (j = l; j <= pu; ++j) {\n            if (qraux[j] <= maxnrm)\n                continue;\n            maxnrm = qraux[j];\n            maxj = j;\n        }\n        if (maxj != l) {\n            dswap(*n, &x[l * x_dim1 + 1], &x[maxj * x_dim1 + 1]);\n            qraux[maxj] = qraux[l];\n            work[maxj] = work[l];\n            jp = jpvt[maxj];\n            jpvt[maxj] = jpvt[l];\n            jpvt[l] = jp;\n        }\n    L120:\n        qraux[l] = 0.;\n        if (l == *n)\n            continue;\n        /*           compute the householder transformation for column l. 
*/\n\n        i__2 = *n - l + 1;\n        nrmxl = dnrm2(&i__2, &x[l + l * x_dim1]);\n        if (nrmxl == 0.)\n            continue;\n        if (x[l + l * x_dim1] != 0.)\n            nrmxl = d_sign(&nrmxl, &x[l + l * x_dim1]);\n        i__2 = *n - l + 1;\n        dscal(&i__2, 1./nrmxl, &x[l + l * x_dim1]);\n        x[l + l * x_dim1] += 1.;\n\n        /*  apply the transformation to the remaining columns,  updating the norms. */\n        lp1 = l + 1;\n        if (!(*p < lp1))\n            for (j = lp1; j <= *p; ++j) {\n                i__3 = *n - l + 1;\n                t = -ddot_(&i__3, &x[l + l * x_dim1], &c__1, &x[l + j * x_dim1], &\n                    c__1) / x[l + l * x_dim1];\n                i__3 = *n - l + 1;\n                daxpy_(&i__3, &t, &x[l + l * x_dim1], &c__1, &x[l + j * x_dim1], &\n                    c__1);\n                if ((j < pl || j > pu) || (qraux[j] == 0.) )\n                    continue;\n                /* Computing 2nd power */\n                d__2 = abs(x[l + j * x_dim1]) / qraux[j];\n                tt = GSL_MAX(1. - d__2 * d__2, 0);\n                t = tt;\n                tt = tt * .05 * gsl_pow_2(qraux[j] / work[j]) + 1.;\n                if (tt == 1.)\n                    goto L130;\n                qraux[j] *= sqrt(t);\n                continue;\n                L130:\n                i__3 = *n - l;\n                qraux[j] = dnrm2(&i__3, &x[l + 1 + j * x_dim1]);\n                work[j] = qraux[j];\n            }\n        /*              save the transformation. */\n        qraux[l] = x[l + l * x_dim1];\n        x[l + l * x_dim1] = -nrmxl;\n    }\n} /* dqrdc_ */\n\nstatic void lowesb_(double *xx, double *yy, double *ww, double *diagl, double trl,\n        integer *iv, integer *liv, integer * lv, double *wv) {\n    static integer execnt = 0, setlf;\n    --wv;\n    --iv;\n\n    ++execnt;\n    if (! 
(iv[28] != 173))\n        loess_error(174);\n    if (iv[28] != 172 && !(iv[28] == 171))\n\t    loess_error(171);\n    iv[28] = 173;\n    setlf = iv[27] != iv[25];\n    integer     i__1 = ifloor(iv[3] * wv[2]);\n    ehg131_(xx, yy, ww, &trl, diagl, &iv[20], &iv[29], &iv[3], &iv[2], &iv[5], &iv[17], \n            &iv[4], &iv[6], &iv[14], &iv[19], &wv[1] , &iv[iv[7]], &iv[iv[8]], &iv[iv[9]], &iv[iv[10]], \n            &iv[iv[22]], & iv[iv[27]], &wv[iv[11]], &iv[iv[23]], &wv[iv[13]], &wv[iv[12]], & wv[iv[15]], \n            &wv[iv[16]], &wv[iv[18]], &i__1, &wv[3], &wv[iv[26]], &wv[iv[24]], &wv[4], &iv[30], &iv[33], \n            &iv[32], &iv[41], &iv[iv[25]], &wv[iv[34]], &setlf);\n    if ((double) iv[14] < iv[6] + (double) iv[4] / 2.)\n        ehg183_(\"Warning. k-d tree limited by memory; nvmax=\", &iv[14], 1, 1);\n    else if (iv[17] < iv[5] + 2)\n\t    ehg183_(\"Warning. k-d tree limited by memory. ncmax=\", &iv[17], 1, 1);\n} /* lowesb_ */\n\nstatic void lowesd_(integer *iv, integer *liv, integer *lv, double *v, \n        integer d__, integer n, double f, integer ideg, integer *nvmax, logical *setlf) {\n    static integer execnt = 0, i__, j, i1, i2, nf, vc, ncmax, bound;\n    --iv;\n    --v;\n\n    ++execnt;\n    iv[28] = 171;\n    iv[2] = d__;\n    iv[3] = n;\n    vc = pow_ii(2, d__);\n    iv[4] = vc;\n    if (! (0. < f))\n        loess_error(120);\n/* Computing MIN */\n    nf = GSL_MIN(n, ifloor(n * f));\n    iv[19] = nf;\n    iv[20] = 1;\n    if (ideg == 0)\n        i1 = 1;\n    else {\n        if (ideg == 1)\n            i1 = d__ + 1;\n        else if (ideg == 2)\n            i1 = (integer) ((double) ((d__ + 2) * (d__ + 1)) / 2.);\n    }\n    iv[29] = i1;\n    iv[21] = 1;\n    iv[14] = *nvmax;\n    ncmax = *nvmax;\n    iv[17] = ncmax;\n    iv[30] = 0;\n    iv[32] = ideg;\n    if (! (ideg >= 0) || (! 
(ideg <= 2)))\n        loess_error(195);\n    iv[33] = d__;\n    for (i2 = 41; i2 <= 49; ++i2)\n        iv[i2] = ideg;\n    iv[7] = 50;\n    iv[8] = iv[7] + ncmax;\n    iv[9] = iv[8] + vc * ncmax;\n    iv[10] = iv[9] + ncmax;\n    iv[22] = iv[10] + ncmax;\n/*     initialize permutation */\n    j = iv[22] - 1;\n    for (i__ = 1; i__ <= n; ++i__)\n        iv[j + i__] = i__;\n    iv[23] = iv[22] + n;\n    iv[25] = iv[23] + *nvmax;\n    if (*setlf)\n        iv[27] = iv[25] + *nvmax * nf;\n    else\n        iv[27] = iv[25];\n    bound = iv[27] + n;\n    if (! (bound - 1 <= *liv))\n        loess_error(102);\n    iv[11] = 50;\n    iv[13] = iv[11] + *nvmax * d__;\n    iv[12] = iv[13] + (d__ + 1) * *nvmax;\n    iv[15] = iv[12] + ncmax;\n    iv[16] = iv[15] + n;\n    iv[18] = iv[16] + nf;\n    iv[24] = iv[18] + iv[29] * nf;\n    iv[34] = iv[24] + (d__ + 1) * *nvmax;\n    if (*setlf)\n        iv[26] = iv[34] + (d__ + 1) * *nvmax * nf;\n    else\n        iv[26] = iv[34];\n    bound = iv[26] + nf;\n    if (! (bound - 1 <= *lv))\n       loess_error(103);\n\n    v[1] = f;\n    v[2] = .05;\n    v[3] = 0.;\n    v[4] = 1.;\n} /* lowesd_ */\n\nstatic void lowese_(integer *iv, integer *liv, integer *lv, double *wv, integer m, double *z, double *s) {\n    static integer execnt = 0;\n    ++execnt;\n    --iv;\n    --wv;\n\n    if (! (iv[28] != 172))\n        loess_error(172);\n    if (! 
(iv[28] == 173))\n        loess_error(173);\n    ehg133_(&iv[3], &iv[2], &iv[4], &iv[14], &iv[5], &iv[17], &iv[iv[7]], &iv[iv[8]], &iv[iv[9]],\n            &iv[iv[10]], &wv[iv[11]], &wv[iv[13]], &wv[iv[12]], m, z, s);\n}\n\nstatic void lowesf_(double *xx, double *yy, double *ww, integer *iv, integer *liv, \n        integer *lv, double *wv, integer *m, double *z__, double *l, integer ihat, double *s) {\n    static integer execnt = 0;\n    integer l_dim1, l_offset, z_dim1, z_offset;\n    static logical i1;\n    --xx;\n    --yy;\n    --ww;\n    --iv;\n    --wv;\n    l_dim1 = *m;\n    l -= l_offset = 1 + l_dim1;\n    z_dim1 = *m;\n    z__ -= z_offset = 1 + z_dim1;\n\n    ++execnt;\n    i1 = (171 <= iv[28])\n          ? iv[28] <= 174\n\t      : FALSE_;\n    if (! i1)\n        loess_error(171);\n    iv[28] = 172;\n    if (! (iv[14] >= iv[19]))\n        loess_error(186);\n    ehg136_(&z__[z_offset], m, m, &iv[3], &iv[2], &iv[19], &wv[1], &xx[1], &iv[iv[22]], &yy[1],\n            &ww[1], &iv[20], &iv[29], &wv[iv[15]], &wv[iv[16]], &wv[iv[18]], &c__0, &l[l_offset],\n            &ihat, &wv[iv[26]], &wv[4], &iv[30], &iv[33], &iv[32], &iv[41], s);\n} /* lowesf_ */\n\nstatic void lowesl_(integer *iv, integer *liv, integer *lv, double *wv, integer *m, double *z__, double *l) {\n    static integer execnt = 0;\n    integer l_dim1, l_offset, z_dim1, z_offset;\n\n    --iv;\n    --wv;\n    l_dim1 = *m;\n    l -= l_offset = 1 + l_dim1;\n    z_dim1 = *m;\n    z__ -= z_offset = 1 + z_dim1;\n\n    ++execnt;\n    if (! (iv[28] != 172))\n        loess_error(172);\n    if (! (iv[28] == 173))\n        loess_error(173);\n    if (! 
(iv[26] != iv[34]))\n        loess_error(175);\n    ehg191_(m, &z__[z_offset], &l[l_offset], &iv[2], &iv[3], &iv[19], &iv[6],\n\t    &iv[17], &iv[4], &iv[iv[7]], &wv[iv[12]], &iv[iv[10]], &iv[iv[9]],\n\t     &iv[iv[8]], &wv[iv[11]], &iv[14], &wv[iv[24]], &wv[iv[34]], &iv[iv[25]]);\n} /* lowesl_ */\n\nstatic void lowesw_(double *res, integer *n, double *rw, integer *pi) {\n    static integer i1, nh, execnt = 0, identi;\n    static double cmad, rsmall;\n    --pi;\n    --rw;\n    --res;\n    ++execnt;\n/*     transliterated from Devlin's ratfor */\n/*     find median of absolute residuals */\n    for (i1 = 1; i1 <= *n; ++i1)\n        rw[i1] = abs(res[i1]);\n    for (identi = 1; identi <= *n; ++identi)\n        pi[identi] = identi;\n    nh = ifloor((double) (*n) / 2.) + 1;\n/*     partial sort to find 6*mad */\n    find_kth_smallest(1, *n, nh, 1, &rw[1], &pi[1]);\n    if (*n - nh + 1 < nh) {\n        find_kth_smallest(1, nh-1, nh-1, 1, &rw[1], &pi[1]);\n        cmad = (rw[pi[nh]] + rw[pi[nh - 1]]) * 3;\n    } else\n        cmad = rw[pi[nh]] * 6;\n    rsmall = DBL_MIN;\n    if (cmad < rsmall)\n        for (i1 = 1; i1 <= *n; ++i1)\n            rw[i1] = 1.;\n    else\n        for (int i__ = 1; i__ <= *n; ++i__)\n            if (cmad * .999 < rw[i__])\n                rw[i__] = 0.;\n            else\n                rw[i__] = (cmad * .001 < rw[i__])\n                            ? 
gsl_pow_2(1 - gsl_pow_2(rw[i__] / cmad))\n                            : 1.;\n}\n\nstatic void pseudovals(integer n, double *y, double *yhat, double *pwgts,  //formerly lowesp\n                double *rwgts, integer *pi, double *ytilde) {\n    static integer m, i5, identi, execnt = 0;\n    static double i4, mad;\n\n    --ytilde;\n    --pi;\n    --rwgts;\n    --pwgts;\n    --yhat;\n    --y;\n\n    ++execnt;\n    /*     median absolute deviation */\n    for (i5 = 1; i5 <= n; ++i5)\n        ytilde[i5] = abs(y[i5] - yhat[i5]) * sqrt(pwgts[i5]);\n    for (identi = 1; identi <= n; ++identi)\n        pi[identi] = identi;\n    m = ifloor((double) (n) / 2.) + 1;\n    find_kth_smallest(1, n, m, 1, &ytilde[1], &pi[1]);\n    if (n - m + 1 < m) {\n        find_kth_smallest(1, m-1, m-1, 1, &ytilde[1], &pi[1]);\n        mad = (ytilde[pi[m - 1]] + ytilde[pi[m]]) / 2;\n    } else\n        mad = ytilde[pi[m]];\n    for (i5 = 1; i5 <= n; ++i5)\n        ytilde[i5] = 1 - gsl_pow_2(y[i5] - yhat[i5]) * pwgts[i5] / (gsl_pow_2(mad * 6)/ 5);\n    for (i5 = 1; i5 <= n; ++i5)\n        ytilde[i5] *= sqrt(rwgts[i5]);\n    if (n <= 0)\n        i4 = 0.;\n    else {\n        double i1 = ytilde[n];\n        for (integer i2 = n - 1; i2 >= 1; --i2)\n            i1 += ytilde[i2];\n        i4 = i1;\n    }\n    //     pseudovalues\n    for (i5 = 1; i5 <= n; ++i5)\n        ytilde[i5] = yhat[i5] + (n / i4) * rwgts[i5] * (y[i5] - yhat[i5]);\n}\n\n////// Back to loessc.c\nstatic void loess_workspace(long D, long N, double\tspan, long degree,\n\t\t\tlong *nonparametric, long *drop_square, long *sum_drop_sqr, long setLf){\n\tlong tau0, nvmax, nf, i;\n\tnvmax = max(200, N);\n        nf = min(N, floor(N * span));\n        tau0 = (degree > 1) ? 
((D + 2) * (D + 1) * 0.5) : (D + 1);\n        tau = tau0 - (*sum_drop_sqr);\n        lv = 50 + (3 * D + 3) * nvmax + N + (tau0 + 2) * nf;\n\tliv = 50 + ((long)pow((double)2, (double)D) + 4) * nvmax + 2 * N;\n\tif(setLf) {\n\t\tlv = lv + (D + 1) * nf * nvmax;\n\t\tliv = liv + nf * nvmax;\t\n\t}\n    iv = Calloc(liv, long);\n    v = Calloc(lv, double);\n\n    lowesd_(iv, &liv, &lv, v, D, N, span, degree, &nvmax, &setLf);\n    iv[32] = *nonparametric;\n    for(i = 0; i < D; i++)\n        iv[i + 40] = drop_square[i];\n}\n\nstatic void loess_free() {\n    free(v);\n    free(iv);\n}\n\nstatic void loess_dfit( double\t*y, double *x, double *x_evaluate, double *weights,\n\t\t\tdouble span, long degree, long *nonparametric, long *drop_square,\n\t\t\tlong *sum_drop_sqr, long d, long n, long *m, double *fit) {\n    loess_workspace(d, n, span, degree, nonparametric, drop_square, sum_drop_sqr, 0);\n\tlowesf_(x, y, weights, iv, &liv, &lv, v, m, x_evaluate, &doublepluszero, 0, fit);\n\tloess_free();\n}\n\nstatic void loess_dfitse( double\t*y, double *x, double *x_evaluate, double *weights, double *robust,\n        int\tfamily, double span, long degree, long *nonparametric, long *drop_square,\n         long *sum_drop_sqr, long d, long n, long *m, double *fit, double *L) {\n    loess_workspace(d, n, span, degree, nonparametric, drop_square, sum_drop_sqr, 0);\n\tif(family == GAUSSIAN)\n\t\tlowesf_(x, y, weights, iv, &liv, &lv, v, m, x_evaluate, L, 2, fit);\n\telse if(family == SYMMETRIC) {\n\t\tlowesf_(x, y, weights, iv, &liv, &lv, v, m, x_evaluate, L, 2, fit);\n\t\tlowesf_(x, y, robust, iv, &liv, &lv, v, m, x_evaluate, &doublepluszero, 0, fit);\n\t}\t\n\tloess_free();\n}\n\nstatic void loess_grow(long\tconst * restrict parameter,long const*restrict a,\n                       double\tconst *restrict xi, double const *restrict vert, \n                       const double *restrict vval) {\n\tlong\td, vc, nc, nv, a1, v1, xi1, vv1, i, k;\n\td = parameter[0];\n\tvc = parameter[2];\n\tnc 
= parameter[3];\n\tnv = parameter[4];\n\tliv = parameter[5];\n\tlv = parameter[6];\n\tiv = Calloc(liv, long);\n\tv = Calloc(lv, double);\n\n\tiv[1] = d;\n\tiv[2] = parameter[1];\n\tiv[3] = vc;\n\tiv[5] = iv[13] = nv;\n\tiv[4] = iv[16] = nc;\n\tiv[6] = 50;\n\tiv[7] = iv[6] + nc;\n\tiv[8] = iv[7] + vc * nc;\n\tiv[9] = iv[8] + nc;\n\tiv[10] = 50;\n\tiv[12] = iv[10] + nv * d;\n\tiv[11] = iv[12] + (d + 1) * nv;\n\tiv[27] = 173;\n\n\tv1 = iv[10] - 1;\n\txi1 = iv[11] - 1;\n\ta1 = iv[6] - 1;\n\tvv1 = iv[12] - 1;\n\t\n    for(i = 0; i < d; i++) {\n\t\tk = nv * i;\n\t\tv[v1 + k] = vert[i];\n\t\tv[v1 + vc - 1 + k] = vert[i + d];\n\t}\n    for(i = 0; i < nc; i++) {\n            v[xi1 + i] = xi[i];\n            iv[a1 + i] = a[i];\n    }\n\tk = (d + 1) * nv;\n\tfor(i = 0; i < k; i++)\n\t\tv[vv1 + i] = vval[i];\n\tehg169_(d, &vc, &nc, &nc, &nv, nv, v+v1, iv+a1, v+xi1, iv+iv[7]-1, iv+iv[8]-1, iv+iv[9]-1);\n}\n\nstatic void loess_ifit(long const * restrict parameter, long const *restrict a, \n                double const *restrict xi, double const *restrict vert,\n                 const double *restrict vval, long m, double *x_evaluate, double *fit) {\n\tloess_grow(parameter, a, xi, vert, vval);\n\tlowese_(iv, &liv, &lv, v, m, x_evaluate, fit);\n\tloess_free();\n}\n\nstatic void loess_ise( double\t*y, double *x, double *x_evaluate, double *weights, double span, long degree,\n             long int *nonparametric, long int *drop_square, long int *sum_drop_sqr, double *cell, long int d,\n             long int n, long int *m, double *fit, double *L) {\n    loess_workspace(d, n, span, degree, nonparametric, drop_square, sum_drop_sqr, 1);\n\tv[1] = *cell;\n\tlowesb_(x, y, weights, &doublepluszero, 0, iv, &liv, &lv, v);\n\tlowesl_(iv, &liv, &lv, v, m, x_evaluate, L);\n\tloess_free();\n}\n\nstatic void loess_prune( long\t*parameter, long *a, double\t*xi, double *vert, double *vval) {\n\tlong\td, vc, a1, v1, xi1, vv1, nc, nv, nvmax, i, k;\n\td = iv[1];\n\tvc = iv[3] - 1;\n\tnc = 
iv[4];\n\tnv = iv[5];\n\ta1 = iv[6] - 1;\n\tv1 = iv[10] - 1;\n\txi1 = iv[11] - 1;\n\tvv1 = iv[12] - 1;\n\tnvmax = iv[13];\n\n\tfor(i = 0; i < 5; i++)\n\t\tparameter[i] = iv[i + 1];\n\tparameter[5] = iv[21] - 1;\n\tparameter[6] = iv[14] - 1;\n\n\tfor(i = 0; i < d; i++){\n\t\tk = nvmax * i;\n\t\tvert[i] = v[v1 + k];\n\t\tvert[i + d] = v[v1 + vc + k];\n\t}\n\tfor(i = 0; i < nc; i++) {\n\t\txi[i] = v[xi1 + i];\n\t\ta[i] = iv[a1 + i];\n\t}\n\tk = (d + 1) * nv;\n\tfor(i = 0; i < k; i++)\n\t\tvval[i] = v[vv1 + i];\n}\n\n////// predict.c\n\n/** \\cond doxy_ignore  Private to this file.*/\nstruct pred_struct {\n\tdouble\t*fit;           //The evaluated loess surface at eval.\n\tdouble\t*se_fit;        //Estimates of the standard errors of the surface values.\n\tdouble  residual_scale; //Estimate of the scale of the residuals.\n\tdouble  df;             //The degrees of freedom of the t-distribution used to compute pointwise \n                            //   confidence intervals for the evaluated surface. 
\n};\n/** \\endcond */ //End of Doxygen ignore.\n\nvoid predict(double  *new_x, long M, struct loess_struct *lo, struct pred_struct *pre, int want_cov) {\n\t\n    long D = lo->in.p;//Aliases for the purposes of merging some fn.s\n    long N = lo->in.n;\n            \n\tint     i, j, k, p;\n\tdouble x[N * D], x_tmp[N * D], x_evaluate[M * D];\n\n\tfor(i = 0; i < D; i++) {\n\t\tk = i * M;\n\t\tfor(j = 0; j < M; j++) {\n\t\t\tp = k + j;\n\t\t\tnew_x[p] /= lo->out.divisor[i];\n\t\t}\n\t}\n    memcpy(x_tmp, lo->in.x, N * D * sizeof(double));\n\tif(!strcmp(lo->control.surface, \"direct\") || want_cov)\n        for(i = 0; i < D; i++) {\n\t\t\tk = i * N;\n\t\t\tfor(j = 0; j < N; j++) {\n                p = k + j;\n                x_tmp[p] = lo->in.x[p] / lo->out.divisor[i];\n            }\n\t\t}\n\tj = D - 1;\n    long sum_drop_sqr = 0, sum_parametric = 0, nonparametric = 0;\n    long order_parametric[D], order_drop_sqr[D];\n\tfor(i = 0; i < D; i++) {\n        sum_drop_sqr += lo->model.drop_square[i];\n        sum_parametric += lo->model.parametric[i];\n        if(lo->model.parametric[i])\n            order_parametric[j--] = i;\n        else\n            order_parametric[nonparametric++] = i;\n\t}\n    for(i = 0; i < D; i++) {\n        order_drop_sqr[i] = 2 - lo->model.drop_square[order_parametric[i]];\n        k = i * M;\n        p = order_parametric[i] * M;\n        for(j = 0; j < M; j++)\n            x_evaluate[k + j] = new_x[p + j];\n        k = i * N;\n        p = order_parametric[i] * N;\n        for(j = 0; j < N; j++)\n            x[k + j] = x_tmp[p + j];\n    }\n\tfor(i = 0; i < N; i++)\n\t\tlo->out.robust[i] *= lo->in.weights[i];\n\n    pre->fit = malloc(M * sizeof(double));\n\tpre->residual_scale = lo->out.s;\n\tpre->df = (lo->out.one_delta * lo->out.one_delta) / lo->out.two_delta;\n    double L[N * M];\n\tif(!strcmp(lo->control.surface, \"direct\")) {\n        if(want_cov)\n            loess_dfitse(lo->in.y, x, x_evaluate, lo->in.weights, lo->out.robust, 
!strcmp(lo->model.family, \"gaussian\"), \n                lo->model.span, lo->model.degree, &nonparametric, order_drop_sqr, &sum_drop_sqr, D, N, &M, pre->fit, L);\n        else\n            loess_dfit(lo->in.y, x, x_evaluate, lo->out.robust, lo->model.span, lo->model.degree, &nonparametric,\n                order_drop_sqr, &sum_drop_sqr, D, N, &M, pre->fit);\n    } else {\n        loess_ifit(lo->kd_tree.parameter, lo->kd_tree.a, lo->kd_tree.xi, lo->kd_tree.vert, \n                        lo->kd_tree.vval, M, x_evaluate, pre->fit);\n        if(want_cov) {\n            double new_cell = lo->model.span * lo->control.cell;\n            double fit_tmp[M];\n            loess_ise(lo->in.y, x, x_evaluate, lo->in.weights, lo->model.span, lo->model.degree, &nonparametric, \n                    order_drop_sqr, &sum_drop_sqr, &new_cell, D, N, &M, fit_tmp, L);\n        }\n    }\n\tif (want_cov) {\n        pre->se_fit = malloc(M * sizeof(double));\n        for(i = 0; i < N; i++) {\n            k = i * M;\n            for(j = 0; j < M; j++) {\n                p = k + j;\n                L[p] /= lo->in.weights[i];\n                L[p] *= L[p]; //i.e., square\n            }\n\t\t}\n\t\tfor(i = 0; i < M; i++) {\n            double tmp = 0;\n\t\t\tfor(j = 0; j < N; j++)\n                tmp += L[i + j * M];\n\t\t\tpre->se_fit[i] = lo->out.s * sqrt(tmp);\n\t\t}\n\t}\n}\n\nvoid pred_free_mem(struct\tpred_struct\t*pre){\n\tfree(pre->fit);\n\tfree(pre->se_fit);\n}\n\n ///// loess.c\nstatic  char    *surf_stat;\n\nint comp(const void *d1_in, const void *d2_in) {\n    const double *d1 = d1_in;\n    const double *d2 = d2_in;\n        if(*d1 < *d2)\n                return(-1);\n        else if(*d1 == *d2)\n                return(0);\n        else\n                return(1);\n}\n\nstatic void condition(char\t**surface, char *new_stat, char **trace_hat_in) {\n\tif(!strcmp(*surface, \"interpolate\")) {\n\t\tif(!strcmp(new_stat, \"none\"))\n\t\t\tsurf_stat = \"interpolate/none\";\n\t\telse 
if(!strcmp(new_stat, \"exact\"))\n\t\t\tsurf_stat = \"interpolate/exact\";\n\t\telse if(!strcmp(new_stat, \"approximate\"))\n\t\t{\n\t\t\tif(!strcmp(*trace_hat_in, \"approximate\"))\n\t\t\t\tsurf_stat = \"interpolate/2.approx\";\n\t\t\telse if(!strcmp(*trace_hat_in, \"exact\"))\n\t\t\t\tsurf_stat = \"interpolate/1.approx\";\n\t\t}\n\t}\n\telse if(!strcmp(*surface, \"direct\")) {\n\t\tif(!strcmp(new_stat, \"none\"))\n\t\t\tsurf_stat = \"direct/none\";\n\t\telse if(!strcmp(new_stat, \"exact\"))\n\t\t\tsurf_stat = \"direct/exact\";\n\t\telse if(!strcmp(new_stat, \"approximate\"))\n\t\t\tsurf_stat = \"direct/approximate\";\n\t}\n}\n\nstatic void loess_raw( double\t*y, double *x, double *weights, double *robust, long\t*d, \n            long *n, double *span, long *degree, long *nonparametric, long *drop_square, \n            long *sum_drop_sqr, double *cell, char\t**surf_stat, double *surface, long\t*parameter, \n            long *a, double *xi, double *vert, double\t*vval,double *diagonal, double*trL, \n            double*one_delta, double*two_delta, long *setLf) {\n\tlong nsing, i, k;\n\tdouble\t*hat_matrix, *LL;\n\t*trL = 0;\n\tloess_workspace(*d, *n, *span, *degree, nonparametric, drop_square, sum_drop_sqr, *setLf);\n        v[1] = *cell;\n\tif(!strcmp(*surf_stat, \"interpolate/none\")) {\n\t\tlowesb_(x, y, robust, &doublepluszero, 0, iv, &liv, &lv, v);\n\t\tlowese_(iv, &liv, &lv, v, *n, x, surface);\n\t\tloess_prune(parameter, a, xi, vert, vval);\n\t}\t\t\t\n\telse if (!strcmp(*surf_stat, \"direct/none\"))\n\t\tlowesf_(x, y, robust, iv, &liv, &lv, v, n, x, &doublepluszero, 0, surface);\n\telse if (!strcmp(*surf_stat, \"interpolate/1.approx\")) {\n\t\tlowesb_(x, y, weights, diagonal, 1, iv, &liv, &lv, v);\n\t\tlowese_(iv, &liv, &lv, v, *n, x, surface);\n\t\tnsing = iv[29];\n\t\tfor(i = 0; i < *n; i++) *trL = *trL + diagonal[i];\n\t\tlowesa_(trL, n, d, &tau, &nsing, one_delta, two_delta);\n\t\tloess_prune(parameter, a, xi, vert, vval);\n\t}\n    else if 
(!strcmp(*surf_stat, \"interpolate/2.approx\")) {\n\t\tlowesb_(x, y, robust, &doublepluszero, 0, iv, &liv, &lv, v);\n\t\tlowese_(iv, &liv, &lv, v, *n, x, surface);\n\t\tnsing = iv[29];\n\t\tehg196_(tau, *d, *span, trL);\n\t\tlowesa_(trL, n, d, &tau, &nsing, one_delta, two_delta);\n\t\tloess_prune(parameter, a, xi, vert, vval);\n\t}\n\telse if (!strcmp(*surf_stat, \"direct/approximate\")) {\n\t\tlowesf_(x, y, weights, iv, &liv, &lv, v, n, x, diagonal, 1, surface);\n\t\tnsing = iv[29];\n\t\tfor(i = 0; i < (*n); i++) *trL = *trL + diagonal[i];\n\t\tlowesa_(trL, n, d, &tau, &nsing, one_delta, two_delta);\n\t}\n\telse if (!strcmp(*surf_stat, \"interpolate/exact\")) {\n\t\that_matrix = Calloc((*n)*(*n), double);\n\t\tLL = Calloc((*n)*(*n), double);\n\t\tlowesb_(x, y, weights, diagonal, 1, iv, &liv, &lv, v);\n\t\tlowesl_(iv, &liv, &lv, v, n, x, hat_matrix);\n\t\tlowesc_(n, hat_matrix, LL, trL, one_delta, two_delta);\n\t\tlowese_(iv, &liv, &lv, v, *n, x, surface);\n\t\tloess_prune(parameter, a, xi, vert, vval);\n\t\tfree(hat_matrix);\n\t\tfree(LL);\n\t}\n\telse if (!strcmp(*surf_stat, \"direct/exact\")) {\n\t\that_matrix = Calloc((*n)*(*n), double);\n\t\tLL = Calloc((*n)*(*n), double);\n\t\t//lowesf_(x, y, weights, iv, liv, lv, v, n, x, hat_matrix, &two, surface);//seems wrong.\n\t\tlowesf_(x, y, weights, iv, &liv, &lv, v, n, x, hat_matrix, 2, surface);\n\t\tlowesc_(n, hat_matrix, LL, trL, one_delta, two_delta);\n        k = (*n) + 1;\n\t\tfor(i = 0; i < (*n); i++)\n\t\t\tdiagonal[i] = hat_matrix[i * k];\n\t\tfree(hat_matrix);\n\t\tfree(LL);\n\t}\n\tloess_free();\n}\n\nstatic void loess_(double *y, double *x_, long *size_info, double *weights,\n            double *span, long  *degree, long  *parametric, long *drop_square,\n            long  *normalize, char\t**statistics, char **surface, double *cell,\n            char **trace_hat_in, long *iterations, double*fitted_values,\n            double *fitted_residuals, double *enp, double *s, double *one_delta,\n            
double *two_delta, double *pseudovalues, double*trace_hat_out, double *diagonal,\n            double *robust, double *divisor, long  *parameter, long  *a,\n            double *xi, double *vert, double *vval){\n\tdouble new_cell, trL, delta1, delta2, sum_squares = 0, *pseudo_resid=NULL,\n                trL_tmp = 0, d1_tmp = 0, d2_tmp = 0, sum, mean;\n\tlong\ti, j, k, p, N, D, sum_drop_sqr = 0, sum_parametric = 0, setLf,\t\n                nonparametric = 0, zero = 0, max_kd;\n\tchar   *new_stat;\n\n\tD = size_info[0];\n\tN = size_info[1];\n\tmax_kd = (N > 200 ? N : 200);\n\t*one_delta = *two_delta = *trace_hat_out = 0;\n\n\tdouble x[D * N], x_tmp[D*N], temp[N], xi_tmp[max_kd], vert_tmp[D * 2], \n                vval_tmp[(D + 1) * max_kd], diag_tmp[N];\n\tlong a_tmp[max_kd], param_tmp[N], order_parametric[D], order_drop_sqr[D];\n    integer int_temp[N];//original code sent double, but lowesw & lowesp want an int\n\n    if((*iterations) > 0)\n        pseudo_resid =  malloc(N * sizeof(double));\n\n\tnew_cell = (*span) * (*cell);\n\tfor(i = 0; i < N; i++)\n\t\trobust[i] = 1;\n        for(i = 0; i < (N * D); i++)\n                x_tmp[i] = x_[i];\n\tif((*normalize) && (D > 1)) {\n\t\tint cut = ceil(0.100000000000000000001 * N);\n\t\tfor(i = 0; i < D; i++) {\n\t\t\tk = i * N;\n\t\t\tfor(j = 0; j < N; j++)\n\t\t\t\ttemp[j] = x_[k + j];\n\t\t\tqsort(temp, N, sizeof(double), comp);\n\t\t\tsum = 0;\n\t\t\tfor(j = cut; j <= (N - cut - 1); j++)\n\t\t\t        sum = sum + temp[j];\n\t\t\tmean = sum / (N - 2 * cut);\n\t\t\tsum = 0;\n\t\t\tfor(j = cut; j <= (N - cut - 1); j++) {\n\t\t\t\ttemp[j] = temp[j] - mean;\n\t\t\t\tsum = sum + temp[j] * temp[j];\n\t\t\t}\n\t\t\tdivisor[i] = sqrt(sum / (N - 2 * cut - 1));\n\t\t\tfor(j = 0; j < N; j++) {\n\t\t\t\tp = k + j;\n\t\t\t\tx_tmp[p] = x_[p] / divisor[i];\t\t\n\t\t\t}\n\t\t}\n\t}\n\telse\n\t\tfor(i = 0; i < D; i++) divisor[i] = 1;\n\tj = D - 1;\n\tfor(i = 0; i < D; i++) {\n\t\tsum_drop_sqr = sum_drop_sqr + 
drop_square[i];\n\t\tsum_parametric = sum_parametric + parametric[i];\n\t\tif(parametric[i])\n\t\t\torder_parametric[j--] = i;\n\t\telse\n\t\t\torder_parametric[nonparametric++] = i;\n\t}\n    for(i = 0; i < D; i++) {\n        order_drop_sqr[i] = 2 - drop_square[order_parametric[i]];\n        k = i * N;\n        p = order_parametric[i] * N;\n        for(j = 0; j < N; j++)\n            x[k + j] = x_tmp[p + j];\n    }\n    Apop_assert_n(!((*degree) == 1 && sum_drop_sqr), \n                \"Specified the square of a factor predictor to be dropped when degree = 1\");\n\tApop_assert_n(!(D == 1 && sum_drop_sqr), \n                \"Specified the square of a predictor to be dropped with only one numeric predictor\");\n\tApop_assert_n(sum_parametric != D, \"Specified parametric for all predictors\");\n\tfor(j = 0; j <= (*iterations); j++) {\n\t\tnew_stat = j ? \"none\" : *statistics;\n\t\tfor(i = 0; i < N; i++)\n\t\t\trobust[i] = weights[i] * robust[i];\n\t\tcondition(surface, new_stat, trace_hat_in);\n\t\tsetLf = !strcmp(surf_stat, \"interpolate/exact\");\n\t\tloess_raw(y, x, weights, robust, &D, &N, span, degree, &nonparametric, order_drop_sqr, \n                &sum_drop_sqr, &new_cell, &surf_stat, fitted_values, parameter, a,\n                xi, vert, vval, diagonal, &trL, &delta1, &delta2, &setLf);\n\t\tif(j == 0) {\n\t\t\t*trace_hat_out = trL;\n\t\t\t*one_delta = delta1;\n\t\t\t*two_delta = delta2;\n\t\t}\n\t\tfor(i = 0; i < N; i++)\n\t\t\tfitted_residuals[i] = y[i] - fitted_values[i];\n\t\tif(j < (*iterations))\n\t\t\tlowesw_(fitted_residuals, &N, robust, int_temp);\n\t}\n\tif((*iterations) > 0) {\n\t\tpseudovals(N, y, fitted_values, weights, robust, int_temp, pseudovalues);\n\t\t\n        //BK: I believe that temp here does not rely on prior temp\n\t\tloess_raw(pseudovalues, x, weights, weights, &D, &N, span, degree, &nonparametric, \n            order_drop_sqr, &sum_drop_sqr, &new_cell, &surf_stat, temp, param_tmp, a_tmp, \n            xi_tmp, vert_tmp, 
vval_tmp, diag_tmp, &trL_tmp, &d1_tmp, &d2_tmp, &zero);\n\t\tfor(i = 0; i < N; i++)\n\t\t\tpseudo_resid[i] = pseudovalues[i] - temp[i];\n\t}\n\tif(*iterations == 0)\n\t\tfor(i = 0; i < N; i++)\n\t\t\tsum_squares = sum_squares + weights[i] * fitted_residuals[i] * fitted_residuals[i];\n\telse\n\t\tfor(i = 0; i < N; i++)\n\t\t\tsum_squares = sum_squares + weights[i] * pseudo_resid[i] * pseudo_resid[i];\n\t*enp = (*one_delta) + 2 * (*trace_hat_out) - N;\n\t*s = sqrt(sum_squares / (*one_delta));\n\n    if((*iterations) > 0)\n        free(pseudo_resid);\n}\n\nvoid loess( struct\tloess_struct\t*lo) {\n\tlong size_info[2] = {lo->in.p, lo->in.n};\n\tlong iterations = (!strcmp(lo->model.family, \"gaussian\"))\n                        ? 0\n                        : lo->control.iterations;\t\t\n    if(!strcmp(lo->control.trace_hat, \"wait.to.decide\")) {\n        if(!strcmp(lo->control.surface, \"interpolate\"))\n            lo->control.trace_hat = (lo->in.n < 500) ? \"exact\" : \"approximate\";\n        else\n            lo->control.trace_hat = \"exact\";\n    }\n\tloess_(lo->in.y, lo->in.x, size_info, lo->in.weights,\n\t\t&lo->model.span,\n\t\t&lo->model.degree,\n\t\tlo->model.parametric,\n\t\tlo->model.drop_square,\n\t\t&lo->model.normalize,\n\t\t&lo->control.statistics,\n\t\t&lo->control.surface,\n\t\t&lo->control.cell,\n\t\t&lo->control.trace_hat,\n\t\t&iterations,\n\t\tlo->out.fitted_values,\n\t\tlo->out.fitted_residuals,\n\t\t&lo->out.enp,\n\t\t&lo->out.s,\n\t\t&lo->out.one_delta,\n\t\t&lo->out.two_delta,\n\t\tlo->out.pseudovalues,\n\t\t&lo->out.trace_hat,\n\t\tlo->out.diagonal,\n\t\tlo->out.robust,\n\t\tlo->out.divisor,\n\t\tlo->kd_tree.parameter,\n\t\tlo->kd_tree.a,\n\t\tlo->kd_tree.xi,\n\t\tlo->kd_tree.vert,\n\t\tlo->kd_tree.vval);\n}\t\n\nvoid loess_free_mem(struct loess_struct *lo) {\n    free(lo->in.x);\n    free(lo->in.y);\n    free(lo->in.weights);\n    free(lo->out.fitted_values);\n    free(lo->out.fitted_residuals);\n    free(lo->out.pseudovalues);\n    
free(lo->out.diagonal);\n    free(lo->out.robust);\n    free(lo->out.divisor);\n    free(lo->kd_tree.parameter);\n    free(lo->kd_tree.a);\n    free(lo->kd_tree.xi);\n    free(lo->kd_tree.vert);\n    free(lo->kd_tree.vval);\n}\n\nvoid loess_summary(struct loess_struct lo, FILE *ap) {\n    fprintf(ap, \"Number of Observations: %ld\\n\", lo.in.n);\n    fprintf(ap, \"Equivalent Number of Parameters: %.1f\\n\", lo.out.enp);\n    if(!strcmp(lo.model.family, \"gaussian\"))\n        fprintf(ap, \"Residual Standard Error: \");\n    else\n        fprintf(ap, \"Residual Scale Estimate: \");\n    fprintf(ap, \"%.4f\\n\", lo.out.s);\n}\n\n//misc.c ---anova and support fns\n\n/* Incomplete beta function.\n * Reference:  Abramowitz and Stegun, 26.5.8.\n * Assumptions: 0 <= x <= 1; a,b > 0.\n */\n#define DOUBLE_EPS      2.2204460492503131E-16\n#define IBETA_LARGE     1.0e30\n#define IBETA_SMALL     1.0e-30\n\ndouble ibeta(double x, double a, double b) {\n    int flipped = 0, i, k, count;\n    double I, temp, pn[6], ak, bk, next, prev, factor, val;\n\n    if (x <= 0)\n        return(0);\n    if (x >= 1)\n        return(1);\n\n    /* use ibeta(x,a,b) = 1-ibeta(1-x,b,a) */\n    if ((a+b+1)*x > (a+1)) {\n        flipped = 1;\n        temp = a;\n        a = b;\n        b = temp;\n        x = 1 - x;\n    }\n\n    pn[0] = 0.0;\n    pn[2] = pn[3] = pn[1] = 1.0;\n    count = 1;\n    val = x/(1.0-x);\n    bk = 1.0;\n    next = 1.0;\n    do {\n        count++;\n        k = count/2;\n        prev = next;\n        if (count%2 == 0)\n            ak = -((a+k-1.0)*(b-k)*val)/ ((a+2.0*k-2.0)*(a+2.0*k-1.0));\n        else\n            ak = ((a+b+k-1.0)*k*val)/ ((a+2.0*k)*(a+2.0*k-1.0));\n        pn[4] = bk*pn[2] + ak*pn[0];\n        pn[5] = bk*pn[3] + ak*pn[1];\n        next = pn[4] / pn[5];\n        for (i=0; i<=3; i++)\n            pn[i] = pn[i+2];\n        if (fabs(pn[4]) >= IBETA_LARGE)\n            for (i=0; i<=3; i++)\n                    pn[i] /= IBETA_LARGE;\n        if (fabs(pn[4]) <= 
IBETA_SMALL)\n            for (i=0; i<=3; i++)\n                    pn[i] /= IBETA_SMALL;\n    } while (fabs(next-prev) > DOUBLE_EPS*prev);\n    factor = a*log(x) + (b-1)*log(1-x);\n    factor -= lgamma(a+1) + lgamma(b) - lgamma(a+b); //factor is a log, so use log-gamma\n    I = exp(factor) * next;\n    return(flipped ? 1-I : I);\n}\n\n/* For comparing two loess models, which is not as useful as one would hope...\ndouble pf(double q, double df1, double df2) {\n\treturn ibeta(q*df1/(df2+q*df1), df1/2, df2/2);\n}\n\nstruct anova_struct {\n\tdouble\tdfn;\n\tdouble\tdfd;\n\tdouble  F_value;\n\tdouble  Pr_F;\n};\n\nvoid anova(struct loess_struct *one, struct loess_struct *two, struct anova_struct *out){\n\tdouble\tone_d1, one_d2, one_s, two_d1, two_d2, two_s, rssdiff, d1diff, tmp;\n\tint     max_enp;\n  \n\tone_d1 = one->out.one_delta;\n\tone_d2 = one->out.two_delta;\n\tone_s = one->out.s;\n\ttwo_d1 = two->out.one_delta;\n\ttwo_d2 = two->out.two_delta;\n\ttwo_s = two->out.s;\n\n        rssdiff = fabs(one_s * one_s * one_d1 - two_s * two_s * two_d1);\n        d1diff = fabs(one_d1 - two_d1);\n        out->dfn = d1diff * d1diff / fabs(one_d2 - two_d2);\n\tmax_enp = (one->out.enp > two->out.enp);\n\ttmp = max_enp ? one_d1 : two_d1;\n        out->dfd = tmp * tmp / (max_enp ? one_d2 : two_d2);\n\ttmp = max_enp ? one_s : two_s;\n        out->F_value = (rssdiff / d1diff) / (tmp * tmp);\n        out->Pr_F = 1 - pf(out->F_value, out->dfn, out->dfd);\n}\n*/\n\n\n/*\n * Rational approximation to the inverse of the Gaussian CDF.\n * Absolute error is bounded by 4.5e-4.\n * Reference: Abramowitz and Stegun, page 933.\n * Assumption: 0 < p < 1.\n */\n\nstatic double num[] = {\n        2.515517,\n        0.802853,\n        0.010328\n};\n\nstatic double den[] = {\n        1.000000,\n        1.432788,\n        0.189269,\n        0.001308\n};\n\ndouble invigauss_quick(double p) {\n  int lower;\n  double t, n, d, q;\n    if(p == 0.5)\n        return(0);\n    lower = p < 0.5;\n    p = lower ? 
p : 1 - p;\n    t = sqrt(-2 * log(p));\n    n = (num[2]*t + num[1])*t + num[0];\n    d = ((den[3]*t + den[2])*t + den[1])*t + den[0];\n    q = lower ? n/d - t : t - n/d;\n    return(q);\n}\n\n/*\n * Quick approximation to inverse incomplete beta function,\n * by matching first two moments with the Gaussian distribution.\n * Assumption: 0 < p < 1, a,b > 0.\n */\n\nstatic double invibeta_quick(double p, double a, double b) {\n  double x, m, s;\n    x = a + b;\n    m = a / x;\n    s = sqrt((a*b) / (x*x*(x+1)));\n    /* clamp the Gaussian guess to [0,1] */\n    return(GSL_MAX(0.0, GSL_MIN(1.0, invigauss_quick(p)*s + m)));\n}\n\n/*\n * Inverse incomplete beta function.\n * Assumption: 0 <= p <= 1, a,b > 0.\n */\n\nstatic double invibeta(double p,double  a,double  b) {\n    int i;\n    double ql, qr, qm, qdiff;\n    double pl, pr, pm, pdiff;\n\n/*    MEANINGFUL(qm);*/\n\tqm = 0;\n    if(p == 0 || p == 1)\n        return(p);\n\n    /* initialize [ql,qr] containing the root */\n    ql = qr = invibeta_quick(p, a, b);\n    pl = pr = ibeta(ql, a, b);\n    if(pl == p)\n        return(ql);\n    if(pl < p)\n        while(1) {\n            qr += 0.05;\n            if(qr >= 1) {\n                pr = qr = 1;\n                break;\n            }\n            pr = ibeta(qr, a, b);\n            if(pr == p)\n                return(qr); /* return the quantile, not the probability */\n            if(pr > p)\n                break;\n        }\n    else\n        while(1) {\n            ql -= 0.05;\n            if(ql <= 0) {\n                pl = ql = 0;\n                break;\n            }\n            pl = ibeta(ql, a, b);\n            if(pl == p)\n                return(ql); /* return the quantile, not the probability */\n            if(pl < p)\n                break;\n        }\n\n    /* a few steps of bisection */\n    for(i = 0; i < 5; i++) {\n        qm = (ql + qr) / 2;\n        pm = ibeta(qm, a, b);\n        qdiff = qr - ql;\n        pdiff = pm - p;\n        if(fabs(qdiff) < DOUBLE_EPS*qm || fabs(pdiff) < DOUBLE_EPS)\n            return(qm);\n        if(pdiff < 0) {\n            ql = qm;\n            pl = pm;\n    
    } else {\n            qr = qm;\n            pr = pm;\n        }\n    }\n\n    /* a few steps of secant */\n    for(i = 0; i < 40; i++) {\n        qm = ql + (p-pl)*(qr-ql)/(pr-pl);\n        pm = ibeta(qm, a, b);\n        qdiff = qr - ql;\n        pdiff = pm - p;\n        if(fabs(qdiff) < 2*DOUBLE_EPS*qm || fabs(pdiff) < 2*DOUBLE_EPS)\n            return(qm);\n        if(pdiff < 0) {\n            ql = qm;\n            pl = pm;\n        } else {\n            qr = qm;\n            pr = pm;\n        }\n    }\n\n    /* no convergence */\n    return(qm);\n}\n\nstatic double qt(double p, double df) {\n  double t = invibeta(fabs(2*p-1), 0.5, df/2);\n    return((p>0.5?1:-1) * sqrt(t*df/(1-t)));\n}\n\n\n\n////The apop_model front end.\n\nvoid loess(struct loess_struct *lo) ;\nvoid loess_setup( double  *x, double *y, long n, long p, struct  loess_struct *lo) ;\n\n\nApop_settings_copy(apop_loess, )\nApop_settings_free(apop_loess, loess_free_mem(&(in->lo_s));)\n\nvoid matrix_to_FORTRAN(gsl_matrix *inmatrix, double *outFORTRAN, int start_col){\n    double *current_outcol = outFORTRAN; \n    for (int i=start_col; i< inmatrix->size2; i++){\n        gsl_vector *col = Apop_mcv(inmatrix, i);\n        for (int j=0; j< col->size; j++)\n            current_outcol[j]=gsl_vector_get(col,j);\n        current_outcol += col->size;\n    }\n}\n\n#define lo_set(var, dflt) .var = in.lo_s.var ? in.lo_s.var : dflt\n#define apop_strcmp(a, b) (((a)&&(b) && !strcmp((a), (b))) || (!(a) && !(b)))\n\nApop_settings_init(apop_loess,\n    Apop_assert(in.data, \"I need a .data element to allocate apop_loess_settings.\");\n    int n = in.data->matrix->size1;\n\tint\tmax_kd = n > 200 ? n : 200;\n    int p =  (in.data->vector)\n                ? 
in.data->matrix->size2\n                : in.data->matrix->size2-1;\n    out->lo_s = (struct loess_struct){\n        .in.n = n,\n        .in.p = p,\n\t    .in.y =  malloc(n * sizeof(double)),\n        .in.x =  malloc(n * p * sizeof(double)),\n\n\t    lo_set(model.degree , 2),\n\t    lo_set(model.normalize , 'y'),\n        .model.family = apop_strcmp(in.lo_s.model.family , \"symmetric\") ? \"symmetric\": \"gaussian\",\n        lo_set(model.span , 0.75),\n        //.model.span = in.span ? in.span : 0.75,\n\n        .control.surface = apop_strcmp(in.lo_s.control.surface , \"direct\") ? \"direct\" : \"interpolate\",\n        .control.statistics = apop_strcmp(in.lo_s.control.statistics , \"exact\") ? \"exact\" : \"approximate\",\n        lo_set(control.cell , 0.2),\n        .control.trace_hat = apop_strcmp(in.lo_s.control.trace_hat , \"exact\") ? \"exact\" \n                        : apop_strcmp(in.lo_s.control.trace_hat , \"approximate\") ? \"approximate\" \n                        : \"wait.to.decide\",\n        lo_set(control.iterations, (apop_strcmp(in.lo_s.model.family , \"symmetric\") ? 
4 : 0)),\n\n        .out.fitted_values =  malloc(n * sizeof(double)),\n        .out.fitted_residuals =  malloc(n * sizeof(double)),\n        .out.pseudovalues =  malloc(n * sizeof(double)),\n        .out.diagonal =  malloc(n * sizeof(double)),\n        .out.robust =  malloc(n * sizeof(double)),\n        .out.divisor =  malloc(p * sizeof(double)),\n\n        .kd_tree.parameter =  malloc(7 * sizeof(long)),\n        .kd_tree.a =  malloc(max_kd * sizeof(long)),\n        .kd_tree.xi =  malloc(max_kd * sizeof(double)),\n        .kd_tree.vert =  malloc(p * 2 * sizeof(double)),\n        .kd_tree.vval =  malloc((p + 1) * max_kd * sizeof(double)),\n    };\n    Apop_varad_set(ci_level, 0.95);\n    struct loess_struct *lo = &(out->lo_s);\n    if (in.data->weights)\n        lo->in.weights = in.data->weights->data;\n    else {\n        lo->in.weights = malloc(n * sizeof(double));\n        for (int i = 0; i < n; i++) \n            lo->in.weights[i] = 1;\n    }\n    int startat = 0;\n    if (in.data->vector) //OK, then that's the dependent var.\n        memcpy(lo->in.y, in.data->vector->data, n*sizeof(double));\n    else {  //use the first col as the dep. 
var.\n        startat =1;\n        gsl_vector *col = Apop_cv(in.data, 0);\n        for (int j=0; j< col->size; j++)\n            lo->in.y[j] = gsl_vector_get(col,j);\n    }\n    matrix_to_FORTRAN(in.data->matrix, lo->in.x, startat);\n\tfor(int i = 0; i < 8; i++)\n        lo->model.parametric[i] = lo->model.drop_square[i] = 0;\n)\n\napop_data * loess_predict (apop_data *in, apop_model *m){\n    //Massage inputs to FORTRAN's format\n    double *eval_here = malloc(sizeof(double)*in->matrix->size1*(in->matrix->size2-1));\n    matrix_to_FORTRAN(in->matrix, eval_here, 1);\n    int want_cov = Apop_settings_get(m, apop_loess, want_predict_ci)=='y';\n    struct pred_struct pred = (struct pred_struct){ };\n\n    predict(eval_here, in->matrix->size1, &(Apop_settings_get(m, apop_loess, lo_s)), &pred, want_cov);\n\n    //Massage FORTRAN's output to Apophenia's formats\n    gsl_vector* firstcol = Apop_cv(in, 0);\n    gsl_vector_view v = gsl_vector_view_array(pred.fit, firstcol->size);\n    gsl_vector_memcpy(firstcol, &(v.vector));\n    apop_data *ci =  apop_data_add_page(in, apop_data_alloc(in->matrix->size1, 3), \"<Confidence>\");\n    apop_name_add(ci->names, \"standard error\", 'c');\n    apop_name_add(ci->names, \"lower CI\", 'c');\n    apop_name_add(ci->names, \"upper CI\", 'c');\n\n    if (want_cov){\n        //Find confidence intervals. 
Used to be in loess's pointwise().\n        double coverage = Apop_settings_get(m, apop_loess, ci_level);\n        double t_dist = qt(1 - (1 - coverage)/2, pred.df);\n        for(int i = 0; i < in->matrix->size1; i++) {\n            double limit = pred.se_fit[i] * t_dist; //half-width of the interval\n            apop_data_set(ci, i, 0, pred.se_fit[i]);\n            apop_data_set(ci, i, 1, pred.fit[i] - limit);\n            apop_data_set(ci, i, 2, pred.fit[i] + limit);\n        }\t\n    }\n\n    free(eval_here);\n    pred_free_mem(&pred);\n    return in;\n}\n\nstatic double onerow(gsl_vector *v, void *sd){ \n    return log(gsl_ran_gaussian_pdf(v->data[2], *((double*)sd))); \n}\n\n//Assumes one gaussian, unweighted.\nstatic long double loess_ll(apop_data *d, apop_model *m){\n    apop_data *exp = apop_data_get_page(d, \"<Predicted>\");\n    Apop_col_tv(exp, \"residual\", residuals);\n    double sd = sqrt(apop_vector_var(residuals));\n    return apop_map_sum(exp, .param=&sd, .part='r', .fn_vp= onerow);\n}\n\nstatic void apop_loess_est(apop_data *d, apop_model *out){\n    if (!Apop_settings_get_group(out, apop_loess))\n        Apop_model_add_group(out, apop_loess, .data=d);\n    out->data = d;\n    loess(&Apop_settings_get(out, apop_loess, lo_s));\n\n    //setup the expected matrix. In a perfect world, this wouldn't all be cut/pasted from apop_OLS.\n    //Also, it wouldn't be 14 lines.\n    apop_data *expect = apop_data_add_page(out->info, apop_data_alloc(d->matrix->size1, 3), \"<Predicted>\");\n    if (!out->info) out->info = expect;\n    apop_name_add(expect->names, (out->data->names->colct ? 
out->data->names->col[0] : \"Expected\"), 'c');\n    apop_name_add(expect->names, \"Predicted\", 'c');\n    apop_name_add(expect->names, \"Residual\", 'c');\n    gsl_vector *v = gsl_vector_alloc(d->matrix->size1);\n    double *holding = v->data;\n    v->data = Apop_settings_get(out, apop_loess, lo_s.in.y);\n    gsl_vector_memcpy(Apop_cv(expect, 0), v);\n    v->data = Apop_settings_get(out, apop_loess, lo_s.out.fitted_values);\n    gsl_vector_memcpy(Apop_cv(expect, 1), v);\n    v->data = Apop_settings_get(out, apop_loess, lo_s.out.fitted_residuals);\n    gsl_vector_memcpy(Apop_cv(expect, 2), v);\n    v->data = holding;\n    gsl_vector_free(v);\n}\n\nstatic void apop_loess_print(apop_model *in, FILE *out){\n    loess_summary(Apop_settings_get(in, apop_loess, lo_s), out);\n}\n\nstatic void loess_prep(apop_data *data, apop_model *params){\n    apop_predict_vtable_add(loess_predict, apop_loess);\n    apop_model_print_vtable_add(apop_loess_print, apop_loess);\n    apop_model_clear(data, params);\n}\n\napop_model *apop_loess = &(apop_model){.name=\"Loess smoothing\", .vsize = -1, .dsize=1, \n        .estimate =apop_loess_est, .log_likelihood = loess_ll, .prep = loess_prep};\n"
  },
  {
    "path": "model/apop_multinomial.c",
    "content": "/* The binomial distribution as an \\c apop_model.\nCopyright (c) 2006--2007, 2010--11 by Ben Klemens.  Licensed under the GPLv2; see COPYING. \n\n \\amodel apop_binomial The multi-draw generalization of the Bernoulli, or the two-bin special case of the \\ref apop_multinomial \"Multinomial distribution\".\n\nIt is implemented as an alias of the \\ref apop_multinomial model, except that\nit has an explicit CDF, we know it has two parameters, and its draw method returns a scalar. I.e., \n<tt>.vsize==2</tt> and <tt>.dsize==1</tt>.\n\n\\adoc    Parameter_format   a vector, v[0]=\\f$n\\f$; v[1]=\\f$p_1\\f$. Thus, \\f$p_0\\f$\n        isn't written down; see \\ref apop_multinomial for further discussion.\n        If you input \\f$v[1]>1\\f$ and <tt>apop_opts.verbose >=1</tt>, the log likelihood\n        function will throw a warning.\n        Post-estimate, will have a <tt>\\<Covariance\\></tt> page with the covariance\n        matrix for the \\f$p\\f$s (\\f$n\\f$ effectively has no variance).\n\n\n\\adoc    Input_format Each row of the matrix is one observation, consisting of two elements.\n  The number of draws of type zero (sometimes read as `misses' or `failures') are in column zero, \n  the number of draws of type one (`hits', `successes') in column one.\n\n\\adoc    RNG The RNG returns a single number representing the success count, not a\n    vector of length two giving both the failure bin and success bin. This is notable\n    because it differs from the input data format, but it tends to be what people expect\n    from a Binomial RNG. 
For draws with both dimensions (or situations where draws are fed back into the model), use an \\ref apop_multinomial model\n    with <tt>.vsize =2</tt>.\n*/\n\n#include \"apop_internal.h\"\n\n/* \\adoc cdf At the moment, only implemented for the Binomial.\n  Let the first element of the data set (top of the vector or point (0,0) in the\n  matrix, your pick) be $L$; then I return the sum of the odds of a draw from the given\n  Binomial distribution returning $0, 1, \\dots, L$ hits.  */\nstatic long double binomial_cdf(apop_data *d, apop_model *est){\n    Nullcheck_mpd(d, est, GSL_NAN)\n    Get_vmsizes(d); //firstcol\n    double hitcount = apop_data_get(d, .col=firstcol);\n    double n = gsl_vector_get(est->parameters->vector, 0);\n    double p = gsl_vector_get(est->parameters->vector, 1);\n    return gsl_cdf_binomial_P(hitcount, p, n);\n}\n\nstatic void make_covar(apop_model *est){\n    int size = est->parameters->vector->size;\n    //the trick where we turn the params into a p-vector\n    double * pv = est->parameters->vector->data;\n    int n = pv[0];\n    pv[0] = 1 - (apop_sum(est->parameters->vector)-n);\n\n    apop_data *cov = apop_data_add_page(est->parameters, \n                            apop_data_alloc(size, size), \"<Covariance>\");\n    for (int i=0; i < size; i++){\n        double p = apop_data_get(est->parameters, i, -1);\n        apop_data_set(cov, i, i, n * p *(1-p));\n        for (int j=i+1; j < size; j++){\n            double pj = apop_data_get(est->parameters, j, -1);\n            double thiscell = -n*p*pj;\n            apop_data_set(cov, i, j, thiscell);\n            apop_data_set(cov, j, i, thiscell);\n        }\n    }\n    pv[0]=n;\n}\n\nstatic long double multinomial_constraint(apop_data *data, apop_model *b){\n  //constraint is that 0 < all elmts, and  1>all ps.\n    int size = b->parameters->vector->size;\n    static threadlocal apop_data *constr;\n    if (constr && constr->matrix->size2 != size)\n        apop_data_free(constr);\n    if 
(!constr){\n        constr = apop_data_calloc(size*2-1, size*2-1, size);\n\n        //top half: 0 < [param], including param 0\n        gsl_matrix_set_identity(Apop_subm(constr->matrix, 0, 0, size, size));\n\n        //bottom (almost) half: 1 >= [param], excluding param 0\n        for (int i=size; i < size*2-1; i++){\n            apop_data_set(constr, i, -1, -1);\n            apop_data_set(constr, i, i-size+1, -1);\n        }\n    }\n    return apop_linear_constraint(b->parameters->vector, constr);\n}\n\nstatic double binomial_ll(gsl_vector *hits, void *paramv){\n    return log(gsl_ran_binomial_pdf(hits->data[1], ((gsl_vector*)paramv)->data[1], ((gsl_vector*)paramv)->data[0]));\n}\n\nstatic double multinomial_ll(gsl_vector *v, void *params){\n    double *pv = ((apop_model*)params)->parameters->vector->data;\n    size_t size = ((apop_model*)params)->parameters->vector->size;\n    unsigned int hv[v->size]; //The GSL wants our hit count in an int*.\n    for (size_t i=0; i < v->size; i ++)\n        hv[i] = gsl_vector_get(v, i);\n    return gsl_ran_multinomial_lnpdf(size, pv, hv);\n}\n\nstatic long double multinomial_log_likelihood(apop_data *d, apop_model *params){\n    Nullcheck_mpd(d, params, GSL_NAN);\n    double *pv = params->parameters->vector->data;\n    double n = pv[0]; \n    Apop_assert_c(params->parameters->vector->size>=2, GSL_NAN, 0, \"I need two or more input parameters \"\n                    \"representing [n, p_1, (...)].\");\n    Apop_assert_c(pv[1] <=1, GSL_NAN, 1, \"The input parameters should be [n, p_1, (...)], but \"\n        \"element 1 of the parameter vector is >1.\"); //mostly makes sense for the binomial.\n    //Two parameters [n, p_1] means two bins, i.e., the binomial case (this is not a test on the draw count n).\n    if (params->parameters->vector->size==2) return apop_map_sum(d, .fn_vp=binomial_ll, .param=params->parameters->vector);\n\n    pv[0] = 1 - (apop_sum(params->parameters->vector)-n);//making the params a p-vector. 
Put n back at the end.\n    double out = apop_map_sum(d, .fn_vp=multinomial_ll, .param=params);\n    pv[0]=n;\n    return out;\n}\n\n/*\nstatic apop_model *multinomial_paramdist(apop_data *d, apop_model *m){\n    apop_pm_settings *settings = Apop_settings_get_group(m, apop_pm);\n    if (settings->index!=-1){\n        int i = settings->index;\n        double mu = apop_data_get(m->parameters, i, -1);\n        double sigma = sqrt(apop_data_get(m->parameters, i, i, .page=\"<Covariance>\"));\n        int df = apop_data_get(m->info, .rowname=\"df\", .page = \"info\");\n        return apop_model_set_parameters(apop_t_distribution, mu, sigma, df);\n    }\n\n}\n*/\n\nstatic int multinomial_rng(double *out, gsl_rng *r, apop_model* est){\n    Nullcheck_mp(est, 1);\n    double * p = est->parameters->vector->data;\n    //the trick where we turn the params into a p-vector\n    int N = p[0];\n\n    if (est->parameters->vector->size == 2) {\n        *out = gsl_ran_binomial_knuth(r, 1-gsl_vector_get(est->parameters->vector, 1), N);\n        out[1] = N-*out;\n        goto done;\n    }\n    //else, multinomial\n    //cut/pasted/modded from the GSL. Copyright them.\n    p[0] = 1 - (apop_sum(est->parameters->vector)-N);\n    double sum_p = 0.0;\n    int sum_n = 0;\n\n    for (int i = 0; i < est->parameters->vector->size; i++) {\n        out[i] = (p[i] > 0)\n                ? 
gsl_ran_binomial (r, p[i] / (1 - sum_p), N - sum_n)\n                : 0;\n        sum_p += p[i];\n        sum_n += out[i];\n    }\n    done:\n    p[0] = N;\n    return 0;\n}\n\nstatic void multinomial_show(apop_model *est, FILE *out){\n    double * p = est->parameters->vector->data;\n    int N=p[0];\n    p[0] = 1 - (apop_sum(est->parameters->vector)-N);\n    fprintf(out, \"%s, with %i draws.\\nBin odds:\\n\", est->name, N);\n    apop_vector_print(est->parameters->vector, .output_pipe=out);\n    p[0]=N;\n}\n\ndouble avs(gsl_vector *v){return (double) apop_vector_sum(v);}\n\n/* \\amodel apop_multinomial The \\f$n\\f$--option generalization of the \\ref apop_binomial \"Binomial distribution\".\n\n\\adoc estimated_info   Reports <tt>log likelihood</tt>. */\nstatic void multinomial_estimate(apop_data * data,  apop_model *est){\n    Nullcheck_mpd(data, est, );\n    Get_vmsizes(data); //vsize, msize1\n    est->parameters= apop_map(data, .fn_v=avs, .part='c');\n    gsl_vector *v = est->parameters->vector;\n    int n = apop_sum(v)/data->matrix->size1; //size of one row\n    apop_vector_normalize(v);\n    apop_name_add(est->parameters->names, \"n\", 'r');\n    apop_data_set(est->parameters, .val=n); //zeroth item is now n, not p_0\n    char name[100];\n    for(int i=1; i < v->size; i++){\n        sprintf(name, \"p%i\", i);\n        apop_name_add(est->parameters->names, name, 'r');\n    }\n    est->dsize = n;\n    make_covar(est);\n    apop_data_add_named_elmt(est->info, \"log likelihood\", multinomial_log_likelihood(data, est));\n}\n\nstatic void multinom_prep(apop_data *data, apop_model *params){\n    apop_model_print_vtable_add(multinomial_show, params);\n    apop_model_clear(data, params);\n}\n\n/* \\adoc    Input_format Each row of the matrix is one observation: a set of draws from a single bin.\n  The number of draws of type zero are in column zero, the number of draws of type one in column one, et cetera.\n\n   \\li You may have a set of several Bernoulli-type draws, 
which could be summed together\nto form a single Binomial draw.  The \\ref apop_data_to_dummies function (using the\n<tt>.keep_first='y'</tt> option), to split a single column of numbers into a sequence\nof columns, may help with this.\n\n\\adoc    Parameter_format\n        The parameters are kept in the vector element of the \\c apop_model parameters element. \\c parameters->vector->data[0]==n;\n        \\c parameters->vector->data[1...]==p_1....\n\nThe numeraire is bin zero, meaning that \\f$p_0\\f$ is not explicitly listed, but is\n\\f$p_0=1-\\sum_{i=1}^{k-1} p_i\\f$, where \\f$k\\f$ is the number of bins. Conveniently enough,\nthe zeroth element of the parameters vector holds \\f$n\\f$, and so a full probability vector can\neasily be produced by overwriting that first element. For example:\n\\code \napop_model *estimated = apop_estimate(your_data, apop_multinomial);\nint n = apop_data_get(estimated->parameters); \napop_data_set(estimated->parameters, .val=1 - (apop_sum(estimated->parameters)-n)); \n\\endcode\nAnd now the parameter vector is a proper list of probabilities.\n\n\\li Because an observation is a single row, the number of bins, \\f$k\\f$ is set to equal\nthe length of the first row (counting both vector and matrix elements, as appropriate).\nThe covariance matrix will be \\f$k \\times k\\f$.\n\n\\li Each row should sum to \\f$N\\f$, the number of draws. The estimation routine doesn't check this, but instead uses the average sum across all rows.\n\n\\adoc    Estimate_results  Parameters are estimated. Covariance matrix is filled.   
\n\\adoc    RNG Returns a single vector of length \\f$k\\f$, the result of an imaginary tossing \n        of \\f$N\\f$ balls into \\f$k\\f$ urns, with the given probabilities.\n            */\n\napop_model *apop_multinomial = &(apop_model){\"Multinomial distribution\", -1, .dsize=-1,\n\t.estimate = multinomial_estimate, .log_likelihood = multinomial_log_likelihood, \n   .constraint = multinomial_constraint, .draw = multinomial_rng, .prep=multinom_prep};\n\napop_model *apop_binomial = &(apop_model){\"Binomial distribution\", 2, .dsize=1,\n\t.estimate = multinomial_estimate, .log_likelihood = multinomial_log_likelihood, \n   .constraint = multinomial_constraint, .draw = multinomial_rng, .prep=multinom_prep, .cdf= binomial_cdf};\n"
  },
  {
    "path": "model/apop_multivariate_normal.c",
    "content": "/* apop_multivariate_normal.c  The multivariate Normal distribution.\nCopyright (c) 2007 by Ben Klemens.  Licensed under the GPLv2; see COPYING.\n\n\\amodel apop_multivariate_normal This is the multivariate generalization of the Normal distribution.\n\n\\adoc    Input_format     Each row of the matrix is an observation.\n\\adoc    Parameter_format  An \\c apop_data set whose vector element is the vector of\n                            means, and whose matrix is the covariances.\n\nIf you had only one dimension, the mean would be a vector of size one, and the covariance\nmatrix a \\f$1\\times 1\\f$ matrix. This differs from the setup for \\ref apop_normal, which\noutputs a single vector with \\f$\\mu\\f$ in element zero and \\f$\\sigma\\f$ in element one.\n\nAfter estimation, the <tt>\\<Covariance\\></tt> page gives the covariance matrix of the means.\n*/\n \n#include \"apop_internal.h\"\n\nstatic double x_prime_sigma_x(gsl_vector *x, gsl_matrix *sigma){\n    gsl_vector * sigma_dot_x = gsl_vector_calloc(x->size);\n    double the_result;\n    gsl_blas_dsymv(CblasUpper, 1, sigma, x, 0, sigma_dot_x); //sigma should be symmetric\n    gsl_blas_ddot(x, sigma_dot_x, &the_result);\n    gsl_vector_free(sigma_dot_x);\n    return the_result;\n}\n\nstatic long double apop_multinormal_ll(apop_data *data, apop_model * m){\n    Nullcheck_mpd(data, m, GSL_NAN);\n    double determinant = 0;\n    gsl_matrix* inverse = NULL;\n    int i, dimensions  = data->matrix->size2;\n    gsl_vector* x_minus_mu= gsl_vector_alloc(data->matrix->size2);\n    double ll = 0;\n    determinant = apop_det_and_inv(m->parameters->matrix, &inverse, 1,1);\n    Apop_stopif(isnan(determinant) || determinant == 0,\n        gsl_vector_free(x_minus_mu); return GSL_NEGINF, //tell maximizers to look elsewhere.\n         1, \"the determinant of the given covariance is zero or NaN. 
Returning GSL_NEGINF.\"); \n    Apop_stopif(determinant <= 0, return NAN, 0, \"The determinant of the covariance matrix you gave me \"\n            \"is negative, but a covariance matrix must always be positive semidefinite \"\n            \"(and so have nonnegative determinant). Maybe run apop_matrix_to_positive_semidefinite?\");\n    for (i=0; i< data->matrix->size1; i++){\n        gsl_vector_memcpy(x_minus_mu, Apop_rv(data, i));\n        gsl_vector_sub(x_minus_mu, m->parameters->vector);\n        ll += - x_prime_sigma_x(x_minus_mu, inverse) / 2;\n    }\n    ll -= data->matrix->size1 * (log(2 * M_PI)* dimensions/2. + .5 * log(determinant));\n    gsl_matrix_free(inverse);\n    gsl_vector_free(x_minus_mu);\n    return ll;\n}\n\nstatic double a_mean(gsl_vector * in){ return apop_vector_mean(in); }\n\n/* \\adoc estimated_info   Reports <tt>log likelihood</tt>.  */ \nstatic void multivariate_normal_estimate(apop_data * data, apop_model *p){\n    p->parameters = apop_map(data, .fn_v=a_mean, .part='c'); \n    apop_data *cov =  apop_data_covariance(data);\n    p->parameters->matrix =  cov->matrix;\n    cov->matrix = NULL; apop_data_free(cov);\n    apop_data_add_named_elmt(p->info, \"log likelihood\", apop_multinormal_ll(data, p));\n}\n\n/* \\adoc RNG From <a href=\"http://cgm.cs.mcgill.ca/~luc/mbookindex.html\">Devroye (1986)</a>, p 565.  
*/\nstatic int mvnrng(double *out, gsl_rng *r, apop_model *eps){\n    apop_data *params = eps->parameters;\n    gsl_vector *v = gsl_vector_alloc(params->vector->size);\n    gsl_vector *dotted = gsl_vector_calloc(params->vector->size);\n    for (size_t i=0; i< params->vector->size; i++)\n        gsl_vector_set(v, i, gsl_ran_gaussian(r, 1));\n    gsl_matrix *copy  = apop_matrix_copy(params->matrix);\n        gsl_linalg_cholesky_decomp(copy); //returns upper and lower triangle; we want just one.\n    for (size_t i=0; i< copy->size1; i++)\n        for (size_t j=i+1; j< copy->size2; j++)\n            gsl_matrix_set(copy, i, j, 0);\n    gsl_blas_dgemv(CblasNoTrans, 1, copy, v, 0, dotted);\n    for (size_t i=0; i< params->vector->size; i++)\n        out[i]  = gsl_vector_get(dotted, i) + gsl_vector_get(params->vector,i);\n    gsl_vector_free(v);\n    gsl_vector_free(dotted);\n    gsl_matrix_free(copy);\n    return 0;\n}\n\nstatic void mvn_prep(apop_data *d, apop_model *m){\n    if (d && d->matrix)    m->dsize = d->matrix->size2; \n    else if (m->vsize > 0) m->dsize = m->vsize;\n    apop_model_clear(d, m);\n}\n\nstatic long double mvn_constraint(apop_data *d, apop_model *m){\n    return apop_matrix_to_positive_semidefinite(m->parameters->matrix);\n}\n\napop_model *apop_multivariate_normal= &(apop_model){\"Multivariate normal distribution\", -1,-1,-1, .dsize=-2,\n     .estimate = multivariate_normal_estimate, .log_likelihood = apop_multinormal_ll, \n     .draw = mvnrng, .prep=mvn_prep, .constraint = mvn_constraint};\n"
  },
  {
    "path": "model/apop_normal.c",
    "content": "/* The Normal and Lognormal distributions.\n Copyright (c) 2005--2009 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  \n\n\\amodel apop_normal\n\nYou know it, it's your attractor in the limit, it's the Gaussian distribution.\n\n\\f$N(\\mu,\\sigma^2) = {1 \\over \\sqrt{2 \\pi \\sigma^2}} \\exp (-(x-\\mu)^2 / 2\\sigma^2)\\f$\n\n\\f$\\ln N(\\mu,\\sigma^2) = (-(x-\\mu)^2 / 2\\sigma^2) - \\ln (2 \\pi \\sigma^2)/2 \\f$\n\n\\f$d\\ln N(\\mu,\\sigma^2)/d\\mu = (x-\\mu) / \\sigma^2 \\f$\n\n\\f$d\\ln N(\\mu,\\sigma^2)/d\\sigma^2 = ((x-\\mu)^2 / 2(\\sigma^2)^2) - 1/2\\sigma^2 \\f$\n\nSee also the \\ref apop_multivariate_normal.\n\n\\adoc    Input_format A scalar, in the \\c vector or \\c matrix elements of the input \\ref apop_data set.\n\\adoc    Settings   None.\n\\adoc    Parameter_format  Parameter zero (in the vector) is the mean, parameter one is the standard deviation (i.e., the square root of the variance). \nAfter estimation, a page is added named <tt>\\<Covariance\\></tt> with the 2 \\f$\\times\\f$ 2 covariance matrix for these two parameters.\n\n\\adoc    Predict  <tt>apop_predict(NULL, estimated_normal_model)</tt> returns the expected value. The <tt>->more</tt>\n                 element holds an \\ref apop_data set with the title <tt>\\<Covariance\\></tt>, whose \n                 matrix holds the covariance of the mean.\n*/\n\n#include \"apop_internal.h\"\n\nstatic long double positive_sigma_constraint(apop_data *data, apop_model *v){\n    //constraint is 0 < beta_2\n    Staticdef(apop_data *, constraint, apop_data_falloc((1,1,2), 0, 0, 1));\n    return apop_linear_constraint(v->parameters->vector, constraint, 1e-5);\n}\n\n//This just takes the sum of (x-mu)^2. 
Using gsl_ran_gaussian_pdf\n//would be to calculate log(exp((x-mu)^2)) == slow.\nstatic double apply_me(double x, void *mu){ return x - *(double *)mu; }\n\nstatic double apply_me2(double x, void *mu){ return gsl_pow_2(x - *(double *)mu); }\n\nstatic long double normal_log_likelihood(apop_data *d, apop_model *params){\n    Nullcheck_mpd(d, params, GSL_NAN);\n    Get_vmsizes(d)\n    double mu = gsl_vector_get(params->parameters->vector,0);\n    double sd = gsl_vector_get(params->parameters->vector,1);\n    long double ll  = -apop_map_sum(d, .fn_dp = apply_me2, .param = &mu)/(2*gsl_pow_2(sd));\n    ll -= tsize*((M_LNPI+M_LN2)/2+log(sd));\n\treturn ll;\n}\n\nvoid get_mu_var(apop_data *data, double *mu_out, double *var_out){\n    Get_vmsizes(data)\n    double mmean=0, mvar=0, vmean=0, vvar=0;\n    if (vsize){\n        vmean = apop_mean(data->vector);\n        vvar = apop_var(data->vector);\n    }\n    if (msize1) {\n        if (!vsize) apop_matrix_mean_and_var(data->matrix, &mmean, &mvar);\t\n        else        mmean = apop_matrix_mean(data->matrix);\t\n    }\n    *mu_out = mmean *(msize1*msize2/(tsize+0.0)) + vmean *(vsize/(tsize+0.0));\n    *var_out = 0;\n    if      (!vsize && !msize1) *var_out = 0;\n    else if (vsize && !msize1)  *var_out = vvar;\n    else if (!vsize && msize1)  *var_out = mvar;\n    else {\n        long double vv=0;\n        for (int i=-1; i< msize2; i++)\n            vv += gsl_pow_2(apop_data_get(data, i) - *mu_out);\n        *var_out = vv/tsize;\n    }\n}\n\n/* \\adoc estimated_info Reports the log likelihood.*/\nstatic void normal_estimate(apop_data * data, apop_model *est){\n    Nullcheck_mpd(data, est, );\n    Get_vmsizes(data); //tsize\n    double mean, var;\n    get_mu_var(data, &mean, &var);\n    est->parameters->vector->data[0] = mean;\n    est->parameters->vector->data[1] = sqrt(var);\n\tapop_name_add(est->parameters->names, \"μ\", 'r');\n\tapop_name_add(est->parameters->names, \"σ\",'r');\n\n    apop_lm_settings *p = 
apop_settings_get_group(est, apop_lm);\n    if (!p) p = Apop_model_add_group(est, apop_lm);\n\tif (!p || p->want_cov=='y'){\n        apop_data *cov = apop_data_get_page(est->parameters, \"<Covariance>\");\n        if (!cov) cov = apop_data_add_page(est->parameters, apop_data_calloc(2, 2), \"<Covariance>\");\n        apop_data_set(cov, 0, 0, var/tsize); //variance of the sample mean is sigma^2/n\n        apop_data_set(cov, 1, 1, 2*gsl_pow_2(var)/(tsize-1));\n    }\n    est->data = data;\n    apop_data_add_named_elmt(est->info, \"log likelihood\", normal_log_likelihood(data, est));\n}\n\nstatic long double normal_cdf(apop_data *d, apop_model *params){\n    Nullcheck_mpd(d, params, GSL_NAN)\n    Get_vmsizes(d)  //vsize\n    double val = apop_data_get(d, 0, vsize ? -1 : 0);\n    double mu = gsl_vector_get(params->parameters->vector, 0);\n    double sd = gsl_vector_get(params->parameters->vector, 1);\n    return gsl_cdf_gaussian_P(val-mu, sd);\n}\n\nstatic void normal_dlog_likelihood(apop_data *d, gsl_vector *gradient, apop_model *params){    \n    Nullcheck_mpd(d, params, )\n    Get_vmsizes(d)\n    double mu = gsl_vector_get(params->parameters->vector,0),\n           sd = gsl_vector_get(params->parameters->vector,1),\n           dll, sll;\n    dll = apop_map_sum(d, .fn_dp = apply_me, .param=&mu);\n    sll = apop_map_sum(d, .fn_dp = apply_me2, .param=&mu);\n    gsl_vector_set(gradient, 0, dll/gsl_pow_2(sd));\n    gsl_vector_set(gradient, 1, sll/gsl_pow_3(sd)- tsize /sd);\n}\n\n/* \\adoc predict Returns the mean, regardless of the input data you give (including\n\\c NULL). 
The second page is <tt>\\<Covariance\\></tt> of the mean.*/ \napop_data * normal_predict(apop_data *dummy, apop_model *m){\n    apop_data *out = apop_data_alloc(1,1);\n    out->matrix->data[0] = m->parameters->vector->data[0];\n\n    apop_data *cov = apop_data_get_page(out, \"<Covariance>\");\n    if (!cov) cov = apop_data_add_page(out, apop_data_alloc(1,1), \"<Covariance>\");\n    if (m->data){\n           Get_vmsizes(m->data) //tsize\n           //variance of the mean is sigma^2/n; sigma/sqrt(n) would be the standard error.\n           cov->matrix->data[0] = gsl_pow_2(m->parameters->vector->data[1])/ tsize;\n    } else cov->matrix->data[0] = 0;\n    return out;\n}\n\n/*\\adoc RNG A wrapper for the GSL's Normal RNG. */\nstatic int normal_rng(double *out, gsl_rng *r, apop_model *p){\n\t*out = gsl_ran_gaussian(r, p->parameters->vector->data[1]) + p->parameters->vector->data[0];\n    return 0;\n}\n\nstatic void normal_prep(apop_data *data, apop_model *params){\n    apop_score_vtable_add(normal_dlog_likelihood, apop_normal);\n    apop_predict_vtable_add(normal_predict, apop_normal);\n    apop_model_clear(data, params);\n}\n\napop_model *apop_normal = &(apop_model){\"Normal distribution\", 2, 0, 0, .dsize=1,\n .estimate = normal_estimate, .log_likelihood = normal_log_likelihood, \n .prep = normal_prep, .constraint = positive_sigma_constraint, \n .draw = normal_rng, .cdf = normal_cdf};\n\n\n/*\\amodel apop_lognormal\n\nThe density function and its log for lognormal distributions:\n\n\\f$f = \\exp(-(\\ln(x)-\\mu)^2/(2\\sigma^2))/ (x\\sigma\\sqrt{2\\pi})\\f$\n\n\\f$\\ln f = -(\\ln(x)-\\mu)^2/(2\\sigma^2) - \\ln(x) - \\ln(\\sigma\\sqrt{2\\pi})\\f$\n\n\\adoc    Input_format     A scalar in the matrix or vector element of the input \\ref apop_data set.\n\\adoc    Parameter_format  Zeroth vector element is the mean of the logged data set; first is the standard deviation of the logged data set.\n\\adoc    Estimate_results  Parameters are set. Log likelihood is calculated.\n\\adoc    settings   None.    
\n*/\n\nstatic double lnx_minus_mu_squared(double x, void *mu_in){\n\treturn gsl_pow_2(log(x) - *(double *)mu_in);\n}\n\nstatic long double lognormal_log_likelihood(apop_data *d, apop_model *params){\n    Nullcheck_mpd(d, params, GSL_NAN)\n    Get_vmsizes(d) //tsize\n    double mu = gsl_vector_get(params->parameters->vector, 0);\n    double sd = gsl_vector_get(params->parameters->vector, 1);\n    long double ll = -apop_map_sum(d, .fn_dp=lnx_minus_mu_squared, .param=&mu);\n      ll /= (2*gsl_pow_2(sd));\n      ll -= apop_map_sum(d, log);\n      ll -= tsize*((M_LNPI+M_LN2)/2+log(sd));\n\treturn ll;\n}\n\n/* \\adoc estimated_info   Reports <tt>log likelihood</tt>. */\nstatic void lognormal_estimate(apop_data * data, apop_model *est){\n    apop_data *cp = apop_data_copy(data);\n    Apop_stopif(!cp->matrix && !cp->vector, est->error='d'; return, \n            0, \"Neither matrix nor vector in the input data.\");\n    Get_vmsizes(cp); //vsize, msize1\n\n    if (vsize){\n        apop_vector_log(cp->vector);\n    }\n    if (msize2){\n        for (int i=0; i< msize2; i++)\n            apop_vector_log(Apop_cv(cp, i));\n    }\n    double mean, var;\n    get_mu_var(cp, &mean, &var);\n    apop_data_free(cp);\n    est->parameters->vector->data[0] = mean;\n    est->parameters->vector->data[1] = var < 0 ? 0 : sqrt(var); // -ε sometimes happens\n\n    apop_name_add(est->parameters->names, \"μ\", 'r');\n    apop_name_add(est->parameters->names, \"σ\", 'r');\n    apop_data_add_named_elmt(est->info, \"log likelihood\", lognormal_log_likelihood(data, est));\n}\n\nstatic long double lognormal_cdf(apop_data *d, apop_model *params){\n    Nullcheck_mpd(d, params, GSL_NAN)\n    Get_vmsizes(d)  //vsize\n    double val = apop_data_get(d, 0, vsize ? 
-1 : 0);\n    double mu = gsl_vector_get(params->parameters->vector, 0);\n    double sd = gsl_vector_get(params->parameters->vector, 1);\n    return gsl_cdf_lognormal_P(val, mu, sd);\n}\n\n/* \\adoc predict Returns the expected value, \\f$E(x) = e^{\\mu + \\sigma^2/2}\\f$\n  in the (0, 0)th element of the matrix, regardless of the input data you give (including \\c NULL). */ \napop_data * lognormal_predict(apop_data *dummy, apop_model *m){\n    apop_data *out = apop_data_alloc(1,1);\n    out->matrix->data[0] = exp(m->parameters->vector->data[0] \n                                + gsl_pow_2(m->parameters->vector->data[1])/2);\n    return out;\n}\n\ndouble diff_sq(double x, void *mu){ return gsl_pow_2(log(x) - *(double*)mu); }\n\nstatic void lognormal_dlog_likelihood(apop_data *d, gsl_vector *gradient, apop_model *params){    \n    double mu = gsl_vector_get(params->parameters->vector,0),\n           sd = gsl_vector_get(params->parameters->vector,1);\n    Get_vmsizes(d); //tsize\n    double dll = apop_map_sum(d, log) - mu*tsize;\n    double sll = apop_map_sum(d, .fn_dp=diff_sq, .param=&mu);\n    gsl_vector_set(gradient, 0, dll/gsl_pow_2(sd));\n    gsl_vector_set(gradient, 1, sll/gsl_pow_3(sd)- tsize/sd);\n}\n\n/* \\adoc RNG An Apophenia wrapper for the GSL's Normal RNG, exponentiated.  */\nstatic int lognormal_rng(double *out, gsl_rng *r, apop_model *p){\n\t*out = exp(gsl_ran_gaussian(r, p->parameters->vector->data[1]) + p->parameters->vector->data[0]);\n    return 0;\n}\n\nstatic void lognormal_prep(apop_data *data, apop_model *params){\n    apop_score_vtable_add(lognormal_dlog_likelihood, apop_lognormal);\n    apop_model_clear(data, params);\n}\n\napop_model *apop_lognormal = &(apop_model){\"Lognormal distribution\", 2, 0, 0, .dsize=1,\n .estimate = lognormal_estimate, .log_likelihood = lognormal_log_likelihood,\n .prep = lognormal_prep, .constraint = positive_sigma_constraint, \n .draw = lognormal_rng, .cdf= lognormal_cdf};\n"
  },
  {
    "path": "model/apop_ols.c",
    "content": "/* OLS models. Much of the real work is done in apop_regression.c.\nCopyright (c) 2005--2007, 2010 by Ben Klemens.  Licensed under the GPLv2; see COPYING.\n\n\\amodel apop_ols Ordinary least squares. Weighted least squares is also handled by this model.\n\n\\adoc    Input_format   See the notes on the prep routine.\n\nIf you provide weights in \\c your_input_data->weights, then I will use them\nappropriately. That is, the \\ref apop_ols model really implements Weighted Least Squares,\nbut in most cases <tt>weights==NULL</tt> and the math reduces to the special case of\nOrdinary Least Squares.\n\n\\adoc    Parameter_format  A vector of OLS coefficients. Coefficient zero\n                refers to the constant column, if any. \n                The \\c vector of the output will therefore be of size <tt>data->size2</tt>.\n\nThe estimation routine appends a page to the <tt>parameters</tt> named\n<tt>\\<Covariance\\></tt>, giving the covariance matrix for the estimated parameters\n(not the data itself). If the predicted values are calculated (and appended to\nthe <tt>info</tt> page), then a page is appended to the <tt>parameters</tt>  named\n<tt>\\<Error variance\\></tt>, giving the variance of the error term.\n\n\n\\adoc    estimated_parameter_model  For the mean, a noncentral \\f$t\\f$ distribution (\\ref apop_t_distribution).\n\\adoc    Prep_routine      \nIf your input data has no \\c vector element, then column zero of the matrix is taken\nto be the dependent variable. This routine moves the dependent variable to the \\c vector, and replaces\ncolumn zero with a column of all ones, indicating a constant term. This is the norm\nfor OLS, and is probably what you want. The easiest way to generate data for this sort\nof process is via a query like <tt>apop_query_to_matrix(\"select depvar, independent_var1,\nindependent_var2 from dataset\")</tt>.\n\nIf your data has a \\c vector element, then the prep routines won't try to force something\nto be there. 
That is, nothing will be moved, and no constant column generated. If\nyou don't want to use a constant column, or your data has already been prepped by an\nestimation, then this is what you want. See \\ref apop_query_to_mixed_data for an easy\nway to generate a data set like this via queries.\n\n\n\\adoc    settings  \\ref apop_lm_settings \n\\adoc    Examples \\ref gentle opens with a sample program using OLS. For quick reference,\nhere is the program, but see that page for a full discussion.\n\n\\include ols.c\n*/\n\n#include \"apop_internal.h\"\n\nstatic void ols_score(apop_data *d, gsl_vector *gradient, apop_model *p);\napop_model *ols_param_models(apop_data *d, apop_model *m);\napop_data *ols_predict(apop_data *in, apop_model *m);\nvoid ols_print(apop_model *m, FILE *ap);\n\nApop_settings_copy(apop_lm,\n    out->instruments = apop_data_copy(in->instruments);\n    if (in->input_distribution)\n        out->input_distribution = apop_model_copy(in->input_distribution);\n)\n\nApop_settings_free(apop_lm,\n    apop_model_free(in->input_distribution);\n) \n\nApop_settings_init(apop_lm,\n    if (out->want_cov == 1 || !out->want_cov) out->want_cov = 'y';\n    if (out->want_expected_value == 1 || !out->want_expected_value) out->want_expected_value = 'y';\n    if (!out->input_distribution) \n       out->input_distribution = apop_model_copy(apop_improper_uniform);\n)\n\n//shift first col to depvar, rename first col \"one\".\nstatic void prep_names (apop_model *e){\n    apop_lm_settings *p = apop_settings_get_group(e, apop_lm);\n    apop_parts_wanted_settings *pwant = apop_settings_get_group(e, apop_parts_wanted);\n    apop_data *predicted = apop_data_get_page(e->info, \"<Predicted>\");\n    if (predicted){\n        apop_name_add(predicted->names, (e->data->names->colct ? 
e->data->names->col[0] : \"Observed\"), 'c');\n        apop_name_add(predicted->names, \"Predicted\", 'c');\n        apop_name_add(predicted->names, \"Residual\", 'c');\n    }\n\tif (e->data->names->vector) { //this is post ols shuffle.\n        if (e->parameters)\n            Asprintf(&e->parameters->names->title, \"Regression of %s\", e->data->names->vector);\n        apop_name_add(e->parameters->names, \"parameters\", 'v');\n        for(int i=0; i< e->data->names->colct; i++)\n            apop_name_add(e->parameters->names, e->data->names->col[i], 'r');\n        if ((pwant && pwant->covariance) || (!pwant && p && p->want_cov== 'y')){\n            apop_data *cov = apop_data_get_page(e->parameters, \"<Covariance>\");\n            if (cov && e->data->names){\n                apop_name_stack(cov->names, e->data->names, 'c');\n                apop_name_stack(cov->names, e->data->names, 'r', 'c');\n            }\n        }\n\t}\n}\n\nstatic void ols_shuffle(apop_data *d){\n    if (!d) return;\n    if (!d->vector){\n        gsl_vector *independent = Apop_cv(d, 0);\n        d->vector = apop_vector_copy(independent);\n        gsl_vector_set_all(independent, 1);     //affine; first column is ones.\n        if (d->names->colct > 0) {\t\t\n            apop_name_add(d->names, d->names->col[0], 'v');\n            sprintf(d->names->col[0], \"1\");\n        }\n    }\n}\n\nstatic void ols_prep(apop_data *d, apop_model *m){\n    apop_score_vtable_add(ols_score, apop_ols);\n    apop_parameter_model_vtable_add(ols_param_models, apop_ols);\n    apop_predict_vtable_add(ols_predict, apop_ols);\n    apop_model_print_vtable_add(ols_print, apop_ols);\n    if (m->data && m->info) return; //already prepped; re-prep must be a no-op\n    Apop_stopif(!d || (!d->vector && !d->matrix), m->error='d'; return, 0, \"No data for regression.\");\n    ols_shuffle(d);\n    void *mpt = m->prep; //also use the defaults.\n    m->prep = NULL;\n    apop_prep(d, m);\n    m->prep = mpt;\n}\n\n/* The 
assumption that makes a log likelihood possible is that the\nerrors are normally distributed.\n\nThis function is a bit inefficient, in that it calculates the error terms,\nwhich you may have already done in the OLS estimation.  */\nstatic long double ols_log_likelihood (apop_data *d, apop_model *p){ \n    Nullcheck_mpd(d, p, GSL_NAN); Nullcheck(d->matrix, GSL_NAN);\n  long double ll = 0; \n  long double sigma, actual, weight;\n  double expected, x_prob;\n  apop_lm_settings *lms = Apop_settings_get_group(p, apop_lm);\n  apop_model *input_distribution = lms ? lms->input_distribution : NULL;\n  gsl_matrix *data = d->matrix;\n  gsl_vector *errors;\n\n    apop_data *pred = apop_data_get_page(p->info, \"<Predicted>\");\n    if (pred && d==p->data){ //use already-stored errors for this data set.\n        gsl_vector *as_errors = Apop_cv(pred, 2);\n\t\t\t\terrors = gsl_vector_calloc(data->size1);\n\t\t\t\tfor (size_t i=0;i< GSL_MIN(as_errors->size,data->size1); i++)\n\t\t\t\t\tgsl_vector_set(errors, i, gsl_vector_get(as_errors, i));\n    } else {\n        errors = gsl_vector_alloc(data->size1);\n        for (size_t i=0;i< data->size1; i++){\n            gsl_blas_ddot(p->parameters->vector, Apop_rv(d, i), &expected);\n            if (d->vector){ //then this has been prepped\n                actual = apop_data_get(d,i, -1);\n            } else {\n                actual = gsl_matrix_get(data,i, 0);\n                expected += gsl_vector_get(p->parameters->vector,0) * (1 - actual); //data isn't affine.\n            }\n            gsl_vector_set(errors, i, expected-actual);\n        }\n    }\n\n    apop_data *err = apop_data_get_page(p->parameters, \"<Error variance>\");\n    sigma = err ? sqrt(apop_data_get(err)) : sqrt(apop_vector_var(errors));\n\n    for(size_t i=0; i< data->size1; i++){\n        apop_data *justarow = Apop_r(d, i);\n        justarow->vector = NULL;\n        x_prob = (input_distribution)\n                    ? 
apop_p(justarow, input_distribution) //probably improper uniform, and so just 1 anyway.\n                    : 1;\n        weight = d->weights ? gsl_vector_get(d->weights, i) : 1; \n        ll += logl(gsl_ran_gaussian_pdf(gsl_vector_get(errors, i), sigma)* weight * x_prob);\n    }\n    return ll;\n}\n\n/* $\\partial {\\cal N}(x\\beta - y)/\\partial \\beta_i = \\sum{x_i} \\partial {\\cal N}(K)/\\partial K$ (at $K=x\\beta -y$) */\nstatic void ols_score(apop_data *d, gsl_vector *gradient, apop_model *p){ \n    Nullcheck_mpd(d, p, ); Nullcheck(d->matrix, );\n  long double sigma, actual, weight;\n  double expected;\n  gsl_matrix *data\t= d->matrix;\n  gsl_vector *errors = gsl_vector_alloc(data->size1);\n  gsl_vector *normscore = gsl_vector_alloc(2);\n  apop_data  *subdata  = apop_data_alloc(1,1);\n\tfor(size_t i=0;i< data->size1; i++){\n        gsl_blas_ddot(p->parameters->vector, Apop_rv(d, i), &expected);\n        if (d->vector){ //then this has been prepped\n            actual       = apop_data_get(d,i, -1);\n        } else {\n            actual       = gsl_matrix_get(data,i, 0);\n            expected    +=  gsl_vector_get(p->parameters->vector,0) * (1 - actual); //data isn't affine.\n        }\n        gsl_vector_set(errors, i, expected-actual);\n    }\n    sigma   = sqrt(apop_vector_var(errors));\n    apop_model *norm = apop_model_set_parameters(apop_normal, 0.0, sigma);\n    gsl_vector_set_all(gradient, 0);\n\tfor(size_t i=0;i< data->size1; i++){\n        apop_data_set(subdata, 0, 0, gsl_vector_get(errors, i));\n        apop_score(subdata, normscore, norm);\n        weight = d->weights ? 
gsl_vector_get(d->weights, i) : 1; \n        for(size_t j=0; j< data->size2; j++)\n            *gsl_vector_ptr(gradient, j) += weight * apop_data_get(d, i, j) * gsl_vector_get(normscore, 0);\n\t} \n    gsl_vector_free(errors);\n    gsl_vector_free(normscore); //these two were allocated above and otherwise leak.\n    apop_data_free(subdata);\n    apop_model_free(norm);\n}\n\n//xpx may be destroyed by the HH transformation.\nstatic void xpxinvxpy(apop_data const*data, gsl_matrix *xpx, apop_data const* xpy, apop_model *out){\n    apop_lm_settings   *p =  apop_settings_get_group(out, apop_lm);\n    apop_parts_wanted_settings *pwant = apop_settings_get_group(out, apop_parts_wanted);\n\tif ( (pwant && pwant->covariance!='y' && pwant->predicted != 'y') \n       ||(!pwant && p && p->want_cov!='y' && p->want_expected_value != 'y')){\t\n\t\t//then don't calculate (X'X)^{-1}\n\t\tgsl_linalg_HH_solve (xpx, xpy->vector, out->parameters->vector);\n\t\treturn;\n\t} //else:\n    double s_sq;\n    gsl_vector const *y_data = data->vector; //just an alias\n    apop_data *cov = apop_data_alloc();\n    double det = apop_det_and_inv(xpx, &cov->matrix, 1, 1);// not yet cov, just (X'X)^-1.\n    if (det < 1e-4) Apop_notify(1, \"Determinant of X'X is small (%g), so matrix is near singular. 
\"\n                        \"Expect the covariance matrix [based on (X'X)^-1] to be garbage.\", det);\n    apop_data_free(out->parameters);\n    out->parameters = apop_dot(cov, xpy);               // \\beta=(X'X)^{-1}X'Y\n    apop_data *error = apop_dot(data, out->parameters); // X\\beta ==predicted (not yet error)\n\tgsl_vector_sub(error->vector, y_data);              // X'\\beta - Y == error\n    gsl_blas_ddot(error->vector, error->vector, &s_sq); // e'e\n    s_sq /= data->matrix->size1 - data->matrix->size2;  // \\sigma^2 = e'e / df\n\tgsl_matrix_scale(cov->matrix, s_sq);                // cov = \\sigma^2 (X'X)^{-1}\n\tif ((pwant && pwant->predicted) || (!pwant && p && p->want_expected_value)){\n        apop_data *predicted_page = apop_data_get_page(out->info, \"<Predicted>\");\n        gsl_matrix_set_col(predicted_page->matrix, 0, y_data);\n        gsl_matrix_set_col(predicted_page->matrix, 2, error->vector);\n        gsl_vector *predicted = Apop_cv(predicted_page, 1);\n        gsl_vector_memcpy(predicted, y_data);\n        gsl_vector_add(predicted, error->vector); //pred = y_data + error\n    }\n    apop_data_free(error);\n    if (apop_data_get_page(out->parameters, \"<Covariance>\"))\n        apop_data_rm_page(out->parameters, \"<Covariance>\");\n    apop_data_add_page(out->parameters, cov, \"<Covariance>\");\n    apop_data_add_page(out->parameters, apop_data_falloc((1), s_sq), \"<Error variance>\");\n}\n\n/* \\adoc    RNG  Linear models are typically only partially defined probability models. For\nOLS, we know that \\f$P(Y|X\\beta) \\sim {\\cal N}(X\\beta, \\sigma)\\f$, because this is\nan assumption about the error process, but we don't know much of anything about the\ndistribution of \\f$X\\f$.\n\nThe \\ref apop_lm_settings group includes an \\ref apop_model element named \\c\ninput_distribution. 
This is the distribution of the independent/predictor/X columns\nof the data set.\n\nThe default is that <tt>input_distribution = apop_improper_uniform </tt>, meaning that\n\\f$P(X)=1\\f$ for all \\f$X\\f$. So \\f$P(Y, X) = P(Y|X)P(X) = P(Y|X)\\f$. This seems to\nbe how many people use linear models: the \\f$X\\f$ values are taken as certain (as with\nactually observed data) and the only question is the odds of the dependent variable. If\nthat's what you're looking for, just leave the default. This is sufficient for getting\nlog likelihoods under the typical assumption that the observed data has probability one.\n\n<em>But</em> you can't draw from an improper uniform. So if you draw from a linear\nmodel with a default <tt>input_distribution</tt>, then you'll get an error.\n\nAlternatively, you may know something about the distribution of the input data.\n     For example, the data model may simply be a PMF from the actual data:\n     \\code\n    apop_settings_set(your_model, apop_lm, input_distribution, apop_estimate(inset, apop_pmf));\n     \\endcode\nNow, random draws are taken from the input data, and the dependent variable value calculated via \\f$X\\beta+\\epsilon\\f$, where \\f$X\\f$ is the drawn value, \\f$\\beta\\f$ the previously-estimated parameters and \\f$\\epsilon\\f$ is a Normally-distributed random draw. Or change the PMF to any\nother appropriate distribution, such as a \\ref apop_multivariate_normal,\nor an \\ref apop_pmf filled in with more data, or perhaps something from\nhttp://en.wikipedia.org/wiki/Errors-in-variables_models , as desired.  */\nstatic int ols_rng(double *out, gsl_rng *r, apop_model *m){\n    //X is drawn from the input distribution, then Y = X\\beta + epsilon\n    apop_lm_settings *olp =  apop_settings_get_group(m, apop_lm);\n    Apop_stopif(!olp, return 1, 0, \"no apop_lm settings group attached. 
Has this model been estimated yet?\");\n\n    gsl_vector *tempdata = gsl_vector_alloc(m->parameters->vector->size);\n    Apop_stopif(apop_draw(tempdata->data, r, olp->input_distribution), return 2,\n            0, \"Couldn't draw from the distribution of the input data.\");\n    gsl_blas_ddot(tempdata, m->parameters->vector, out);\n\n    double sigma_sq = apop_data_get(m->info, .rowname=\"SSE\")/m->data->matrix->size1;\n    out[0] += gsl_ran_gaussian(r, sqrt(sigma_sq));\n\n    if (m->dsize > 1) memcpy(out+1, tempdata->data, sizeof(double)*tempdata->size);\n    gsl_vector_free(tempdata);\n    return 0;\n}\n\n/* \\adoc estimated_data You can specify whether the data is modified with an \\ref apop_lm_settings group. Else, left unchanged.\n\n\\adoc estimated_info Reports log likelihood, and runs \\ref apop_estimate_coefficient_of_determination \nto add \\f$R^2\\f$-type information (SSE, SSR, \\&c) to the info page.\n\nResiduals: I add a page named <tt>\\<Predicted\\></tt>, with three columns. \nThe first column is the dependent variable from the input data. Let our model\nbe \\f$ Y = \\beta X + \\epsilon\\f$. Then the second column is the predicted values:\n\\f$\\beta X\\f$, and the third column is the residuals: \\f$\\epsilon\\f$. 
The third column\nis therefore always the first minus the second.\n\nGiven your estimate \\c est, the zeroth element is one of <br> \n<tt> apop_data_get(est->info, .page= \"Predicted\", .row=0, .colname=\"observed\"),</tt><br>\n<tt> apop_data_get(est->info, .page= \"Predicted\", .row=0, .colname=\"predicted\") or</tt><br>\n<tt> apop_data_get(est->info, .page= \"Predicted\", .row=0, .colname=\"residual\").</tt><br>\n*/\nstatic void apop_estimate_OLS(apop_data *inset, apop_model *ep){\n    Nullcheck_mpd(inset, ep, );\n    Apop_stopif(ep->error, return, 0, \"Not estimating the model due to a previous error\");\n    apop_data *set;\n    apop_lm_settings *olp =  apop_settings_get_group(ep, apop_lm);\n    apop_parts_wanted_settings *pwant = apop_settings_get_group(ep, apop_parts_wanted);\n    if (!olp) \n        olp = Apop_model_add_group(ep, apop_lm);\n    ep->data = inset;\n    set = olp->destroy_data ? inset : apop_data_copy(inset); \n    \n    gsl_vector *weights = olp->destroy_data      //this may be NULL.\n                           ? 
ep->data->weights \n                           : apop_vector_copy(ep->data->weights);\n    if (weights)\n        for (size_t i =0; i< weights->size; i++)\n            gsl_vector_set(weights, i, sqrt(gsl_vector_get(weights, i)));\n\n    if ((pwant &&pwant->predicted) || (!pwant && olp && olp->want_expected_value=='y'))\n        apop_data_add_page(ep->info, apop_data_alloc(0, set->matrix->size1, 3), \"<Predicted>\");\n    if ((pwant &&pwant->covariance) || (!pwant && olp && olp->want_cov=='y'))\n        apop_data_add_page(ep->parameters, apop_data_alloc(0, set->matrix->size2, set->matrix->size2), \"<Covariance>\");\n    if (weights)\n        for (int i = -1; i < set->matrix->size2; i++)\n            gsl_vector_mul(Apop_cv(set, i), weights);\n\n    apop_data *xpx_d = apop_dot(set, set, .form1='t'); //(X'X)\n    apop_data *xpy_d = apop_dot(set, set, .form1='t', .form2='v'); //(X'y)\n    xpxinvxpy(set, xpx_d->matrix, xpy_d, ep);\n    prep_names(ep);\n    apop_data_free(xpx_d);\n    apop_data_free(xpy_d);\n\n    if ((pwant &&pwant->covariance) || (!pwant && olp && olp->want_cov=='y'))\n        apop_estimate_parameter_tests(ep);\n\n    add_info_criteria(ep->data, ep, ep, apop_log_likelihood(ep->data, ep), set->matrix->size2); //in apop_mle.c\n\n    apop_data *r_sq = apop_estimate_coefficient_of_determination(ep); //Add R^2-type info to info page.\n    apop_data_stack(ep->info, r_sq, .inplace='y');\n\n    apop_data_free(r_sq);\n    if (!olp->destroy_data){\n        if (weights) gsl_vector_free(weights);\n        apop_data_free(set);\n    }\n}\n\n/* \\adoc predict This function is limited to taking in a data set with a matrix, and\nfilling the vector with \\f$X\\beta\\f$. 
Like the OLS estimation, it will shuffle a matrix around\nto insert a column of ones (see the discussion on the \\ref apop_ols prep routine).\n */\napop_data *ols_predict(apop_data *in, apop_model *m){\n    Nullcheck_mpd(in, m, NULL);\n    if (!in->vector)  ols_shuffle(in);  \n\n    //find X dot beta\n    gsl_blas_dgemv (CblasNoTrans, 1, in->matrix, m->parameters->vector, 0, in->vector);\n    return in;\n}\n\napop_model *ols_param_models(apop_data *d, apop_model *m){\n    Nullcheck_mpd(d, m, NULL);\n    apop_pm_settings *settings = Apop_settings_get_group(m, apop_pm);\n    if (settings->index!=-1){\n        int i = settings->index;\n        double mu = apop_data_get(m->parameters, i, -1);\n        double sigma = sqrt(apop_data_get(m->parameters, i, i, .page=\"<Covariance>\"));\n        int df = apop_data_get(m->info, .rowname=\"df\");\n        return apop_model_set_parameters(apop_t_distribution, mu, sigma, df);\n    }\n    //else run the default\n    apop_parameter_model_vtable_drop(m);\n    apop_model *out = apop_parameter_model(d, m);\n    apop_parameter_model_vtable_add(ols_param_models, m);\n    return out;\n}\n\nvoid ols_print(apop_model *m, FILE *ap){\n    fprintf(ap, \"Parameters:\\n\");\n    apop_data_print(m->parameters, .output_pipe=(ap? ap : stdout));\n    apop_data *predict = apop_data_rm_page(m->info, \"<Predicted>\", .free_p='n');\n    apop_data_print(m->info, .output_pipe=(ap? 
ap : stdout));\n    if (predict) apop_data_add_page(m->info, predict, predict->names->title);\n}\n\napop_model *apop_ols = &(apop_model){.name=\"Ordinary Least Squares\", .vsize = -1, .dsize=-1, .estimate=apop_estimate_OLS, \n            .log_likelihood = ols_log_likelihood, .prep = ols_prep, .draw=ols_rng};\n\n\n/*\\amodel apop_iv Instrumental variable regression\n\nOperates much like the \\ref apop_ols model, but the input parameters also need to have\na table of substitutions (like the addition of the <tt>.instruments</tt> setting in\nthe example below).\n\nWhich columns substitute where can be specified in your choice of two ways. The first\nis to use the vector element of the \\ref apop_data set to list the column numbers\nto be substituted (the dependent variable is zero; first independent column is one),\nand then one column for each item to substitute.\n\nThe second method, if the vector of the instrument \\ref apop_data set is \\c NULL, is to\nuse the column names to find the matching columns in the base data to substitute. This\nis generally more robust and/or convenient.\n\n\\li If the \\c instruments data set is \\c NULL or empty, I'll just run OLS. \n\n\\li The \\ref apop_lm_settings group has a \\c destroy_data setting. If\nyou set that to \\c 'y', I will overwrite the column in place, saving the trouble of\ncopying the entire data set.\n\n\\adoc    Input_format  See the discussion on the \\ref apop_ols page regarding its prep routine. 
See above regarding the <tt>.instruments</tt> element of the attached \\ref apop_lm_settings group.\n\\adoc    Parameter_format  As per \\ref apop_ols \n\\adoc    Estimate_results  As per \\ref apop_ols \n\\adoc    Prep_routine  See the discussion on the \\ref apop_ols page regarding its prep routine.\n\\adoc    settings  \\ref apop_lm_settings \n\\adoc Examples \n\\include  iv.c\n*/\n\nstatic apop_data *prep_z(apop_data *x, apop_data *instruments){\n    apop_data *out = apop_data_copy(x);\n    if (instruments->vector)\n        for (int i=0; i< instruments->vector->size; i++){\n            gsl_vector *inv  = Apop_cv(instruments, i);\n            gsl_vector *outv = Apop_cv(out, instruments->vector->data[i]);\n            gsl_vector_memcpy(outv, inv);\n        }\n    else if (instruments->names->colct)\n        for (int i=0; i< instruments->names->colct; i++){\n            int colnumber = apop_name_find(x->names, instruments->names->col[i], 'c');\n            Apop_assert(colnumber != -2, \"You asked me to substitute instrument column %i \"\n                    \"for the data column named %s, but I could find no such name.\",  i, instruments->names->col[i]);\n            gsl_vector_memcpy(Apop_cv(out, colnumber), Apop_cv(instruments, i));\n        }\n    else Apop_assert(0, \"Your instrument matrix has data, but neither a vector element \"\n                       \"nor column names indicating what columns in the original data should be replaced.\");\n    return out;\n}\n\nstatic void apop_estimate_IV(apop_data *inset, apop_model *ep){\n    Nullcheck_mpd(inset, ep, );\n    apop_lm_settings   *olp =  apop_settings_get_group(ep, apop_lm);\n    apop_parts_wanted_settings *pwant = apop_settings_get_group(ep, apop_parts_wanted);\n    if (!olp) olp = Apop_model_add_group(ep, apop_lm);\n    if (!olp->instruments || !(olp->instruments->matrix || olp->instruments->vector)){\n        //No instruments: just run OLS, and stop here so prep_z never sees a NULL instrument set.\n        apop_ols->estimate(inset, ep);\n        return;\n    }\n    ep->data = inset;\n    if (ep->parameters) 
apop_data_free(ep->parameters);\n    ep->parameters = apop_data_alloc(inset->matrix->size2);\n    apop_data *set = olp->destroy_data ? inset : apop_data_copy(inset); \n    apop_data *z = prep_z(inset, olp->instruments);\n    \n    gsl_vector *weights = olp->destroy_data      //the weights may be NULL.\n                             ? ep->data->weights \n                             : apop_vector_copy(ep->data->weights);\n    if (weights)\n        for (int i =0; i< weights->size; i++)\n            gsl_vector_set(weights, i, sqrt(gsl_vector_get(weights, i)));\n\n    if ((pwant && pwant->predicted) || (!pwant && olp && olp->want_expected_value))\n        apop_data_add_page(ep->info, apop_data_alloc(set->matrix->size1, 3), \"<Predicted>\");\n    prep_names(ep);\n    if (weights){\n        gsl_vector_mul(set->vector, weights);\n        for (int i = 0; i < set->matrix->size2; i++)\n            gsl_vector_mul(Apop_cv(set, i), weights);\n    }\n\n    apop_data *zpx = apop_dot(z, set, .form1='t');\n    apop_data *zpy = apop_dot(z, set, .form1='t', .form2='v'); //z'y\n\n    xpxinvxpy(inset, zpx->matrix, zpy, ep);\n\n    //covariance matrix right now is sigma (Z'X)^-1. 
We need\n    //sigma (Z'X)^-1 (Z'Z) (X'Z)^-1\n\n    apop_data *zpz = apop_dot(z, z, .form1='t');\n    apop_data zpxinv = (apop_data) {.matrix=apop_matrix_inverse(zpx->matrix)};\n    apop_data *zpz_xpzinv = apop_dot(zpz, &zpxinv, .form2='t');\n    apop_data *halfcov = apop_data_get_page(ep->parameters, \"<Covariance>\");\n    apop_data *cov = apop_dot(halfcov, zpz_xpzinv);\n    apop_data_rm_page(ep->parameters, \"<Covariance>\");\n    apop_data_add_page(ep->parameters, cov, \"<Covariance>\");\n\n    gsl_matrix_free(zpxinv.matrix);\n    apop_data_free(zpx);\n    apop_data_free(zpy);\n    apop_data_free(zpz);\n\n\n/*\n    apop_data *zpxinv = apop_matrix_to_data(apop_matrix_inverse(zpx->matrix));\n    ep->parameters = apop_dot(zpxinv, zpy);\n    //cov = sigma^2 (Z'X)^-1 Z'Z (X'Z)^-1\n    */\n\n    /*\n    if ((pwant &&pwant->covariance) || (!pwant && olp && olp->want_cov=='y')){\n        apop_data *zpz = apop_dot(z, z, .form1='t');\n        apop_data *zpz_zpxinv = apop_dot(zpz, zpxinv, .form2='t');\n        apop_data_add_page(ep->parameters, apop_dot(zpx, zpz_zpxinv)\n                , \"<Covariance>\");\n        apop_data_free(zpz); apop_data_free(zpz_zpxinv);\n    }\n    */\n\n    apop_data_free(zpx);// apop_data_free(zpxinv);\n    apop_data_free(zpy);\n\n    if (!olp->destroy_data) apop_data_free(set);\n}\n\napop_model *apop_iv = &(apop_model){.name=\"instrumental variables\", .vsize = -1, .dsize=-1,\n    .estimate =apop_estimate_IV, .prep=ols_prep,\n    .log_likelihood = ols_log_likelihood};\n"
  },
  {
    "path": "model/apop_pmf.c",
    "content": "/* Probability mass functions \nCopyright (c) 2011 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  \n\n\\amodel apop_pmf A probability mass function is commonly known as a histogram, or still more commonly,\na bar chart. It indicates that at a given coordinate, there is a given mass.\n\nEach row of the PMF's data set holds the coordinates, and the\n<em>weights vector</em> holds the mass at the given point. This is in contrast to the\ncrosstab format, where the location is simply given by the position of the data point\nin the grid.\n\nFor example, here is a typical crosstab:\n\n<table>\n<tr>            <td></td><td> col 0</td><td> col 1</td><td> col 2</td></tr>\n<tr><td>row 0 </td><td>0</td><td> 8.1</td><td> 3.2</td></tr>\n<tr><td>row 1 </td><td>0</td><td> 0</td><td> 2.2</td></tr>\n<tr><td>row 2 </td><td>0</td><td> 7.3</td><td> 1.2</td></tr>\n</table>\n\nHere it is as a sparse listing:\n\n<table>\n<tr> <td> dimension 1</td><td> dimension 2</td><td> value</td></tr>\n<tr>  <td>0</td> <td>1</td> <td>8.1</td></tr>\n<tr>  <td>0</td> <td>2</td> <td>3.2</td></tr>\n<tr>  <td>1</td> <td>2</td> <td>2.2</td></tr>\n<tr>  <td>2</td> <td>1</td> <td>7.3</td></tr>\n<tr>  <td>2</td> <td>2</td> <td>1.2</td></tr>\n</table>\n\nThe \\c apop_pmf internally represents data in this manner, with the dimensions \nin the \\c matrix, \\c vector, and \\c text element of the data set, and the cell values\nare held in the \\c weights element (not the vector).\n\nIf your data is in a crosstab (with observation coordinates in the matrix element for 2-D data or the\nvector for 1-D data), then use \\ref apop_crosstab_to_db to make the conversion. 
See also <a href=\"https://github.com/b-k/Apophenia/wiki/Crosstab-to-PMF\">the wiki</a> for another crosstab-to-PMF function.\n\nIf your data is already in the sparse listing format (which is probably the case for 3-\nor more dimensional data), then estimate the model via:\n\n\\code\napop_model *my_pmf = apop_estimate(in_data, apop_pmf);\n\\endcode\n\n\\li If the \\c weights element is \\c NULL, then I assume that all rows of the data set are\nequally probable.\n\\li If the \\c weights are present but sum to a not-finite value, the model's \\c error element is set to \\c 'w' when the estimation is run, and a warning printed.\n\n\\adoc Input_format   One observation per row, with coordinates in the \\c vector, \\c matrix, and/or \\c text, \n                    and the density at that point in the \\c weights. If <tt>weights==NULL</tt>, all observations are equiprobable.\n\n\\adoc Parameter_format  None. The list of observations and their weights are in the \\c data set, not the \\c parameters.\n\\adoc Settings   \\ref apop_pmf_settings\n*/\n\n#include \"apop_internal.h\"\n\nApop_settings_copy(apop_pmf,\n    (*out->cmf_refct)++;\n)\n\nApop_settings_free(apop_pmf,\n    if (!--*in->cmf_refct) {\n        gsl_vector_free(in->cmf);\n        free(in->cmf_refct);\n    }\n) \n\nApop_settings_init(apop_pmf,\n    Apop_varad_set(draw_index, 'n')\n    out->cmf_refct = calloc(1, sizeof(int));\n    (*out->cmf_refct)++;\n)\n\n\n/* \\adoc    estimated_data  The data you sent in is linked to (not copied).\n\\adoc    estimated_parameters  Still \\c NULL.    
*/\nstatic void estim (apop_data *d, apop_model *out){\n    out->data = d;\n    apop_data_free(out->parameters); //may have been auto-alloced by prep.\n\n    apop_pmf_settings *settings = Apop_settings_get_group(out, apop_pmf);\n    if (!settings) settings = Apop_model_add_group(out, apop_pmf);\n    if (d->weights) {\n        settings->total_weight = apop_sum(d->weights);\n        Apop_stopif(!isfinite(settings->total_weight),\n            out->error='w', 0, \"total weight in the input data is %Lg.\\n\", settings->total_weight);\n    }\n}\n\nstatic void setup_cmf(apop_model *m){\n    //already assumed a weights vector in the data\n    apop_pmf_settings *settings = Apop_settings_get_group(m, apop_pmf);\n    size_t maxsize = m->data->weights->size;\n    settings->cmf = gsl_vector_alloc(maxsize);\n    Apop_stopif(!settings->cmf, m->error='a'; return,\n            0, \"Allocation error setting up the CMF.\");\n    gsl_vector *cdf = settings->cmf; //alias.\n    cdf->data[0] = m->data->weights->data[0];\n    for (int i=1; i< maxsize; i++)\n        cdf->data[i] = m->data->weights->data[i] + cdf->data[i-1];\n    //Now make sure the last entry is one.\n    Apop_stopif(cdf->data[maxsize-1]==0 || isnan(cdf->data[maxsize-1]), m->error='f'; return,\n            0, \"Bad density in the PMF.\");\n    gsl_vector_scale(cdf, 1./cdf->data[maxsize-1]);\n    Apop_stopif(!isfinite(cdf->data[maxsize-1]), m->error='f';  return,\n            0, \"Bad density in the PMF.\");\n}\n\n/* \\adoc    RNG  Return the data in a random row of the PMF's data set. If there is a\n      weights vector, I will use that to make draws; else all rows are equiprobable.\n\n\\li If you set \\c draw_index to \\c 'y', e.g., \n\n\\code\nApop_settings_add(your_model, apop_pmf, draw_index, 'y');\n\\endcode\n\nthen I will return the row number of the draw, not the data in that row. 
Because \\ref\napop_draw only returns numeric data, this is the only meaningful way to make draws\nfrom text data.\n\n\\li  The first time you draw from a PMF with uneven weights, I will generate a\nvector tallying the cumulative mass. Subsequent draws will have no computational\noverhead. Because the  vector is built using the data on the first call to this or\nthe \\c cdf method, do not rearrange or modify the data after the first call. I.e.,\nif you choose to use \\ref apop_data_sort or \\ref apop_data_pmf_compress on your data,\ndo it before the first draw or CDF calculation.\n\n\\exception m->error='f' There is zero or NaN density in the CMF. I set the model's \\c error element to \\c 'f' and set <tt>out=NAN</tt>.\n\\exception m->error='a' Allocation error. I set the model's \\c error element to \\c 'a' and set <tt>out=NAN</tt>. Maybe try \\ref apop_data_pmf_compress first?\n*/\nstatic int draw (double *out, gsl_rng *r, apop_model *m){\n    Nullcheck_m(m, 1) Nullcheck_d(m->data, 1)\n    apop_pmf_settings *settings = Apop_settings_get_group(m, apop_pmf);\n    #pragma omp critical (pmfsetup)\n    if (!settings) settings = Apop_model_add_group(m, apop_pmf);\n    Get_vmsizes(m->data) //maxsize\n    size_t current; \n    if (!m->data->weights) //all rows are equiprobable\n        current = gsl_rng_uniform(r)* (maxsize-1);\n    else {\n        size_t size = m->data->weights->size;\n        #pragma omp critical (pmfsetuptwo)\n        if (!settings->cmf) setup_cmf(m);\n        Apop_stopif(m->error=='f', *out=GSL_NAN; return 1, 0, \"Zero or NaN density in the PMF.\");\n        double draw = gsl_rng_uniform(r);\n        //do a binary search for where draw is in the CDF.\n        double *cdf = settings->cmf->data; //alias.\n        size_t top = size-1, bottom = 0; \n        current = (top+bottom)/2.;\n        if (current==0){//array of size one or two\n            if (size!=1) \n                if (cdf[0] < draw)\n                    current = 1;\n        } else while 
(!(cdf[current]>=draw && cdf[current-1] < draw)){\n            if (cdf[current] < draw){ //step up\n                bottom = current;\n                if (current == top-1)\n                    current ++;\n                else\n                    current = (bottom + top)/2.;\n            } else if (cdf[current-1] >= draw){ //step down\n                top = current;\n                if (current == bottom+1)\n                    current --;\n                else\n                    current = (bottom + top)/2.;\n            }\n            if (current==0 && cdf[0] >= draw) break;\n        }\n    }\n    //Done searching. Current should now be the right row index.\n    if (settings->draw_index=='y'){\n        *out = current;\n        return 0;\n    }\n    apop_data *outrow = Apop_r(m->data, current);\n    int i = 0;\n    if (outrow->vector)\n        out[i++] = outrow->vector->data[0];\n    if (outrow->matrix)\n        for( ; i < outrow->matrix->size2; i ++)\n            out[i] = gsl_matrix_get(outrow->matrix, 0, i);\n    return 0;\n}\n\n\nstatic int are_equal(apop_data *left, apop_data *right){\n    /* Intended for use by apop_data_pmf_compress and .p, below.\n      That means we aren't bothering with comparing names, and weights are likely to be\n      different, because we're using those to tally data elements. 
If the data set has\n      a longer matrix than vector, say, then one side may have the vector element and\n      the other not, so we still check that there's a match in presence of each element.\n\n     */\n    if (left->vector){\n        if (!right->vector ||\n              (*left->vector->data != *right->vector->data \n               && !(gsl_isnan(*left->vector->data) && gsl_isnan(*right->vector->data))))\n            return 0;\n    } else if (right->vector) return 0;\n\n    if (left->matrix){\n        if (!right->matrix ||\n              left->matrix->size2 != right->matrix->size2) return 0;\n        for (int i=0; i< left->matrix->size2; i++){\n            double L = apop_data_get(left, 0, i);\n            double R = apop_data_get(right, 0, i);\n            if (L != R && !(gsl_isnan(L) && gsl_isnan(R))) return 0;\n        }\n    }\n    else if (right->matrix) return 0;\n\n    if (left->textsize[1]){\n        if (left->textsize[1] != right->textsize[1]) return 0;\n        for (int i=0; i< left->textsize[1]; i++)\n            if (strcmp(left->text[0][i], right->text[0][i])) return 0;\n    }\n    else if (right->textsize[1]) return 0;\n    return 1;\n}\n\nstatic int find_in_data(apop_data *searchme, apop_data *findme){//findme is one row tall.\n    Get_vmsizes(searchme)\n    for(int i=0; i < GSL_MAX(vsize, GSL_MAX(searchme->textsize[0], msize1)); i++)\n        if (are_equal(findme, Apop_r(searchme, i)))\n            return i;\n    return -1;\n}\n\nstatic long double pmf_p(apop_data *d, apop_model *m){\n    apop_pmf_settings *settings = Apop_settings_get_group(m, apop_pmf);\n    Nullcheck_d(d, GSL_NAN) \n    Nullcheck_m(m, GSL_NAN) \n    int model_pmf_length;\n    {\n        Get_vmsizes(m->data);//maxsize\n        model_pmf_length = maxsize;\n    }\n    Get_vmsizes(d)//maxsize\n    long double p = 1;\n    for (int i=0; i< maxsize; i++){\n        int elmt = find_in_data(m->data, Apop_r(d, i));\n        if (elmt == -1) return 0; //Can't find one observation: 
prob=0;\n        p *= m->data->weights\n                 ? m->data->weights->data[elmt] /settings->total_weight \n                 : 1./model_pmf_length; //no weights means any known event is equiprobable\n    }\n    return p;\n}\n\n/* \\adoc    CDF  <em>Assuming the data is sorted in a meaningful manner</em>, find the total mass up to a given data point.\n\nThat is, a CDF only makes sense if the data space is totally ordered. The sorting you\ndefine using \\ref apop_data_sort defines that ordering.\n\n\\li The input data should have the same number of columns as the data set used to construct the PMF. I use only the first row.\n\n\\li If the observation is not found in the data, return zero.\n\n\\li  The first time you get a CDF from a data set with uneven weights, I\nwill generate a vector tallying the cumulative mass. Subsequent calls will have no\ncomputational overhead. Because the vector is built using the data on the first call\nto this or the \\c draw method, do not rearrange or modify the data after the first\ncall. 
I.e., if you choose to use \\ref apop_data_sort or \\ref apop_data_pmf_compress on\nyour data, do it before the first draw or CDF calculation.\n */\nstatic long double pmf_cmf(apop_data *d, apop_model *m){\n    Get_vmsizes(m->data); //maxsize\n    int elmt = find_in_data(m->data, Apop_r(d, 0));\n    if (elmt == -1) return 0; //Can't find one observation: prob=0;\n    if (!m->data->weights) return (elmt+0.0)/maxsize;\n    else {\n        apop_pmf_settings *settings = Apop_settings_get_group(m, apop_pmf);\n        if (!settings) settings = Apop_model_add_group(m, apop_pmf);\n        if (!settings->cmf) setup_cmf(m);\n        Apop_stopif(m->error=='f', return GSL_NAN, 0, \"Zero or NaN density in the PMF.\");\n        gsl_vector_view v = gsl_vector_subvector(settings->cmf, 0, elmt+1);\n        return apop_sum(&v.vector);\n    }\n}\n\nstatic void pmf_print(apop_model *est, FILE *out){ apop_data_print(est->data, .output_pipe=out); }\n\nstatic void pmf_prep(apop_data * data, apop_model *model){\n    if (model->data) return; //already prepped, and reprep is a no-op.\n    apop_model_print_vtable_add(pmf_print, apop_pmf);\n    Get_vmsizes(data) //msize2, firstcol\n    int width = msize2 ? msize2 : -firstcol;//use the vector only if there's no matrix.\n    if (Apop_settings_get_group(model, apop_pmf) && Apop_settings_get(model, apop_pmf, draw_index)=='y' && !width) model->dsize=0;\n    apop_model_clear(data, model);\n}\n\napop_model *apop_pmf = &(apop_model){\"PDF or sparse matrix\", .dsize=-1, .estimate = estim, \n                .draw = draw, .p=pmf_p, .prep=pmf_prep, .cdf=pmf_cmf};\n\n\n/** Say that you have added a long list of observations to a single \\ref apop_data set,\n  meaning that each row has weight one. 
There are a huge number of duplicates, perhaps because there are a handful of \n  types that keep repeating:\n\n<table frame=box>\n<tr>\n<td>Vector value</td><td> Text name</td><td>Weights</td>\n</tr><tr valign=bottom><td></td> </tr>\n<tr><td>12</td><td>Dozen</td><td>1</td></tr>\n<tr><td>1</td><td>Single</td><td>1</td></tr>\n<tr><td>2</td><td>Pair</td><td>1</td></tr>\n<tr><td>2</td><td>Pair</td><td>1</td></tr>\n<tr><td>1</td><td>Single</td><td>1</td></tr>\n<tr><td>1</td><td>Single</td><td>1</td></tr>\n<tr><td>2</td><td>Pair</td><td>1</td></tr>\n<tr><td>2</td><td>Pair</td><td>1</td></tr>\n</table>\n\nUse this function to reduce this to a set of distinct values, with their weights adjusted accordingly:\n\n<table frame=box>\n<tr>\n<td>Vector value</td><td> Text name</td><td>Weights</td>\n</tr>\n<tr valign=bottom><td></td></tr>\n<tr><td>12</td><td>Dozen</td><td>1</td></tr>\n<tr><td>1</td><td>Single</td><td>3</td></tr>\n<tr><td>2</td><td>Pair</td><td>4</td></tr>\n</table>\n\n\\param in An \\ref apop_data set that may have duplicate rows. As above, the data may\n    be in text and/or numeric formats.\n\n\\return Your input is changed in place, via \\ref apop_data_rm_rows, so use \\ref\napop_data_copy before calling this function if you need to retain the original\nformat. For your convenience, this function returns a pointer to your original data,\nwhich has now been pruned.  If there is a \\c weights vector, I will add those weights\ntogether as duplicates are merged. If there is no \\c weights vector, I will create one,\nwhich is initially set to one for all values, and then aggregated as above.\n*/\napop_data *apop_data_pmf_compress(apop_data *in){\n    Apop_assert_c(in, NULL, 1,  \"You sent me a NULL input data set; returning NULL output.\");\n    Get_vmsizes(in); //maxsize\n    Apop_assert_c(maxsize, in, 1, \"You sent a non-NULL data set, but the vector, matrix, and text are all of length zero. 
Returning the original data set unchanged.\");\n    if (!in->weights){\n        in->weights = gsl_vector_alloc(maxsize);\n        gsl_vector_set_all(in->weights, 1);\n    }\n    if (maxsize==1) return in; //optional check.\n    int *cutme = calloc(maxsize, sizeof(int));\n    int not_done = 1; //if we do a full j-loop and everything is to be cut, we're done.\n    for (int i=0; i< maxsize && not_done; i++){\n        if (cutme[i]) continue;\n        Apop_row(in, i, subject);\n        not_done = 0;\n        for (int j=i+1; j< maxsize; j++){\n            if (cutme[j]) continue;\n            not_done = 1;\n            Apop_row(in, j, compare_me);\n            if (are_equal(subject, compare_me)){\n                *gsl_vector_ptr(subject->weights, 0) += gsl_vector_get(compare_me->weights, 0);\n                cutme[j]=1;\n            }\n        }\n    }\n    apop_data_rm_rows(in, cutme);\n    free(cutme);\n    return in;\n}\n"
  },
  {
    "path": "model/apop_poisson.c",
    "content": "/* The Poisson distribution.\n Copyright (c) 2006--2007, 2010 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  \n  \n\\amodel apop_poisson\n\n\\f$p(k) = {\\mu^k \\over k!} \\exp(-\\mu). \\f$\n\n\\adoc    Input_format One scalar observation per row (in the \\c matrix or \\c vector).  \n\\adoc    Parameter_format  One parameter, the zeroth element of the vector (<tt>double mu = apop_data_get(estimated_model->parameters)</tt>).\n\\adoc    settings   \\ref apop_parts_wanted_settings, for the \\c .want_cov element.  */\n\n#include \"apop_internal.h\"\n\nstatic double apply_me(double x, void *in){\n    if (x < 0) return -INFINITY;\n    if ((x - (int)x) > 1e-4) return -INFINITY;\n    double *ln_l = in;\n    return x==0 ? 0 : *ln_l *x - gsl_sf_lngamma(x+1);\n}\n\nstatic long double poisson_log_likelihood(apop_data *d, apop_model * p){\n    Nullcheck_mpd(d, p, GSL_NAN)\n    Get_vmsizes(d) //tsize\n    double lambda = apop_data_get(p->parameters);\n    double ln_l = log(lambda);\n    double ll = apop_map_sum(d, .fn_dp = apply_me, .param=&ln_l);\n    return ll - tsize*lambda;\n}\n\nstatic double data_mean(apop_data *d){\n    Get_vmsizes(d)\n    if (vsize && !msize1) return apop_vector_mean(d->vector);\n    if (!vsize && msize1) return apop_matrix_mean(d->matrix);\n    return apop_matrix_mean(d->matrix)*(msize1*msize2+0.0)/tsize +\n               + apop_vector_mean(d->vector)*(vsize+0.0)/tsize;\n}\n\n/* \\adoc estimated_parameters \nUnless you decline it by adding the \\ref apop_parts_wanted_settings group, I will also give you the variance of the parameter, via bootstrap, stored in a page named <tt>\\<Covariance\\></tt>.\n\n\\adoc estimated_info   Reports <tt>log likelihood</tt>. 
*/\nstatic void poisson_estimate(apop_data * data,  apop_model *est){\n    Nullcheck_mpd(data, est, );\n    double mean = data_mean(data);\n\tapop_data_set(est->parameters, .val=mean);\n    apop_data_add_names(est->parameters, 'r', \"λ\");\n    apop_data_add_named_elmt(est->info, \"log likelihood\", poisson_log_likelihood(data, est));\n    //to prevent an infinite loop, the bootstrap needs to be flagged to not run itself. \n    apop_parts_wanted_settings *p = apop_settings_get_group(est, apop_parts_wanted);\n    if (!p || p->covariance=='y'){\n        if (!p) Apop_model_add_group(est, apop_parts_wanted);\n        else p->covariance='n';\n\n        apop_data_add_page(est->parameters, apop_bootstrap_cov(data, est), \"<Covariance>\");\n\n        if (!p) Apop_settings_rm_group(est, apop_parts_wanted);\n        else p->covariance='y';\n    }\n}\n\nstatic long double positive_beta_constraint(apop_data *returned_beta, apop_model *v){\n    //constraint is 0 < beta_1\n    return apop_linear_constraint(v->parameters->vector, .margin = 1e-4);\n}\n\nstatic void poisson_dlog_likelihood(apop_data *d, gsl_vector *gradient, apop_model *p){\n    Get_vmsizes(d) //tsize\n    Nullcheck_mpd(d, p, )\n    double     lambda = apop_data_get(p->parameters);\n    gsl_matrix *data = d->matrix;\n    double     d_a = apop_matrix_sum(data)/lambda - tsize;\n    gsl_vector_set(gradient,0, d_a);\n}\n\n/* \\adoc RNG A wrapper for \\c gsl_ran_poisson. 
Sets a single scalar.*/\nstatic int poisson_rng(double *out, gsl_rng* r, apop_model *p){\n    *out = gsl_ran_poisson(r, *p->parameters->vector->data);\n    return 0;\n}\n\nstatic void poisson_prep(apop_data *data, apop_model *params){\n    apop_score_vtable_add(poisson_dlog_likelihood, apop_poisson);\n    apop_model_clear(data, params);\n}\n\napop_model *apop_poisson = &(apop_model){\"Poisson distribution\", 1, 0, 0, .dsize=1,\n     .estimate = poisson_estimate, .log_likelihood = poisson_log_likelihood, \n     .prep = poisson_prep, .constraint = positive_beta_constraint, \n     .draw = poisson_rng};\n"
  },
  {
    "path": "model/apop_probit.c",
    "content": "/* Probit and Logit. \nCopyright (c) 2005--2008, 2010 by Ben Klemens.  Licensed under the GPLv2; see COPYING. \n\n\\amodel apop_probit\n\nApophenia makes no distinction between the Bivariate Probit and the Multinomial\nProbit. This one does both.\n\n\\adoc    Input_format  \nThe first column of the data matrix this model expects is zeros, ones, ..., enumerating\nthe factors; see the prep routine. The remaining columns are values of the\nindependent variables. Thus, the model will return [(data columns)-1]\\f$\\times\\f$[(option\ncount)-1] parameters.  Column names are options; row names are input variables.\n\n\\adoc    Parameter_format  As above \n\\adoc    Prep_routine The initial column of data should be a set of \nfactors, set up via \\ref apop_data_to_factors. If I find a factor page, I will use\nthat info; if not, then I will run \\ref apop_data_to_factors on the left-most column\n(the vector if there is one, else the first column of the matrix.)\n\nAlso, if there is no vector, then I will move the first column of the matrix, and\nreplace that matrix column with a constant column of ones, just like with OLS.\n\n\\adoc    settings   None, but see above about seeking a factor page in the input data.\n\\adoc    RNG  See \\ref apop_ols; this one is similar but produces a category number instead of OLS's continuous draw.\n*/\n\n#include \"apop_internal.h\"\n\nstatic void probit_dlog_likelihood(apop_data *d, gsl_vector *gradient, apop_model *p);\n\nstatic apop_data *get_category_table(apop_data *d){\n    int first_col = d->vector ? 
-1 : 0;\n    apop_data *out = apop_data_get_factor_names(d, .col=first_col);\n    if (!out) {\n        apop_data_to_factors(d, .intype='d', .incol=first_col, .outcol=first_col);\n        out = apop_data_get_factor_names(d, .col=first_col);\n    }\n    return out;\n}\n\nstatic void probit_prep(apop_data *d, apop_model *m){\n    if (m->data && m->parameters) return; //already prepped; re-prep is a no-op.\n    apop_data *factor_list = get_category_table(d);\n    apop_score_vtable_add(probit_dlog_likelihood, apop_probit);\n    //apop_score_vtable_add(logit_dlog_likelihood, apop_logit);\n    apop_ols->prep(d, m);//also runs the default apop_model_clear.\n    int count = factor_list->textsize[0];\n    m->parameters = apop_data_alloc(d->matrix->size2, count-1);\n    apop_name_stack(m->parameters->names, d->names, 'r', 'c');\n    for (int i=1; i< count; i++) \n        apop_name_add(m->parameters->names, factor_list->text[i][0], 'c');\n    gsl_matrix_set_all(m->parameters->matrix, 1);\n    char *tmp = strdup(m->name);\n    snprintf(m->name, 100, \"%s with %s as numeraire\", tmp, factor_list->text[0][0]);\n    free(tmp);\n\n    apop_mle_settings *sets = apop_settings_get_group(m, apop_mle);\n    if (sets && sets->starting_pt) return;\n    /*Because of the exponentiation, it's easy to get overflows. If the user\n      didn't set a starting point, pick one that is of the same order of \n      magnitude as the average data element. \n      If a data point is zero, we more-or-less ignore it.\n      */\n    size_t matrix_cols = m->data->matrix->size2;\n    for (size_t i=0; i< matrix_cols; i++){\n        gsl_vector *onecol = Apop_cv(m->data, i);\n        long double logtotal = 0;\n        for (int i=0; i< onecol->size; i++){\n            double val =gsl_vector_get(onecol, i);\n            logtotal += val ? 
logl(fabs(val)): 0;\n        }\n        logtotal /= onecol->size; //we now have average log magnitude.\n        Apop_stopif(!isfinite(logtotal), m->error='d'; return, 0, \"Not-finite data (maybe NaN) in column %zu\", i);\n        Apop_row_v(m->parameters, i, betas_i);\n        gsl_vector_set_all(betas_i, expl(logtotal));\n    }\n    if (!sets) sets = Apop_model_add_group(m, apop_mle);\n    gsl_vector *params_as_vector=apop_data_pack(m->parameters); //li'l leak.\n    sets->starting_pt= params_as_vector->data;\n}\n\nstatic double biprobit_ll_row(apop_data *r){\n    long double n = gsl_cdf_gaussian_P(-gsl_matrix_get(r->matrix, 0, 0),1);\n    n = n ? n : 1e-10; //prevent -inf in the next step.\n    n = n<1 ? n : 1-1e-10; \n    return r->vector->data[0] ?  log(1-n): log(n);\n}\n\n//The case where outcome is a single zero/one option.\nstatic long double biprobit_log_likelihood(apop_data *d, apop_model *p){\n    apop_data *betadotx = apop_dot(d, p->parameters); \n    betadotx->vector = d->vector;\n    double total_prob = apop_map_sum(betadotx, .fn_r=biprobit_ll_row);\n    betadotx->vector = NULL;\n    apop_data_free(betadotx);\n\treturn total_prob;\n}\n\nstatic threadlocal double val;\nstatic double unordered(double in){ return in == val; }\n\n// This is just a for loop that runs a probit on each column.\nstatic long double multiprobit_log_likelihood(apop_data *d, apop_model *p){\n    Nullcheck_mpd(d, p, GSL_NAN)\n    gsl_vector *val_vector = get_category_table(d)->vector;\n    if (val_vector->size==2) return biprobit_log_likelihood(d, p);\n    //else, multinomial loop\n    static threadlocal apop_model *spare_probit = NULL;\n    if (!spare_probit){\n        spare_probit = apop_model_copy(apop_probit);\n        spare_probit->parameters = apop_data_alloc();\n    }\n    Staticdef(apop_data *, working_data, apop_data_alloc());\n    working_data->matrix = d->matrix;\n    gsl_vector *original_outcome = d->vector;\n    double ll = 0;\n    double *vals = val_vector->data;\n    
for(size_t i=0; i < p->parameters->matrix->size2; i++){\n        val = vals[i];\n        working_data->vector = apop_vector_map(original_outcome, unordered);\n        spare_probit->parameters->matrix = apop_vector_to_matrix(Apop_cv(p->parameters, 1));\n        ll  += apop_log_likelihood(working_data, spare_probit);\n        gsl_vector_free(working_data->vector); //yup. It's inefficient.\n        gsl_matrix_free(spare_probit->parameters->matrix);\n    }\n\treturn ll;\n}\n\nstatic void probit_dlog_likelihood(apop_data *d, gsl_vector *gradient, apop_model *p){\n    Nullcheck_mp(p, )\n    gsl_vector *val_vector = get_category_table(p->data)->vector;\n    if (val_vector->size!=2){\n        gsl_vector * numeric_default = apop_numerical_gradient(d, p);\n        gsl_vector_memcpy(gradient, numeric_default);\n        gsl_vector_free(numeric_default);\n        return;\n    }\n    long double\tcdf, betax, deriv_base;\n    apop_data *betadotx = apop_dot(d, p->parameters); \n    gsl_vector_set_all(gradient, 0);\n    for (size_t i=0; i< d->matrix->size1; i++){\n        betax = apop_data_get(betadotx, i, 0);\n        cdf = gsl_cdf_gaussian_P(-betax, 1);\n        cdf = cdf ? cdf : 1e-10; //prevent -inf in the next step.\n        cdf = cdf<1 ? cdf : 1-1e-10; \n        deriv_base = apop_data_get(d, i, -1)\n                       ?  
gsl_ran_gaussian_pdf(-betax, 1) /(1-cdf)\n                       : -gsl_ran_gaussian_pdf(-betax, 1) / cdf;\n        for (size_t j=0; j< d->matrix->size2; j++)\n            *gsl_vector_ptr(gradient, j) += apop_data_get(d, i, j) * deriv_base;\n\t}\n\tapop_data_free(betadotx);\n}\n\napop_model *apop_probit = &(apop_model){\"Probit\", .log_likelihood = multiprobit_log_likelihood,\n    .dsize=-1, .prep = probit_prep};\n\n\n/* amodel apop_multinomial_probit The Multinomial Probit model.\n\n  \\deprecated  Use \\ref apop_probit, which handles multiple options.*/\n\n/////////  Multinomial Logit (plain logit is a special case)\n\nstatic apop_data *multilogit_expected(apop_data *in, apop_model *m){\n    Nullcheck_mpd(in, m, NULL)\n    gsl_matrix *params = m->parameters->matrix;\n    apop_data *out = apop_data_alloc(in->matrix->size1, in->matrix->size1, params->size2+1);\n    for (size_t i=0; i < in->matrix->size1; i ++){\n        Apop_row_v(in, i, observation);\n        Apop_row_v(out, i, outrow);\n        double oneterm;\n        int bestindex = 0;\n        double bestscore = 0;\n        gsl_vector_set(outrow, 0, 1);\n        for (size_t j=0; j < params->size2+1; j ++){\n            if (j == 0){\n                oneterm = 0;\n                gsl_vector_set(outrow, j, 1);\n            } else {\n                gsl_vector *p = Apop_cv(m->parameters, j-1);\n                gsl_blas_ddot(observation, p, &oneterm);\n                gsl_vector_set(outrow, j, exp(oneterm));\n            }\n            if (oneterm > bestscore){\n                bestindex = j;\n                bestscore = oneterm;\n            }\n        }\n        double total = apop_sum(outrow);\n        gsl_vector_scale(outrow, 1/total);\n        apop_data_set(out, i, -1, bestindex);\n    }\n    apop_data *factor_list = get_category_table(m->data);\n    apop_name_add(out->names, factor_list->text[0][0], 'c');\n    apop_name_stack(out->names, m->parameters->names, 'c');\n    return out;\n}\n\nstatic void 
logit_prep(apop_data *d, apop_model *m){\n    probit_prep(d, m);\n    apop_predict_vtable_add(multilogit_expected, apop_logit);\n}\n\nstatic size_t find_index(double in, double *m, size_t max){\n    size_t i = 0;\n    while (in !=m[i] && i<max) i++;\n    return i;\n}\n\ndouble one_logit_row(apop_data *thisobservation, void *factor_list){\n    //get the $x\\beta_j$ numerator for the appropriate choice:\n    size_t index   = find_index(gsl_vector_get(thisobservation->vector, 0), \n                                factor_list, thisobservation->matrix->size2);\n    Apop_row_v(thisobservation, 0, thisrow);\n    double num = (index==0) ? 0 : gsl_vector_get(thisrow, index-1);\n\n    /* Get the denominator, ln(sum(exp(xbeta))) using the subtract-the-max trick \n     mentioned in the documentation.  Don't forget the implicit beta_0, fixed at \n     zero (so we need to add exp(0-max)). */\n\n    double max = gsl_vector_max(thisrow);\n    gsl_vector_add_constant(thisrow, -max);\n    apop_vector_exp(thisrow);\n    //return num - (max + log(apop_vector_sum(thisrow) +exp(-max)));\n    long double expmax = expl(-max);\n    return num - (max + (isfinite(expmax)? logl(apop_vector_sum(thisrow) +  expmax) : -max) );\n}\n\nstatic long double multilogit_log_likelihood(apop_data *d, apop_model *p){\n    Nullcheck_mpd(d, p, GSL_NAN)\n    Nullcheck(d->matrix, GSL_NAN)\n    //Find X\\beta_i for each row of X and each column of \\beta.\n    apop_data  *xbeta = apop_dot(d, p->parameters);\n    double* factor_list = get_category_table(p->data)->vector->data;\n    xbeta->vector = d->vector; //we'll need this in one_logit_row\n    long double ll = apop_map_sum(xbeta, .fn_rp = one_logit_row, .param=factor_list);\n    xbeta->vector = NULL;\n    apop_data_free(xbeta);\n\treturn ll;\n}\n\n/*\nstatic void dlogit_foreach(apop_data *x, apop_data *gmat, gsl_matrix *beta, apop_data *factor_list){\n  //\\beta_this = choice for the row.\n  //dLL/d\\beta_ij = [(\\beta_i==\\beta_this) ? 
x_j : 0] - x_i e^(x\\beta_j)/\\sum_k e^(x\\beta_k)\n  //that last term simplifies: x / \\sum_k e^(x(\\beta_k - \\beta_i))\n    Apop_row_v(x, 0, xdata);\n    assert(gmat->matrix->size1 == x->matrix->size2);     //the j index---input vars (incl. 1 column)\n    assert(gmat->matrix->size2 == beta->size2); //the i index---choices\n    assert(xdata->size == beta->size1);//cols of data=variables; rows of output=var.s (cols=choices)\n    size_t choice = find_index(gsl_vector_get(x->vector, 0), factor_list->vector->data, factor_list->vector->size);\n    for (int i=0; i < beta->size2; i++) { //go through choices.\n        gsl_vector *denom = gsl_vector_alloc(beta->size1);\n        gsl_vector_set_all(denom, 1); //see below\n        for (int other=-1; other < (int)beta->size2; other++){\n            if (other != i) { //this block calculates exp(x (otherbeta-thisbeta))\n                Apop_matrix_col(beta, i, thisbeta); \n                gsl_vector *diff = apop_vector_copy(thisbeta);\n                gsl_vector_scale(diff, -1);\n                if (other >=0){//the phantom beta_0 == 0, so that term is e^x(0-beta|i)\n                    Apop_matrix_col(beta, other, otherbeta);\n                    gsl_vector_add(diff, otherbeta);\n                }\n                gsl_vector_mul(diff, xdata);\n                apop_vector_exp(diff);\n                gsl_vector_add(denom, diff);\n                gsl_vector_free(diff);\n            } //else, other==i, and \\beta_i - \\beta_i = 0, and e^{0x} = 1. Thus the gsl_vector_set_all(denom, 1).\n        }\n        for (int j=0; j< xdata->size; j++){ //add to each coefficient of the gradient matrix \n            double pick = (choice-1 == i) ? 
gsl_vector_get(xdata,j) : 0; //numeraire has no betas.\n            *gsl_matrix_ptr(gmat->matrix, j, i) += pick - gsl_vector_get(xdata, j)/gsl_vector_get(denom, j));\n        }\n        gsl_vector_free(denom);\n    }\n}\n\nstatic void logit_dlog_likelihood(apop_data *d, gsl_vector *gradient, apop_model *p){\n    Nullcheck_mpd(d, p, );\n    apop_data *gradient_matrix = apop_data_calloc(p->parameters->matrix->size1, p->parameters->matrix->size2);\n    apop_data *cats = get_category_table(d);\n    for (int i=0; i< d->matrix->size1; i++)\n        dlogit_foreach(Apop_r(d, i), gradient_matrix, p->parameters->matrix, cats);\n    apop_data_pack(gradient_matrix, gradient);\n    apop_data_free(gradient_matrix);\n}\n*/\n\n//Should this be available everywhere?\nstatic size_t get_draw_size(apop_model *in){\n    Get_vmsizes(in->data); //msize2, firstcol\n    size_t datasize = (in->dsize == -1)\n                        ? msize2-firstcol\n                        : in->dsize;\n    Apop_assert(datasize > 0, \"I don't know the size of the X \"\n                              \"vector to draw for drawing your logit. 
\"\n                              \"See the apop_ols RNG documentation for details.\");\n    return datasize;\n}\n\nstatic int logit_rng(double *out, gsl_rng *r, apop_model *m){\n    //X is drawn from the input distribution, then Y = X\\beta + epsilon\n    apop_lm_settings *olp = apop_settings_get_group(m, apop_lm);\n    if (!olp) olp=Apop_model_add_group(m, apop_lm\n                                , .input_distribution= apop_estimate(m->data, apop_pmf));\n\n    size_t datasize = get_draw_size(olp->input_distribution);\n    apop_data *x = apop_data_alloc(datasize);\n    apop_draw(x->vector->data, r, olp->input_distribution);\n\n    apop_data *xbeta = apop_dot(x, m->parameters);\n    apop_data *zero = apop_data_calloc(1);\n    apop_data *xbeta_w_numeraire = apop_data_stack(zero, xbeta, 'r');\n    apop_data_free(xbeta);\n    apop_data_free(zero);\n    apop_vector_exp(xbeta_w_numeraire->vector);\n    apop_vector_normalize(xbeta_w_numeraire->vector);\n    xbeta_w_numeraire->weights = xbeta_w_numeraire->vector;\n    xbeta_w_numeraire->vector = NULL;\n\n    Staticdef(apop_model*, a_pmf, apop_model_copy(apop_pmf))\n    a_pmf->dsize = 0; //so draws produce a row number\n    a_pmf->data = xbeta_w_numeraire;\n    Apop_stopif(apop_draw(out, r, a_pmf), return 1, \n                        0, \"Couldn't draw from a PMF populated using X'β.\");\n    if (m->dsize>1) memcpy(out+1, x->vector->data, datasize *sizeof(double));\n    apop_data_free(x);\n    return 0;\n}\n\n\n/* \\amodel apop_logit\n\nApophenia makes no distinction between the bivariate logit and the multinomial logit. 
This does both.\n\n  The likelihood of choosing item \\f$j\\f$ is:\n  \\f$e^{x\\beta_j}/ (\\sum_i{e^{x\\beta_i}})\\f$\n\n  so the log likelihood is \n  \\f$x\\beta_j  - ln(\\sum_i{e^{x\\beta_i}})\\f$\n\n\\adoc    Input_format  The first column of the data matrix this model expects is zeros,\nones, ..., enumerating the factors; to get there, try \\ref apop_data_to_factors; if\nyou  forget to run it, I'll run it on the first data column for you.  The remaining\ncolumns are values of the independent variables. Thus, the model will return [(data\ncolumns)-1]\\f$\\times\\f$[(option count)-1] parameters.  Column names list factors in the dependent variables;\nrow names list the independent variables.\n\n\\adoc    Parameter_format  As above.    \n\\adoc    Prep_routine You will probably want to convert some column of your data into\nfactors, via \\ref apop_data_to_factors. If you do, then that adds a page of factors\nto your data set (and of course adjusts the data itself). If I find a factor page,\nI will use that info; if not, then I will run \\ref apop_data_to_factors on the first\ncolumn (the vector if there is one, else the first column of the matrix.)\n\nAlso, if there is no vector, then I will move the first column of the matrix, and\nreplace that matrix column with a constant column of ones, just like with OLS.\n\n\\adoc    settings   None, but see above about seeking a factor page in the input data.\n\n\\adoc RNG Much like the \\ref apop_ols RNG, qv. Returns the category drawn.\n\n<!--\n\\li PS: Here is a nice trick used in the implementation. let \\f$y_i = x\\beta_i\\f$.\n  Then\n\\f[ln(\\sum_i{e^{x\\beta_i}}) = max(y_i) + ln(\\sum_i{e^{y_i - max(y_i)}}).\\f]\n\nThe elements of the sum are all now exp(something negative), so \noverflow won't happen, and if there's underflow, then that term\nmust not have been very important. 
[This trick is attributed to Tom\nMinka, who implemented it in his Lightspeed Matlab toolkit.]\n-->\n\nHere is an artificial example which clarifies the simplest use of the model:\n\n\\include fake_logit.c\n\nHere is an example using data from a U.S. Congressional vote, including one text\nvariable that has to be converted to factors, and one to convert to dummies.\nA loop then calculates the customary p-values.\n\n\\include logit.c\n*/\napop_model *apop_logit = &(apop_model){.name=\"Logit\", .log_likelihood = multilogit_log_likelihood, .dsize=-1,\n/*.score = logit_dlog_likelihood,*/ .prep = logit_prep, .draw=logit_rng\n};\n"
  },
  {
    "path": "model/apop_t.c",
    "content": "/* apop_t.c: the t-distribution, for modeling purposes.\nCopyright (c) 2009, 2013 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n#include \"apop_internal.h\"\n\n//There used to be a χ^2 and F model, but nobody used them and they were largely untested.\n//They last appeared in commit 2b4715111 if you'd like to recover them.\n\nstatic void apop_t_estimate(apop_data *d, apop_model *m){\n    Apop_stopif(!d, m->error='d'; return, 0, \"No data with which to count df. (the default estimation method)\");\n    Get_vmsizes(d); //vsize, msize1, msize2, tsize\n    double vmu = vsize ? apop_mean(d->vector) : 0;\n    double v_sum_sq = vsize ? apop_var(d->vector)*(vsize-1) : 0;\n    double m_sum_sq = 0;\n    double mmu = 0;\n   if (msize1) {\n       apop_matrix_mean_and_var(d->matrix, &mmu, &m_sum_sq);\n       m_sum_sq *= msize1*msize2-1;\n   }\n    apop_data_add_named_elmt(m->parameters, \"μ\", (vmu *vsize + mmu * msize1*msize2)/tsize);\n    apop_data_add_named_elmt(m->parameters, \"σ\", sqrt(((tsize-3.)/(tsize-1)) * (v_sum_sq + m_sum_sq)/(tsize-1))); \n    apop_data_add_named_elmt(m->parameters, \"df\", tsize-1);\n    apop_data_add_named_elmt(m->info, \"log likelihood\", m->log_likelihood(d, m));\n}\n\nstatic double one_t(double in, void *params){ \n    double mu = ((double*)params)[0];\n    double sigma = ((double*)params)[1];\n    double df = ((double*)params)[2];\n    return log(gsl_ran_tdist_pdf((in-mu)/sigma, df)); \n}\n\nstatic long double apop_tdist_llike(apop_data *d, apop_model *m){ \n    Nullcheck_mpd(d, m, GSL_NAN);\n    double *params = m->parameters->vector->data;\n    double sigma = params[1];\n    Get_vmsizes(d); //tsize\n    return apop_map_sum(d, .fn_dp=one_t, .param=params) - tsize * log(sigma);\n}\n\nint apop_t_dist_draw(double *out, gsl_rng *r, apop_model *m){ \n    Nullcheck_mp(m, 1);\n    double mu = m->parameters->vector->data[0];\n    double sigma = m->parameters->vector->data[1];\n    double df = 
m->parameters->vector->data[2];\n    *out = gsl_ran_tdist(r, df) * sigma + mu;\n    return 0;\n}\n\nstatic long double apop_t_dist_cdf(apop_data *in, apop_model *m){\n    Nullcheck_mp(m, GSL_NAN);\n    double val = in->vector ? apop_data_get(in, 0, -1) : apop_data_get(in, 0, 0);\n    double mu = m->parameters->vector->data[0];\n    double sigma = m->parameters->vector->data[1];\n    double df = m->parameters->vector->data[2];\n    return gsl_cdf_tdist_P ((val-mu)/sigma, df);\n}\n\nstatic long double apop_t_dist_constraint(apop_data *beta, apop_model *m){\n    Staticdef(apop_data *, d_constr, apop_data_falloc((2,2,3),\n                             0, 0, 1, 0,  //0 < sigma\n                            .9, 0, 0, 1)); //.9 < df\n    double out= apop_linear_constraint(m->parameters->vector, d_constr);\n    return out;\n}\n\n\n/*\\amodel apop_t_distribution The t distribution, primarily for descriptive purposes.\n\nIf you want to test a hypothesis, you probably don't need this, and should instead\nuse \\ref apop_test.\n\nIn that world, the \\f$t\\f$ distribution is parameter free. The data are\nassumed to be normalized to be based on a mean zero, variance one process, you get\nthe degrees of freedom from the size of the data, and the distribution is thus fixed.\n\nFor modeling purposes, more could be done. For example, the t-distribution is a favorite\nproxy for Normal-like situations where there are fat tails relative to the Normal\n(i.e., high kurtosis). 
Or, you may just prefer not to take the step of normalizing\nyour data---one could easily rewrite the theorems underlying the t-distribution without\nthe normalizations.\n\nIn such a case, the researcher would not want to fix the \\f$df\\f$, because \\f$df\\f$\nindicates the fatness of the tails, which has some optimal value given the data.\nThus, there are two modes of use for these distributions:\n\n\\li Parameterized, testing style: the degrees of freedom are determined\nfrom the data, and all necessary normalizations are assumed. Thus, this code---\n\n\\code\napop_model *t_for_testing = apop_estimate(data, apop_t_distribution);\n\\endcode\n\n---will return exactly the type of \\f$t\\f$-distribution one would use for testing. \n\n\\li By removing the \\c estimate method---\n\\code\napop_model *spare_t = apop_model_copy(apop_t_distribution);\nspare_t->estimate = NULL;\napop_model *best_fitting_t = apop_estimate(your_data, spare_t);\n\\endcode\n---I will find the best \\f$df\\f$ via maximum likelihood, which may be desirable\nfor finding the best-fitting model for descriptive purposes.\n\n\\adoc    Input_format     Unordered list of scalars in the matrix and/or vector.     \n\\adoc    Parameter_format  Three scalars in the \\c vector element:<br>\n<tt>double mu=apop_data_get(estimated_model->parameters, 0)</tt><br>\n<tt>double sigma=apop_data_get(estimated_model->parameters, 1)</tt><br>\n<tt>double df=apop_data_get(estimated_model->parameters, 2)</tt>\n\\adoc    Estimate_results  I'll just count elements and set \\f$df = n-1\\f$. If you set the \\c estimate method to \\c NULL, via MLE.\n\\adoc    settings   \\ref apop_mle_settings, \\ref apop_parts_wanted_settings   \n*/\n\napop_model *apop_t_distribution  = &(apop_model){\"t distribution\", 3, .dsize=1, .estimate = apop_t_estimate, \n         .log_likelihood = apop_tdist_llike, .draw=apop_t_dist_draw, .cdf=apop_t_dist_cdf,\n         .constraint=apop_t_dist_constraint };\n"
  },
  {
    "path": "model/apop_uniform.c",
    "content": "/* apop_uniform.c \n Copyright (c) 2007, 2009 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n#include \"apop_internal.h\"\n\n/* \\amodel apop_uniform  This is the two-parameter version of the Uniform,\nexpressing a uniform distribution over [a, b].\n\nThe MLE of this distribution is simply a = min(your data); b = max(your data).\nOften useful for the RNG, such as when you have a Uniform prior model.\n\n\\adoc    Input_format     One scalar observation per row (in the \\c matrix or \\c vector).  \n\\adoc    Parameter_format  Zeroth vector element is \\f$a\\f$, the min;\n                          element one is \\f$b\\f$, the max. \n\\adoc    settings  None.    \n      */\n\nstatic void getminmax(apop_data *d, double *min, double *max){\n    Get_vmsizes(d) //msize1, vsize\n    *min = GSL_MIN(msize1 ? gsl_matrix_min(d->matrix) : GSL_POSINF,\n                    vsize ? gsl_vector_min(d->vector) : GSL_POSINF);\n    *max = GSL_MAX(msize1 ? gsl_matrix_max(d->matrix) : GSL_NEGINF,\n                    vsize ? gsl_vector_max(d->vector) : GSL_NEGINF);\n}\n\nstatic long double unif_ll(apop_data *d, apop_model *m){\n    Nullcheck_mpd(d, m, GSL_NAN);\n    Get_vmsizes(d) //tsize\n    double min, max;\n    getminmax(d, &min, &max);\n    if (min>= m->parameters->vector->data[0] && max <= m->parameters->vector->data[1])\n        return -log(m->parameters->vector->data[1] - m->parameters->vector->data[0]) * tsize;\n    return GSL_NEGINF;\n}\n\nstatic long double unif_p(apop_data *d, apop_model *m){\n    Nullcheck_mpd(d, m, GSL_NAN);\n    Get_vmsizes(d) //tsize\n    double min, max;\n    getminmax(d, &min, &max);\n    if (min>= m->parameters->vector->data[0] && max<= m->parameters->vector->data[1])\n        return pow(m->parameters->vector->data[1] - m->parameters->vector->data[0], -tsize);\n    return 0;\n}\n\n/* \\adoc estimated_info   Reports <tt>log likelihood</tt>. 
*/\nstatic void uniform_estimate(apop_data * data,  apop_model *est){\n    Nullcheck_d(data, );\n    apop_name_add(est->parameters->names, \"min\", 'r');\n    apop_name_add(est->parameters->names, \"max\", 'r');\n    getminmax(data, est->parameters->vector->data+0, est->parameters->vector->data+1);\n    apop_data_add_named_elmt(est->info, \"log likelihood\", unif_ll(data, est));\n}\n\nstatic long double unif_cdf(apop_data *d, apop_model *m){\n    Nullcheck_mpd(d, m, GSL_NAN);\n    Get_vmsizes(d) //tsize\n    double min = m->parameters->vector->data[0];\n    double max = m->parameters->vector->data[1];\n    double val = apop_data_get(d, 0, vsize ? -1: 0);\n    if (val <= min) return 0;\n    if (val >=max)  return 1;\n    return (val-min)/(max-min);\n}\n\nstatic int uniform_rng(double *out, gsl_rng *r, apop_model* eps){\n    *out =  gsl_rng_uniform(r) *(eps->parameters->vector->data[1]- eps->parameters->vector->data[0])+ eps->parameters->vector->data[0];\n    return 0;\n}\n\napop_model *apop_uniform = &(apop_model){\"Uniform distribution\", 2, 0, 0,  .dsize=1,\n    .estimate = uniform_estimate,  .p = unif_p,.log_likelihood = unif_ll,   \n    .draw = uniform_rng, .cdf = unif_cdf};\n\n/* \\amodel apop_improper_uniform The improper uniform returns \\f$P(x) = 1\\f$ for every value of x, all the\ntime (and thus, log likelihood(x)=0).  It has zero parameters.\n\n\\li See also the \\ref apop_uniform model.\n\n\\adoc    Input_format      Ignored.\n\\adoc    Parameter_format  \\c NULL \n\\adoc    estimated_parameters   \\c NULL\n\\adoc    RNG The \\c draw function makes no sense, and therefore sets the value in <tt>*out</tt> to \\c NAN, returns 1, and prints a warning if <tt>apop_opts.verbose >=1</tt>.\n\\adoc    CDF Half of the distribution is less than every given point, so the CDF always\n             returns 0.5. 
One could perhaps make an argument that this should really be\n             infinity, but a half is more in the spirit of the distribution's\n             use to represent a lack of information. \n\\adoc    settings None. \n          */\n\nstatic void improper_uniform_estimate(apop_data * data,  apop_model *m){ }\n\nstatic long double improper_unif_ll(apop_data *d, apop_model *m){ return 0; }\nstatic long double improper_unif_cdf(apop_data *d, apop_model *m){ return 0.5; }\nstatic long double improper_unif_p (apop_data *d, apop_model *m){ return 1; }\n\nstatic int improper_uniform_rng(double *out, gsl_rng *r, apop_model* eps){\n    Apop_stopif(1, *out=GSL_NAN; return 1, 1, \"It doesn't make sense to make random draws from an improper Uniform.\");\n}\n\napop_model *apop_improper_uniform = &(apop_model){\"Improper uniform distribution\", 2, 0, 0,  .dsize=1,\n    .estimate = improper_uniform_estimate,  .p = improper_unif_p,\n    .log_likelihood = improper_unif_ll,  .draw = improper_uniform_rng,\n    .cdf = improper_unif_cdf};\n"
  },
  {
    "path": "model/apop_wishart.c",
    "content": "/* apop_wishart.c: the Wishart distribution, for modeling purposes.\nCopyright (c) 2009 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n#include \"apop_internal.h\"\n\n#if 0\n\nstatic long double pos_def(apop_data *data, apop_model *candidate){\n    return apop_matrix_to_positive_semidefinite(candidate->parameters->matrix);\n}\n\ntypedef struct{\n    double df;\n    gsl_matrix *paraminv;\n    int len;\n} wishartstruct_t;\n\nstatic double one_wishart_row(gsl_vector *in, void *ws_in){\n    wishartstruct_t *ws = ws_in;\n    gsl_matrix *invparams_dot_data = gsl_matrix_alloc(ws->len, ws->len);\n    apop_data *square= apop_data_alloc(ws->len, ws->len);\n    apop_data_unpack(in, square);\n    double datadet = apop_matrix_determinant(square->matrix);\n    assert(datadet);\n\n    gsl_blas_dgemm(CblasNoTrans, CblasNoTrans, 1, ws->paraminv, square->matrix, 0, invparams_dot_data);   \n    gsl_vector_view diag = gsl_matrix_diagonal(invparams_dot_data);\n    double trace = apop_sum(&diag.vector);\n    gsl_matrix_free(invparams_dot_data);\n    apop_data_free(square);\n    double out= log(datadet) * (ws->df - ws->len -1.)/2. 
- trace*ws->df/2.;\n    assert(isfinite(out));\n    return out;\n}\n\nstatic long double wishart_ll(apop_data *in, apop_model *m){\n    Nullcheck_mpd(in, m, GSL_NAN);\n    wishartstruct_t ws = {\n            .paraminv = apop_matrix_inverse(m->parameters->matrix),\n            .len = sqrt(in->matrix->size2),\n            .df = m->parameters->vector->data[0]\n        };\n    double paramdet = apop_matrix_determinant(m->parameters->matrix);\n    if (paramdet < 1e-3) return GSL_NEGINF;\n    double ll =  apop_map_sum(in, .fn_vp = one_wishart_row, .param=&ws, .part='r');\n    double k = log(ws.df)*ws.df/2.;\n    k -= M_LN2 * ws.len* ws.df/2.;\n    k -= log(paramdet) * ws.df/2.;\n    k -= apop_multivariate_lngamma(ws.df/2., ws.len);\n    return ll + k*in->matrix->size1;\n}\n\nstatic int apop_wishart_draw(double *out, gsl_rng *r, apop_model *m){\n    /*\nTranslated from the Fortran by BK. Fortran comments:\n\nC          SUBROUTINE DWSHRT(D, N, NP, NNP, SB, SA)\nC\nC       ALGORITHM AS 53  APPL. STATIST. (1972) VOL.21, NO.3\nC\nC     Wishart variate generator.  
On output, SA is an upper-triangular\nC     matrix of size NP * NP [...]\nC     whose elements have a Wishart(N, SIGMA) distribution.\n*/\n    Nullcheck_mp(m, );\n    int np = m->parameters->matrix->size1;\n    int n = m->parameters->vector->data[0];\n    if (!m->more) { \n        gsl_matrix *ccc = apop_matrix_copy(m->parameters->matrix);\n        gsl_linalg_cholesky_decomp(ccc);\n        for (int i=0; i < ccc->size1; i++) //zero out the upper diagonal\n            for (int j=i+1; j < ccc->size2; j++)\n                gsl_matrix_set(ccc, i, j, 0);\n        m->more = apop_matrix_to_data(ccc);\n        m->more_size = sizeof(apop_data);\n    }\n    apop_data *Chol = m->more;\n    apop_data *rmatrix = apop_data_calloc(np, np);\n    Staticdef(apop_model *, std_normal, apop_model_set_parameters(apop_normal, 0, 1));\n\n//C     Load diagonal elements with square root of chi-square variates\n    for(int i = 0; i< np; i++){\n        int DF = n - i;\n        apop_data_set(rmatrix, i, i, sqrt(gsl_ran_chisq(r, DF)));\n    }\n    \n    for(int i = 1; i< np; i++) //off-diagonal triangles: Normals.\n          for(int j = 0; j< i; j++){\n            double ndraw;\n            apop_draw(&ndraw, r, std_normal);\n            assert (!gsl_isnan(ndraw));\n            apop_data_set(rmatrix, i, j, ndraw);\n          }\n    //Now find C * rand * rand' * C'\n    apop_data *cr = apop_dot(Chol, rmatrix);\n    apop_data *crr = apop_dot(cr, rmatrix, .form2='t');\n    apop_data *crrc = apop_dot(crr, Chol, .form2='t');\n    memmove(out, crrc->matrix->data, sizeof(double)*np*np);\n    apop_data_free(rmatrix); apop_data_free(cr);\n    apop_data_free(crrc);    apop_data_free(crr);\n    return 0;\n}\n\nstatic long double wishart_constraint(apop_data *d, apop_model *m){\n    double out= apop_matrix_to_positive_semidefinite(m->parameters->matrix);\n    double df_minus_dim = m->parameters->vector->data[0] - (m->parameters->matrix->size1-2)-1e-4;\n    if (df_minus_dim <= 0){\n        out += 
df_minus_dim;\n        m->parameters->vector->data[0] = m->parameters->matrix->size1+2+1e-4;\n    }\n    return out;\n}\n\nstatic void wishart_prep(apop_data *d, apop_model *m){\n    if (m->parameters) return;//already prepped\n     m->parameters = apop_data_alloc(1,sqrt(d->matrix->size2),sqrt(d->matrix->size2));\n}\n\nstatic long double fixed_wishart_ll(apop_data *in, apop_model *m){\n    //Let the mean of the input covariances be CM.\n    //We need to estimate the df via MLE.\n    //However, the right value of the wishart covariance grid is CM/df.\n    //So, for a value of df that we're trying, scale CM appropriately.\n\n    gsl_matrix_scale(m->parameters->matrix, 1./m->parameters->vector->data[0]);\n    double out = wishart_ll(in, m);\n    gsl_matrix_scale(m->parameters->matrix, m->parameters->vector->data[0]);\n    return out;\n}\n\nstatic void wishart_estimate(apop_data *d, apop_model *m){\n    Nullcheck_m(m, );\n    //apop_data_set(m->parameters, 0, -1, d->matrix->size1);\n    //Start with cov matrix via mean of inputs; df=NaN\n    apop_data_set(m->parameters, 0, -1, GSL_NAN);\n    apop_data *summ=apop_data_summarize(d);\n    Apop_col_t(summ, \"mean\", means);\n    gsl_vector *t = m->parameters->vector; //mask this while unpacking\n    m->parameters->vector=NULL;\n    apop_data_unpack(means, m->parameters);\n    m->parameters->vector=t;\n\n    //Estimate a model with fixed cov matrix and blank (NaN) df.\n    apop_model *modified_wish = apop_model_copy(m);\n    modified_wish->log_likelihood = fixed_wishart_ll;\n    apop_model *fixed_wish = apop_model_fix_params(modified_wish);\n    apop_model *est_via_fix = apop_estimate(d, fixed_wish);\n\n    //copy df from fixed version to the real thing; clean up.\n    t->data[0] = apop_data_get(est_via_fix->parameters, 0, -1);\n    gsl_matrix_scale(m->parameters->matrix, 1./t->data[0]);\n    apop_data_free(summ);\n    apop_model_free(modified_wish);\n    apop_model_free(fixed_wish);\n}\n\n/* amodel apop_wishart The Wishart 
distribution, which is currently somewhat untested. \n\nHere's the likelihood function. \\f$p\\f$ is the dimension of the data and covariance\nmatrix, \\f$n\\f$ is the degrees of freedom, \\f$\\mathbf{V}\\f$ is the \\f$p\\times p\\f$\nmatrix of Wishart parameters, and \\f${\\mathbf{W}}\\f$ is the \\f$p\\times p\\f$ matrix whose\nlikelihood is being evaluated.  \\f$\\Gamma_p(\\cdot)\\f$ is the \\ref apop_multivariate_gamma\n\"multivariate gamma function\".\n\n\\f[\nP(\\mathbf{W}, \\mathbf{V}) = \\frac{\\left|\\mathbf{W}\\right|^\\frac{n-p-1}{2}}\n                         {2^\\frac{np}{2}\\left|{\\mathbf V}\\right|^\\frac{n}{2}\\Gamma_p(\\frac{n}{2})} \\exp\\left(-\\frac{1}{2}{\\rm Tr}({\\mathbf V}^{-1}\\mathbf{W})\\right)\\f]\n\nSee also notes in \\ref tfchi.\n\nadoc    Input_format     Each row of the input matrix is a single square matrix,\n                      flattened; use \\ref apop_data_pack to convert your\n                      sequence of matrices into rows.     \nadoc    Parameter_format  \\f$N\\f$ (the degrees of freedom) is the zeroth element of the vector. The matrix holds the matrix of parameters.\nadoc    Estimate_results  Via MLE.    \nadoc    Prep_routine   Allocates the parameters based on the size of the input data.       \nadoc    RNG  You can use this to generate random covariance matrices, should you need them. See example below. 
\nadoc    settings   \\ref apop_mle_settings, \\ref apop_parts_wanted_settings    \nadoc    Examples Making some random draws:\n\n\\code\napop_model *m = apop_estimate(yr_data, apop_wishart);\ngsl_matrix *rmatrix = gsl_matrix_alloc(10, 10);\ngsl_rng *r = apop_rng_alloc(8765);\nfor (int i=0; i< 1e8; i++){\n    apop_draw(rmatrix->data, r, m);\n    do_math_with_matrix(rmatrix);\n}\n\\endcode */\napop_model *apop_wishart  =\n    &(apop_model){\"Wishart distribution\", 1, -1, -1, .dsize=-1, .estimate=wishart_estimate, .draw = apop_wishart_draw,\n         .log_likelihood = wishart_ll, .constraint = pos_def, .prep=wishart_prep, .constraint=wishart_constraint};\n#endif\n"
  },
  {
    "path": "model/apop_yule.c",
    "content": "/* The Yule distribution. A special case of the Waring.\n\nCopyright (c) 2005--2007, 2009, 2011 by Ben Klemens.  Licensed under the GPLv2; see COPYING. \n\n\\amodel apop_yule\nThe special case of the \\ref apop_waring \"Waring\" where \\f$ \\alpha = 0.\t\\f$<br>\n\n\\f$ Y(x, b) \t= (b-1) \\gamma(b) \\gamma(k) / \\gamma(k+b)\t\t\t\\f$\n\n\\f$ \\ln Y(x, b)\t= \\ln(b-1) + \\ln\\gamma(b) + \\ln\\gamma(k) - \\ln\\gamma(k+b)\t\\f$\n\n\\f$ d\\ln Y/db\t= 1/(b-1)  + \\psi(b) - \\psi(k+b)\t\t\t\t\\f$\n\n\\adoc    Input_format     \nOne scalar observation per row (in the \\c matrix or \\c vector).  \nSee also \\ref apop_data_rank_compress for means of dealing with one more input data format.\n\n\\adoc    Parameter_format  One element in the parameter set's vector.\n\\adoc    Settings   MLE-type: \\ref apop_mle_settings, \\ref apop_parts_wanted_settings    */\n\n#include \"apop_internal.h\"\n\nstatic long double yule_constraint(apop_data *returned_beta, apop_model *m){\n  Nullcheck_mp(m, GSL_NAN);\n    //constraint is 1 < beta_1\n  Staticdef(apop_data *, constraint, apop_data_falloc((1,1,1), 1, 1));\n    return apop_linear_constraint(m->parameters->vector, constraint, 1e-4);\n}\n\nstatic double apply_me(double pt, void *bb){\n    double ln_k = (pt>=1) \n                   ? 
gsl_sf_lngamma(pt)\n                   : 0;\n    double ln_bb_k = gsl_sf_lngamma(pt+*(double*)bb);\n    return ln_k - ln_bb_k;\n}\n\nstatic double dapply_me(double pt, void *bb){ return -gsl_sf_psi(pt+*(double*)bb); }\n\nstatic long double yule_log_likelihood(apop_data *d, apop_model *m){\n  Nullcheck_mpd(d, m, GSL_NAN);\n  Get_vmsizes(d) //tsize\n    double bb = gsl_vector_get(m->parameters->vector, 0);\n    long double ln_bb        = gsl_sf_lngamma(bb),\n                ln_bb_less_1 = log(bb-1);\n    double      likelihood   = apop_map_sum(d, .fn_dp = apply_me,.param= &bb);\n\treturn likelihood + (ln_bb_less_1 + ln_bb) * tsize;\n}\n\nstatic void yule_dlog_likelihood(apop_data *d, gsl_vector *gradient, apop_model *m){\n  Nullcheck_mpd(d, m, );\n  Get_vmsizes(d) //tsize\n\t//Psi is the derivative of the log gamma function.\n    double bb  = gsl_vector_get(m->parameters->vector, 0);\n    long double bb_minus_one_inv= 1/(bb-1),\n\t\t        psi_bb\t        = gsl_sf_psi(bb);\n    double d_bb  = apop_map_sum(d, .fn_dp=dapply_me, .param=&bb);\n    d_bb += (bb_minus_one_inv + psi_bb) * tsize;\n\tgsl_vector_set(gradient, 0, d_bb);\n}\n\n/* \\adoc RNG From <a href=\"http://cgm.cs.mcgill.ca/~luc/mbookindex.html\">Devroye (1986)</a>, p 553.  */\nstatic int yule_rng( double *out, gsl_rng * r, apop_model *a){\n\tdouble e1 = gsl_ran_exponential(r, 1);\n\tdouble e2 = gsl_ran_exponential(r, 1);\n\tint x = GSL_MAX((int) (- e1  / log(1 - exp(-e2 / (*a->parameters->vector->data -1)))), 0);\n\t*out =  x + 1;\t//we rounded down to floor, but want ceil.\n    return 0;\n}\n\nstatic void yule_prep(apop_data *data, apop_model *params){\n    apop_score_vtable_add(yule_dlog_likelihood, apop_yule);\n    apop_model_clear(data, params);\n}\n\napop_model *apop_yule = &(apop_model){\"Yule distribution\", 1,0,0, .dsize=1, .log_likelihood = yule_log_likelihood, \n    .prep = yule_prep, .constraint = yule_constraint, .draw = yule_rng};\n"
  },
  {
    "path": "model/apop_zipf.c",
    "content": "/* The Zipf distribution.\n\nCopyright (c) 2005--2009, 2011 by Ben Klemens.  Licensed under the GPLv2; see COPYING.\n\n\\amodel apop_zipf\nWikipedia has notes on the <a href=\"http://en.wikipedia.org/wiki/Zipf_distribution\">Zipf distribution</a>. \n\n\\f$Z(a)   = {1\\over \\zeta(a) * i^a}        \\f$\n\n\\f$\\ln Z(a) = -(\\log(\\zeta(a)) + a \\log(i))    \\f$\n\n\\adoc    Input_format     One scalar observation per row (in the \\c matrix or \\c vector).  \nSee also \\ref apop_data_rank_compress for means of dealing with one more input data format.\n\n\\adoc    Parameter_format One item in the parameter set's vector.    \n\\adoc    Settings  \\ref apop_mle_settings\n*/\n\n#include \"apop_internal.h\"\n#include <gsl/gsl_sf_zeta.h>\n\nstatic long double zipf_constraint(apop_data *returned_beta, apop_model *m){\n    //constraint is 1 < beta_1\n    Nullcheck_mp(m, GSL_NAN);\n    Staticdef(apop_data *, constraint, apop_data_falloc((1,1,1), 1, 1));\n    return apop_linear_constraint(m->parameters->vector, constraint, 1e-4);\n}\n\nstatic long double zipf_log_likelihood(apop_data *d, apop_model *m){\n    Nullcheck_mpd(d, m, GSL_NAN);\n    Get_vmsizes(d) //tsize\n    long double bb = apop_data_get(m->parameters, 0, -1);\n    Apop_stopif(isnan(bb) || bb < 1, return GSL_NAN, 0, \"Zipf needs a parameter >=1; \"\n                                              \"got %Lg. Returning NaN.\", bb); \n    double like = -apop_map_sum(d, log) * bb;\n    like -= log(gsl_sf_zeta(bb)) * tsize;\n    return like;\n}    \n\n/*  \\adoc RNG Returns an ordinal ranking, starting from 1.\n\nFrom <a href=\"http://cgm.cs.mcgill.ca/~luc/mbookindex.html\">Devroye (1986)</a>, Chapter 10, p 551.  
*/\nstatic int zipf_rng(double *out, gsl_rng* r, apop_model *param){\n    Nullcheck_mp(param, 1);\n    double a = apop_data_get(param->parameters, 0, -1);\n    Apop_stopif(isnan(a) || a < 1, *out=GSL_NAN; return 1, \n            0, \"Zipf needs a parameter >=1; got %g. Setting *out to NAN.\", a); \n    int x;\n    long double u, v, t, \n            b    = powl(2, a-1), \n            ainv = -(1.0/(a-1));\n    do {\n        u = gsl_rng_uniform(r);\n        v = gsl_rng_uniform(r);\n        x = powl(u, ainv);\n        t = powl((1.0 + 1.0/x), (a-1));\n    } while (v * x * (t-1.0)/(b-1) > t/b);\n    *out = x;\n    return 0;\n}\n\napop_model *apop_zipf = &(apop_model){\"Zipf distribution\", 1,0,0, .dsize=1,\n     .log_likelihood = zipf_log_likelihood, .constraint = zipf_constraint, .draw = zipf_rng};\n"
  },
  {
    "path": "tests/Makefile.am",
    "content": "\nif EXTENDED_TESTS\nEXTRA_TESTS = distribution_tests \\\n\tlognormal_test \\\n\trake_test \\\n\ttest_kernel_ll \\\n\tupdate_via_rng \\\n\t$(top_builddir)/eg/cross_models \\\n\t$(top_builddir)/eg/dconstrain \\\n\t$(top_builddir)/eg/entropy_model \\\n\t$(top_builddir)/eg/faithful \\\n\t$(top_builddir)/eg/f_test \\\n\t$(top_builddir)/eg/fix_params \\\n\t$(top_builddir)/eg/hills2 \\\n\t$(top_builddir)/eg/jack \\\n\t$(top_builddir)/eg/jacobian \\\n\t$(top_builddir)/eg/ml_imputation \\\n\t$(top_builddir)/eg/pmf_test \\\n\t$(top_builddir)/eg/some_cdfs \\\n\t$(top_builddir)/eg/test_kl_divergence \\\n\t$(top_builddir)/eg/test_ranks \\\n\t$(top_builddir)/eg/test_updating \\\n\t$(top_builddir)/eg/transform\nelse\n\tEXTRA_TESTS =\nendif\n\ncheck_PROGRAMS= \\\n\tdb_tests \\\n\terror_test \\\n\tfactors \\\n\tnist_tests \\\n\tsort_example \\\n\ttest_apop \\\n\t$(top_builddir)/eg/apop_map_row \\\n\t$(top_builddir)/eg/binning \\\n\t$(top_builddir)/eg/boot_clt \\\n\t$(top_builddir)/eg/data_fill \\\n\t$(top_builddir)/eg/draw_to_db \\\n\t$(top_builddir)/eg/db_fns \\\n\t$(top_builddir)/eg/dot_products \\\n\t$(top_builddir)/eg/entropy_vector \\\n\t$(top_builddir)/eg/iv \\\n\t$(top_builddir)/eg/ks_tests \\\n\t$(top_builddir)/eg/logit \\\n\t$(top_builddir)/eg/fake_logit \\\n\t$(top_builddir)/eg/normalization_demo \\\n\t$(top_builddir)/eg/ols \\\n\t$(top_builddir)/eg/ols_oneliner \\\n\t$(top_builddir)/eg/parameterization \\\n\t$(top_builddir)/eg/simple_subsets \\\n\t$(top_builddir)/eg/t_test_by_rows \\\n\t$(top_builddir)/eg/test_distances \\\n\t$(top_builddir)/eg/test_fisher \\\n\t$(top_builddir)/eg/test_harmonic \\\n\t$(top_builddir)/eg/test_pruning \\\n\t$(top_builddir)/eg/test_regex $(EXTRA_TESTS)\n\nTESTS = \\\n\tutilities_test \\\n\t$(check_PROGRAMS)\n\nAM_CFLAGS = \\\n\t-DTesting \\\n\t-DDatadir=\\\"$(top_srcdir)/tests/\\\" \\\n\t-I$(top_srcdir)/tests \\\n\t-I$(top_srcdir) \\\n\t$(GSL_CFLAGS)\n\nAM_LDFLAGS = \\\n\t$(top_builddir)/libapophenia.la 
\\\n\t$(GSL_LIBS)\n\nDATA_DIST = \\\n\tdata \\\n\tdata-mixed \\\n\tprinting_sample \\\n\ttest_data \\\n\ttest_data2 \\\n\ttest_data_nans\\\n\ttest_data_fixed_width \\\n\tamash_vote_analysis.csv \\\n\tnumacc4.dat \\\n\tpontius.dat \\\n\twampler1.dat \\\n\tfaith.data \\\n\tsort_tests.c\n\nEXTRA_DIST = \\\n\tReadme \\\n\t$(DATA_DIST)\n\nCLEANFILES = \\\n\tff.db \\\n\truns.db \\\n\ttd.db \\\n\tdraws-k \\\n\tdraws-k2 \\\n\tdraws-mvN \\\n\tdraws-N \\\n\tdraws-std_multinormal \\\n\tdraws-std_normal \\\n\tthe_data.txt \\\n\tprint_test.out \\\n\txxx\n"
  },
  {
    "path": "tests/Readme",
    "content": "Much of this directory runs tests from NIST. You can look at\nnist_tests.c to see the level of precision at which various operations work.\nhttp://www.itl.nist.gov/div898/strd/\n"
  },
  {
    "path": "tests/amash_vote_analysis.csv",
    "content": "# Votes for the Amash amendment. Complied by Josh Tauberer of govtrack.us\r\n# From http://razor.occams.info/pubdocs/amash_vote_analysis.csv\r\n# For the story, see http://razor.occams.info/blog/2013/07/27/defense-dollars-arent-a-better-predictor-of-the-amash-vote/ \r\n\r\nid,party,ideology,vote,contribs\r\n412453,Republican,0.633449015632,Aye,74998.0\r\n409888,Republican,0.737561258181,No,65000.0\r\n412385,Democrat,0.386997814924,Aye,2000.0\r\n412388,Republican,0.779774806764,Aye,18000.0\r\n412396,Democrat,0.442960658479,No,36500.0\r\n400111,Democrat,0.313157072686,Aye,0\r\n412402,Republican,0.767316041639,No,32500.0\r\n412404,Democrat,0.327316657141,Aye,14500.0\r\n412410,Republican,0.642958482579,No,17500.0\r\n412463,Republican,0.775210676898,No,1000.0\r\n412412,Democrat,0.315818586434,No,4000.0\r\n412417,Republican,0.78421175545,No,58648.0\r\n400130,Democrat,0.258395918926,Aye,40000.0\r\n412186,Democrat,0.262751220382,No,0\r\n412189,Democrat,0.330712461821,No,13750.0\r\n412190,Republican,0.633628825101,No,71500.0\r\n412191,Republican,0.971967863122,Aye,85000.0\r\n412192,Democrat,0.444992905342,Aye,46831.0\r\n412193,Democrat,0.318419824791,Aye,130500.0\r\n412195,Democrat,0.317778763044,No,17000.0\r\n412196,Republican,0.741698931437,Aye,13750.0\r\n412199,Democrat,0.2318590508,No,31500.0\r\n412202,Republican,0.716232616055,No,31950.0\r\n412208,Democrat,0.350111226694,Aye,17600.0\r\n412209,Democrat,0.392394106402,Aye,53000.0\r\n412211,Democrat,0.316038883693,Aye,1250.0\r\n412212,Democrat,0.313620000108,Aye,5000.0\r\n412213,Republican,0.835546154968,No,11250.0\r\n412214,Democrat,0.420491623161,Aye,0\r\n412215,Democrat,0.101675714552,Aye,2750.0\r\n412216,Republican,0.888233857998,No,25135.0\r\n412217,Republican,0.737158743863,No,20000.0\r\n412219,Democrat,0.301092702818,Aye,0\r\n412221,Democrat,0.149509939453,Aye,30000.0\r\n412226,Republican,0.839413863187,Aye,15000.0\r\n412236,Democrat,0.179084410388,Aye,0\r\n412239,Democrat,0.339249391982,Aye,11000.0\
r\n400142,Republican,0.59452380674,No,125275.0\r\n412250,Republican,0.745439589338,No,18250.0\r\n412252,Republican,0.950378655835,Aye,9706.0\r\n412254,Democrat,0.339863393245,Aye,51950.0\r\n412255,Republican,0.785856468472,No,153950.0\r\n412256,Republican,0.957681795702,No,5700.0\r\n412257,Democrat,0.446999723386,No,9900.0\r\n412258,Democrat,0.246815371523,Aye,15500.0\r\n412259,Democrat,0.292152702704,Aye,2000.0\r\n412261,Republican,0.838261768588,Aye,3000.0\r\n412263,Democrat,0.194343949395,Aye,12750.0\r\n412269,Republican,0.770114479696,Aye,4500.0\r\n412270,Republican,0.87865578318,Aye,19500.0\r\n412271,Republican,0.863935315349,Aye,96400.0\r\n412272,Democrat,0.313647617467,Aye,122000.0\r\n412275,Republican,0.907414444524,Aye,51000.0\r\n412276,Democrat,0.31858316015,Aye,975.0\r\n412278,Republican,0.753575385035,No,24500.0\r\n412280,Republican,0.796002864479,No,35500.0\r\n412282,Democrat,0.363092026526,No,24500.0\r\n412283,Republican,0.788273256836,No,136350.0\r\n412284,Republican,0.840618200537,Aye,20500.0\r\n412286,Democrat,0.466205891968,No,0\r\n412290,Republican,0.675744204617,No,3500.0\r\n400004,Republican,0.722570709357,No,149500.0\r\n412293,Democrat,0.371875545552,Aye,17000.0\r\n412294,Republican,0.87195436028,Aye,4000.0\r\n412295,Republican,0.8729233955,Aye,4650.0\r\n400008,Democrat,0.377181360419,No,113200.0\r\n412297,Democrat,0.37312695611,Aye,1250.0\r\n400010,Republican,0.802225500917,Aye,18500.0\r\n412302,Republican,0.912445417024,No,35800.0\r\n412303,Republican,0.705250844305,No,10500.0\r\n412305,Democrat,0.404966447131,No,10800.0\r\n400018,Republican,0.784850033758,Aye,15500.0\r\n408211,Democrat,0.463132076194,Aye,0\r\n412308,Democrat,0.259544069912,Aye,2000.0\r\n400021,Democrat,0.376660088147,Aye,28000.0\r\n412310,Republican,0.940905821166,Aye,0\r\n412311,Republican,0.80712090203,No,68900.0\r\n412315,Democrat,0.507478109592,Aye,13550.0\r\n400029,Republican,0.899317636613,Aye,25000.0\r\n412318,Democrat,0.358530167132,No,3000.0\r\n400031,Democrat,0.320
638371706,No,10000.0\r\n400032,Republican,0.941508911211,Aye,28500.0\r\n400033,Democrat,0.177182398765,Aye,7000.0\r\n400036,Republican,0.612856209592,No,131100.0\r\n400038,Republican,0.779315220737,No,85850.0\r\n412327,Democrat,0.243478127512,Aye,11500.0\r\n410396,Republican,0.656991799857,Aye,7500.0\r\n412331,Democrat,0.342769313515,No,1000.0\r\n400046,Republican,0.822682394783,No,24000.0\r\n400047,Democrat,0.294414173146,Aye,16000.0\r\n400048,Democrat,0.268361881558,No,0\r\n400052,Republican,0.780638305727,Aye,16000.0\r\n412446,Republican,0.696355621229,No,79199.0\r\n400057,Republican,0.811240990331,No,99000.0\r\n400058,Republican,0.640769966415,No,47500.0\r\n400060,Republican,0.637541658631,No,102900.0\r\n400061,Republican,0.728577812671,No,2000.0\r\n400062,Democrat,0.205430228646,Aye,17400.0\r\n400063,Democrat,0.280886859632,Aye,11500.0\r\n400068,Republican,0.865426792786,No,45500.0\r\n400071,Republican,0.784332043502,Aye,20000.0\r\n400074,Democrat,0.236853985849,Aye,17000.0\r\n400075,Democrat,0.366655688557,Aye,77000.0\r\n400077,Republican,0.824379583474,No,68750.0\r\n400080,Democrat,0.125297407949,Aye,3000.0\r\n400081,Democrat,0.481156915544,No,23000.0\r\n412451,Republican,0.630887196029,No,9750.0\r\n400086,Republican,0.686649941605,No,83500.0\r\n400087,Democrat,0.328404367322,Aye,37750.0\r\n400089,Republican,0.842330334403,No,39500.0\r\n400090,Democrat,0.223235841841,Aye,13500.0\r\n412379,Democrat,0.225241217304,Aye,5000.0\r\n400093,Democrat,0.20932221331,Aye,3000.0\r\n412382,Democrat,0.298422335826,Aye,41750.0\r\n412383,Democrat,0.537548508056,Aye,71624.0\r\n400097,Democrat,0.282208667287,No,57750.0\r\n400100,Democrat,0.353900406958,Aye,6000.0\r\n400101,Democrat,0.26337606557,Aye,6500.0\r\n400103,Democrat,0.21832136586,Aye,35500.0\r\n412392,Republican,0.762477323848,No,21000.0\r\n412393,Republican,0.668639004817,No,19000.0\r\n412394,Republican,0.691130489967,No,54000.0\r\n412395,Republican,0.777100178691,No,195020.0\r\n400108,Republican,0.672686607463,No,215
00.0\r\n412397,Republican,0.782511902597,Aye,10750.0\r\n400110,Democrat,0.36622359035,Aye,17500.0\r\n412399,Republican,0.766160625854,Aye,28250.0\r\n412400,Republican,0.784416643206,No,1000.0\r\n412401,Republican,0.863464505847,Aye,47134.0\r\n400114,Democrat,0.328119770674,Aye,11800.0\r\n412403,Republican,0.671613219719,No,4250.0\r\n400116,Republican,0.820006509482,Aye,10000.0\r\n412405,Republican,0.708668221131,Aye,9000.0\r\n412406,Republican,0.736611707351,Aye,17000.0\r\n412407,Democrat,0.491034747076,No,9000.0\r\n412408,Republican,0.763939771126,Aye,13786.0\r\n412409,Republican,0.777051435944,Aye,4000.0\r\n400122,Democrat,0.320333779259,No,9250.0\r\n412411,Republican,0.861345969369,Aye,10500.0\r\n400124,Democrat,0.316128913741,Aye,20250.0\r\n412416,Republican,0.667703270678,No,3000.0\r\n400129,Democrat,0.182218402692,Aye,6000.0\r\n412418,Democrat,0.450107051291,No,88300.0\r\n412419,Republican,0.708246117324,Aye,5000.0\r\n412421,Republican,0.76124935019,No,30250.0\r\n412422,Republican,0.757636007975,Aye,6000.0\r\n400137,Republican,0.851223396228,No,122800.0\r\n412427,Republican,0.754853599197,No,7250.0\r\n412428,Republican,0.699844500787,No,54250.0\r\n400141,Republican,0.972388148495,No,61500.0\r\n412430,Republican,0.763675253418,Aye,16500.0\r\n412431,Republican,0.776980311094,No,34500.0\r\n412432,Democrat,0.454503663914,Aye,23500.0\r\n400145,Republican,0.803868643819,Aye,17500.0\r\n412434,Republican,0.828808374328,Aye,1500.0\r\n412435,Democrat,0.41922554433,Aye,27500.0\r\n412436,Republican,0.807108781924,No,11500.0\r\n412437,Republican,0.842878686518,Aye,26500.0\r\n412438,Republican,0.607954848243,Aye,1400.0\r\n400151,Republican,0.905933347751,No,28500.0\r\n412442,Republican,0.852407021906,No,48000.0\r\n412443,Republican,0.771525744936,No,93500.0\r\n412444,Republican,0.838359873989,No,63134.0\r\n400157,Republican,0.722594102009,No,172950.0\r\n400158,Republican,0.778727995793,No,11000.0\r\n400160,Democrat,0.409116938928,Aye,15500.0\r\n412449,Republican,0.628231533
454,No,100348.0\r\n400162,Democrat,0.0,Aye,6500.0\r\n400163,Democrat,0.224180925401,No,3500.0\r\n400165,Republican,0.81310059491,Aye,44500.0\r\n412454,Republican,0.646213547286,No,7500.0\r\n412457,Republican,0.789292489079,No,11500.0\r\n400170,Democrat,0.207849650852,Aye,2000.0\r\n400171,Republican,0.666110968727,No,23800.0\r\n412460,Republican,0.869346951119,Aye,21000.0\r\n412461,Republican,0.807816716105,No,24300.0\r\n412462,Republican,0.666028722742,No,19750.0\r\n400175,Republican,0.846287358236,No,23000.0\r\n412464,Republican,0.788605788227,No,8500.0\r\n412465,Republican,0.741078115427,No,12000.0\r\n412466,Republican,0.659562119923,No,32200.0\r\n400179,Democrat,0.344907784345,No,4500.0\r\n412468,Republican,0.696227768179,No,25000.0\r\n412470,Democrat,0.300617780554,Aye,34999.0\r\n412472,Republican,0.895682045542,Aye,22000.0\r\n412473,Republican,0.743679622471,Aye,14000.0\r\n412474,Republican,0.801460256717,Aye,6500.0\r\n412475,Republican,0.72172336358,No,21000.0\r\n412476,Republican,0.720258398797,Aye,15000.0\r\n400189,Democrat,0.423394654282,No,142700.0\r\n412478,Republican,0.80996470523,Aye,16500.0\r\n412479,Republican,0.749495212837,Aye,0\r\n412480,Republican,0.856770850973,No,13000.0\r\n412482,Republican,0.785597947899,Aye,4250.0\r\n400195,Democrat,0.264303842266,No,93500.0\r\n412484,Republican,0.691510586563,No,42400.0\r\n412485,Republican,0.708054883653,Aye,23000.0\r\n400199,Democrat,0.19914694027,No,4000.0\r\n412488,Republican,0.675729889598,Aye,12000.0\r\n412489,Republican,0.81047586135,Aye,23000.0\r\n400204,Democrat,0.266125044748,No,15500.0\r\n400206,Republican,0.851642962918,No,39250.0\r\n400184,Democrat,0.194468515203,Aye,8250.0\r\n412498,Democrat,0.329612796991,Aye,30750.0\r\n400211,Democrat,0.314128125351,No,80500.0\r\n412500,Republican,0.72171107532,Aye,12250.0\r\n412501,Democrat,0.392419206989,Aye,15750.0\r\n412502,Democrat,0.497069850567,No,35500.0\r\n400185,Democrat,0.158720897776,Aye,23750.0\r\n412505,Democrat,0.482700956526,Aye,11500.0\r\n400
218,Democrat,0.472047201779,No,10500.0\r\n400219,Republican,0.584678418461,No,92500.0\r\n400220,Republican,0.889375962938,No,3250.0\r\n400221,Republican,0.865440604542,Aye,109500.0\r\n412510,Republican,0.65487651768,Aye,0\r\n412511,Democrat,0.393060129242,Aye,0\r\n400224,Republican,0.941271540887,No,45250.0\r\n412513,Republican,0.546641942754,No,4500.0\r\n412514,Democrat,0.446734004096,Aye,0\r\n412515,Republican,0.569941175049,No,1000.0\r\n412516,Democrat,0.463517403728,No,0\r\n412517,Democrat,0.439665305646,Aye,3000.0\r\n400230,Democrat,0.312498022403,No,119750.0\r\n412519,Democrat,0.498373875472,No,500.0\r\n400232,Democrat,0.368804758979,No,79000.0\r\n400233,Democrat,0.355945535295,Aye,73450.0\r\n400234,Republican,0.697475238017,No,83650.0\r\n412523,Democrat,0.475342570967,No,8000.0\r\n412524,Democrat,0.445503463604,No,8250.0\r\n400237,Democrat,0.037794256511,Aye,2250.0\r\n400238,Democrat,0.312217659644,No,59500.0\r\n412477,Republican,0.769868095511,Aye,4000.0\r\n400240,Democrat,0.176212439532,Aye,21000.0\r\n412529,Democrat,0.452271591079,No,2500.0\r\n412530,Democrat,0.504194131082,No,0\r\n412531,Republican,0.607612975733,No,1000.0\r\n400244,Republican,0.591679735675,No,80750.0\r\n400245,Democrat,0.25282466989,Aye,10000.0\r\n400246,Democrat,0.276521616745,No,36500.0\r\n400247,Republican,0.723874439628,No,14250.0\r\n412536,Republican,0.62771746916,Aye,2250.0\r\n400249,Democrat,0.323063207852,Aye,13000.0\r\n412538,Republican,0.582643788288,No,6500.0\r\n400251,Democrat,0.213485996529,Aye,8500.0\r\n412540,Republican,0.605528999096,No,1000.0\r\n412541,Republican,0.637730814516,No,0\r\n400255,Democrat,0.608736043888,No,26500.0\r\n412544,Democrat,0.482162723574,No,3700.0\r\n412546,Democrat,0.471634605114,Aye,0\r\n400259,Democrat,0.207355510011,Aye,6000.0\r\n412548,Republican,0.605930647872,No,30250.0\r\n412549,Republican,0.604563574824,Aye,15000.0\r\n400262,Democrat,0.188125283579,Aye,11500.0\r\n400263,Democrat,0.158140168688,Aye,16000.0\r\n412552,Republican,0.6677814440
44,Aye,1000.0\r\n412553,Republican,0.593822547736,No,2000.0\r\n400266,Democrat,0.603640889435,No,95124.0\r\n400267,Republican,0.732267004605,No,526600.0\r\n412557,Democrat,0.459118656054,No,1250.0\r\n412558,Democrat,0.473701372657,Aye,9500.0\r\n400271,Democrat,0.272492426763,No,15000.0\r\n412560,Democrat,0.443494335415,No,0\r\n400273,Republican,0.679423030659,Aye,36450.0\r\n400274,Democrat,0.382602565576,Aye,6500.0\r\n412483,Republican,0.696412189795,No,122349.0\r\n400276,Republican,0.736600949344,No,10750.0\r\n400277,Republican,0.758704612492,Aye,4000.0\r\n400278,Democrat,0.232511204146,Aye,0\r\n400279,Republican,0.880620651091,No,109250.0\r\n412568,Republican,0.637728511061,Aye,3250.0\r\n400196,Republican,0.729346173094,No,69800.0\r\n412570,Republican,0.602810920554,Aye,8000.0\r\n400283,Democrat,0.203325691459,Aye,152500.0\r\n412572,Republican,0.577525426843,Aye,15000.0\r\n400285,Republican,0.654959270917,No,14500.0\r\n412574,Republican,0.629341603642,Aye,5500.0\r\n412575,Democrat,0.453219410134,Aye,2500.0\r\n412576,Democrat,0.483016519489,No,17500.0\r\n400289,Democrat,0.189697940161,Aye,0\r\n400290,Democrat,0.215081480166,Aye,12000.0\r\n400291,Democrat,0.396550119521,Aye,53000.0\r\n412580,Democrat,0.481197511987,Aye,5000.0\r\n412581,Republican,0.629199344503,Aye,1500.0\r\n412583,Democrat,0.510911574814,No,15750.0\r\n412584,Democrat,0.480959327335,No,30450.0\r\n400297,Republican,0.688758989779,No,26500.0\r\n412487,Republican,0.760150945862,No,33750.0\r\n412595,Democrat,0.480613037565,No,0\r\n412596,Republican,0.556967071993,Aye,0\r\n400309,Democrat,0.32654204439,Aye,44950.0\r\n400310,Democrat,0.344849319312,Aye,67500.0\r\n400313,Republican,0.819086317456,Aye,22000.0\r\n400314,Democrat,0.452042918971,No,47000.0\r\n400316,Democrat,0.560759967384,No,4000.0\r\n400318,Republican,0.63555089845,Aye,16000.0\r\n400320,Republican,0.866405391646,No,6000.0\r\n400326,Democrat,0.280712192071,No,67850.0\r\n400331,Democrat,0.482466772772,Aye,12500.0\r\n400333,Democrat,0.175407723
492,Aye,2000.0\r\n400340,Republican,0.714242262109,No,146250.0\r\n400341,Republican,0.744501421474,No,136700.0\r\n400342,Republican,0.691150522828,No,109750.0\r\n400343,Republican,0.676540502443,Aye,1650.0\r\n400344,Republican,0.562589639804,No,54750.0\r\n400347,Democrat,0.236181606938,Aye,10000.0\r\n400348,Republican,0.684595086792,No,28400.0\r\n400349,Democrat,0.451041803133,No,220550.0\r\n400350,Democrat,0.245942421993,Aye,4000.0\r\n400351,Republican,0.687564347528,No,30700.0\r\n400352,Democrat,0.357075098702,No,59500.0\r\n400355,Democrat,0.262053812287,Aye,2000.0\r\n400356,Democrat,0.357752347059,Aye,95000.0\r\n400209,Republican,0.81715374444,Aye,36000.0\r\n400360,Democrat,0.0998854836947,No,4000.0\r\n400361,Democrat,0.306850623829,Aye,46000.0\r\n400363,Democrat,0.449209744767,No,10000.0\r\n400364,Democrat,0.238737548947,Aye,9750.0\r\n400365,Republican,0.742037926918,Aye,5000.0\r\n400366,Democrat,0.179143979138,Aye,4000.0\r\n400367,Republican,0.81596659786,No,27500.0\r\n400371,Democrat,0.35830674349,Aye,16000.0\r\n400373,Republican,0.740663949932,No,18500.0\r\n400376,Republican,0.733162843616,No,34300.0\r\n400378,Democrat,0.245818017009,No,49500.0\r\n400379,Democrat,0.359848224236,No,186500.0\r\n400380,Republican,0.629892467077,Aye,2000.0\r\n400381,Republican,0.773249359554,No,26750.0\r\n412503,Republican,0.574492645864,Aye,3000.0\r\n400400,Republican,0.748779631161,No,28300.0\r\n400402,Democrat,0.333591351244,Aye,79250.0\r\n400403,Democrat,0.39181409691,No,19250.0\r\n400404,Republican,0.76112072336,No,159600.0\r\n400406,Republican,0.727631100142,No,49300.0\r\n400407,Democrat,0.29755864933,Aye,14750.0\r\n400411,Republican,0.72262710147,No,127775.0\r\n412506,Democrat,0.390579463012,No,2000.0\r\n400414,Republican,0.653906207602,No,28500.0\r\n400415,Democrat,0.260211973236,No,60500.0\r\n400416,Democrat,0.319374845359,Aye,5000.0\r\n400417,Democrat,0.434913265767,No,84000.0\r\n400419,Republican,0.71231574564,No,37500.0\r\n400422,Democrat,0.25392555263,Aye,500.0\r\n40
0424,Democrat,0.324962890134,Aye,9000.0\r\n400425,Democrat,0.269154761376,Aye,12500.0\r\n400431,Republican,0.680659639453,No,23000.0\r\n400433,Republican,0.914641874914,Aye,100100.0\r\n400435,Republican,0.647184644568,No,83946.0\r\n400439,Republican,0.70206855368,No,216860.0\r\n400440,Republican,0.682598026328,Aye,6250.0\r\n400441,Republican,0.890611177712,No,12000.0\r\n412547,Republican,0.664726002344,Aye,1000.0\r\n412512,Democrat,0.467605676022,No,12700.0\r\n412520,Democrat,0.39933629378,Aye,1450.0\r\n412521,Democrat,0.411251023229,Aye,3250.0\r\n412522,Democrat,0.458474250569,No,10250.0\r\n412525,Republican,0.646894912117,Aye,1000.0\r\n412526,Republican,0.624895370933,Aye,6250.0\r\n412527,Democrat,0.492573156262,No,4200.0\r\n412528,Republican,0.626975155647,Aye,2000.0\r\n412532,Democrat,0.47884749185,Aye,12500.0\r\n412533,Democrat,0.498821333049,No,21950.0\r\n412534,Democrat,0.499930902708,No,4000.0\r\n412535,Democrat,0.491047342092,No,3000.0\r\n400606,Republican,0.636999965603,Aye,14200.0\r\n400607,Republican,0.520751796265,Aye,0\r\n412539,Republican,0.582738523883,No,1000.0\r\n400616,Democrat,0.347544345113,No,33500.0\r\n400618,Democrat,0.476293756022,No,500.0\r\n400623,Democrat,0.308581997037,No,39900.0\r\n400626,Republican,0.822370654574,Aye,11000.0\r\n400627,Republican,1.0,No,39000.0\r\n400628,Democrat,0.558332806937,No,41750.0\r\n400630,Democrat,0.4610985905,No,11000.0\r\n412543,Democrat,0.463718341838,No,6750.0\r\n400636,Republican,0.769507879502,No,50800.0\r\n400639,Democrat,0.293536310255,Aye,12000.0\r\n400640,Republican,0.712300978323,No,17250.0\r\n400641,Democrat,0.320181960892,No,18500.0\r\n400643,Republican,0.795239166424,No,3000.0\r\n400644,Republican,0.792269903448,Aye,4000.0\r\n400646,Republican,0.652346797325,Aye,27850.0\r\n400647,Democrat,0.327925490426,No,16750.0\r\n400648,Republican,0.636195450276,No,44500.0\r\n400651,Republican,0.924549852959,Aye,2000.0\r\n400652,Republican,0.850436794593,Aye,8000.0\r\n400653,Democrat,0.260906670259,No,0.0\r\n
400654,Republican,0.802186148974,No,66000.0\r\n400655,Republican,0.900665621735,No,75000.0\r\n400656,Republican,0.947698807979,Aye,26000.0\r\n400657,Democrat,0.5166015143,No,50000.0\r\n400659,Republican,0.785993874523,Aye,26100.0\r\n400660,Republican,0.591045455683,No,30500.0\r\n400661,Democrat,0.168471768421,Aye,14000.0\r\n400663,Democrat,0.273095686625,Aye,13000.0\r\n400147,Republican,0.669706868292,No,24583.0\r\n412550,Republican,0.599943778747,No,7500.0\r\n412551,Republican,0.639638761923,No,2000.0\r\n412555,Republican,0.655106410245,Aye,6000.0\r\n412561,Democrat,0.472609303866,Aye,0\r\n412562,Democrat,0.488761890167,No,0\r\n412563,Republican,0.593160119501,No,2000.0\r\n412564,Republican,0.605793765392,No,5250.0\r\n412566,Republican,0.594613446047,No,2500.0\r\n412567,Republican,0.627406819367,Aye,5000.0\r\n412569,Republican,0.591626646106,Aye,3250.0\r\n412571,Democrat,0.402521880565,Aye,3000.0\r\n412577,Democrat,0.509654055481,No,3500.0\r\n412578,Republican,0.569227381362,Aye,24200.0\r\n412579,Democrat,0.451985447171,No,11200.0\r\n412429,Republican,0.817419406168,Aye,4000.0\r\n400154,Republican,0.801907441642,No,13000.0\r\n412585,Democrat,0.393765109674,Aye,750.0\r\n412292,Republican,0.848036909281,No,6500.0\r\n400006,Republican,0.731009480239,No,25000.0\r\n412508,Republican,0.672259202,No,23750.0\r\n412509,Democrat,0.491463018869,No,1750.0\r\n412307,Democrat,0.264211362272,Aye,31000.0\r\n412309,Republican,0.908259432527,Aye,80200.0\r\n412317,Republican,0.777387431372,Aye,5750.0\r\n400030,Democrat,0.398620810189,No,64500.0\r\n412319,Democrat,0.255376566646,Aye,12500.0\r\n412445,Republican,0.851912863512,No,5500.0\r\n"
  },
  {
    "path": "tests/data",
    "content": "Y, X_1, X_2, X_3\n2,3,4,5\n1,2,9,3\n4,7,9,0\n2,4,8,16\n1,4,2,9\n9,8,7,6\n"
  },
  {
    "path": "tests/data-mixed",
    "content": "rsid | chr  | a_allele | b_allele | aa   | ab   | bb   | treatment \nrs1933024|1|A|G|0|37|1830|cases\nrs11497407|1|A|G|0|39|1828|cases\nrs12565286|1|C|G|2|190|1674|cases\nrs\\'11804171|1|A|T|1|179|1686|cases\nrs2977656|1|C|T|1836|31|0|cases\nrs12138618|1|A|G|0|201|1660|cases\nrs3094315|1|A|G|1230|570|60|cases\nrs17160906|1|A|G|33|437|1396|cases\nrs2519016|1|C|T|0|133|1731|cases\n"
  },
  {
    "path": "tests/db_tests.c",
    "content": "/*\nThese are database-related tests. If you have mySQL/mariaDB set up correctly (see below) then you can run them both for SQLite and m... database engines.\n\nSQLite is the default for Apophenia, and this works out of the box for it.\nFor mySQL/mariaDB, you will need to either hack this to include\napop_opts.user=[your username]\napop_opts.pass=[your password]\n\nor set up a config file, like pasting this to ~/.my.cnf :\n[client]\nuser=[your name]\npass=[your pass]\n\nThen, either hack this again to include\napop_opts.db_engine='m';\n\nor at the command line:\nexport APOP_DB_ENGINE=mysql\n*/\n\n#ifdef Datadir\n#define DATADIR Datadir\n#else\n#define DATADIR \".\"\n#endif\n\n#include <apop.h>\n#include <unistd.h>\nint verbose = 1;\n\n#define Diff(L, R, eps) {double left=(L), right=(R); Apop_stopif(isnan(left-right) || fabs((left)-(right))>(eps), abort(), 0, \"%g is too different from %g (arbitrary limit=%g).\", (double)(left), (double)(right), eps);}\n\nvoid test_data_to_db() {\n  int i, j;\n    if (!apop_table_exists(\"snps\"))\n        apop_text_to_db( DATADIR \"/\" \"data-mixed\" , \"snps\");\n    apop_data *d = apop_query_to_mixed_data(\"tvttmmmt\", \"select * from snps\");\n    apop_data_print(d, \"snps2\", .output_type='d');\n    apop_data *d2 = apop_query_to_mixed_data(\"vmmmtttt\", \"select * from snps2\");\n    for (i=0; i< d2->vector->size; i++)\n        assert(d->vector->data[i] == d2->vector->data[i]);\n    for (i=0; i< d2->matrix->size1; i++)\n        for (j=0; j< d2->matrix->size2; j++)\n            assert(gsl_matrix_get(d->matrix, i, j) ==  gsl_matrix_get(d2->matrix, i, j));\n    for (i=0; i< d2->textsize[0]; i++)\n        for (j=0; j< d2->textsize[1]; j++)\n            assert(!strcmp(d->text[i][j],d2->text[i][j]));  \n    unlink(\"snps2\");\n}\n\nvoid test_uniform(apop_data *d){\n    Apop_col_tv(d, \"ab\", abcol);\n    apop_data ab_d = (apop_data){.vector=abcol};\n    apop_model *u = apop_estimate(&ab_d, apop_uniform);\n    
Diff(log(apop_p(&ab_d, u)), apop_log_likelihood(&ab_d, u), 1e-5);\n\n    apop_data_add_names(&ab_d, 'v', \"a vector\");\n    apop_data_set(&ab_d, .colname=\"a vector\", .row=0, .val=-297);\n    assert(apop_p(&ab_d, u) == 0);\n    assert(isinf(apop_log_likelihood(&ab_d, u)));\n\n    apop_model *iu = apop_estimate(&ab_d, apop_improper_uniform);\n    assert(apop_p(&ab_d, iu) == 1);\n    assert(apop_log_likelihood(&ab_d, iu)==0);\n\n    int verbosity = apop_opts.verbose;\n    apop_opts.verbose = -1;\n    double draw;\n    apop_draw(&draw, NULL, iu);\n    apop_opts.verbose = verbosity;\n    assert(isnan(draw));\n}\n\n\nvoid db_to_text(){\n    apop_db_close();\n    apop_db_open(NULL);\n    if (!apop_table_exists(\"d\")){\n        apop_data *field_params = apop_text_alloc(NULL,2,2);\n        apop_text_fill(field_params,\n                \"[ab][ab]\", \"numeric\",\n                \".*\",     apop_opts.db_engine =='s' ? \"character\": \"varchar(20)\"\n                );\n        apop_text_to_db( DATADIR \"/\" \"data-mixed\" , \"d\", 0, 1, NULL, .field_params=field_params);\n    }\n    apop_data *d = apop_query_to_mixed_data (\"tmttmmmt\", \"select * from d\");\n    int b_allele_col = apop_name_find(d->names, \"b_allele\", 't');\n    assert(!strcmp(\"T\",  d->text[3][b_allele_col]));\n    int rsid_col = apop_name_find(d->names, \"rsid\", 't');\n    assert(!strcmp(\"rs2977656\",  d->text[4][rsid_col]));\n    assert(apop_data_get(d, .row=5, .colname=\"ab\")==201);\n\n    assert(!strcmp(d->text[3][rsid_col], \"rs'11804171\"));\n\n    apop_data *dcc = apop_data_copy(d); //test apop_data_copy\n    assert(!strcmp(\"T\",  dcc->text[3][b_allele_col]));\n    assert(!strcmp(\"rs2977656\",  dcc->text[4][rsid_col]));\n    assert(apop_data_get(dcc, 5, .colname=\"ab\")==201);\n\n    apop_data *dd = apop_query_to_text (\"select * from d\");\n    b_allele_col = apop_name_find(dd->names, \"b_allele\", 't');\n    assert(!strcmp(\"T\",  dd->text[3][b_allele_col]));\n    rsid_col = 
apop_name_find(dd->names, \"rsid\", 't');\n    assert(!strcmp(\"rs2977656\",  dd->text[4][rsid_col]));\n    \n    apop_data *dc = apop_data_copy(d);\n    b_allele_col = apop_name_find(dc->names, \"b_allele\", 't');\n    assert(!strcmp(\"T\",  dc->text[3][b_allele_col]));\n    rsid_col = apop_name_find(dc->names, \"rsid\", 't');\n    assert(!strcmp(\"rs2977656\",  dc->text[4][rsid_col]));\n    assert(apop_data_get(dc, 5, .colname=\"ab\")==201);\n\n    apop_data_print(dc, \"mixedtest\", .output_type='d');\n    apop_data *de = apop_query_to_mixed_data(\"mmmmtttt\",\"select * from mixedtest\");\n    b_allele_col = apop_name_find(de->names, \"b_allele\", 't');\n    assert(!strcmp(\"T\",  de->text[3][b_allele_col]));\n    rsid_col = apop_name_find(de->names, \"rsid\", 't');\n    assert(!strcmp(\"rs2977656\",  de->text[4][rsid_col]));\n    assert(apop_data_get(de, 5, .colname=\"ab\")==201);\n    unlink(\"mixedtest\");\n\n    apop_data *as_matrix = apop_query_to_data(\"select ab from d\");\n    gsl_vector *as_vector = apop_query_to_vector(\"select ab from d\");\n    Apop_matrix_col(as_matrix->matrix, 0, mv);\n    gsl_vector_sub(as_vector, mv);\n    Diff(apop_sum(as_vector), 0, 1e-10);\n\n    test_uniform(d);\n    apop_data_free(dc); apop_data_free(dd); \n    apop_data_free(dcc); apop_data_free(d); \n}\n\nvoid test_blank_db_queries(){\n    apop_db_close();\n    apop_db_open(NULL);\n    apop_table_exists(\"t\", 'd');\n    apop_query(\"create table t (a integer, b integer, c integer)\");\n    apop_data *d = apop_query_to_data(\"select * from t\");\n    apop_data *e = apop_query_to_text(\"select * from t\");\n    gsl_vector *g = apop_query_to_vector(\"select * from t\");\n    double h = apop_query_to_float(\"select * from t\");\n    assert(d==NULL);\n    assert(e==NULL);\n    assert(g==NULL);\n    assert(gsl_isnan(h));\n}\n\nvoid test_nan_data(){\n    apop_table_exists(\"nandata\", 'd');\n    apop_table_exists(\"fw\", 'd');\n    apop_table_exists(\"fww\", 'd');\n    
apop_text_to_db( DATADIR \"/\" \"test_data_nans\" , \"nandata\");\n    apop_opts.db_name_column = \"head\";\n    apop_opts.nan_string = \"nan\";\n    apop_data *d = apop_query_to_data(\"select * from nandata\");\n    apop_data_print(d, \"nantest\", .output_type='d');\n    apop_data_free(d);\n    apop_data *d2  = apop_query_to_data(\"select * from nantest\");\n    assert(gsl_isnan(apop_data_get(d2, .rowname=\"second\", .colname=\"c\")));\n    assert(gsl_isnan(apop_data_get(d2, .rowname=\"third\", .colname=\"b\")));\n    assert(!apop_data_get(d2, .rowname=\"fourth\", .colname=\"b\"));\n    apop_data_free(d2);\n    apop_opts.nan_string = \"NaN\";\n\n    //while we're here, test querying just names & no data.\n    apop_data *justnames = apop_query_to_data(\"select head from nandata\");\n    assert(justnames->names->rowct == 4);\n    assert(!justnames->vector && !justnames->matrix);\n    apop_data_free(justnames);\n\n    //Oh, and let's test fixed-width inputs.\n    apop_text_to_db( DATADIR \"/\" \"test_data_fixed_width\" , .tabname=\"fw\", .has_col_names='n', .field_ends=(int[]){3,6});\n    assert(apop_query_to_float(\"select col_2 from fw\")==3.14159);\n    apop_data *t=apop_query_to_text(\"select col_1 from fw\");\n    assert(!strcmp(*t->text[0], \"A#C\"));\n    assert(!strcmp(*t->text[1], \" BC\"));\n    apop_text_to_db( DATADIR \"/\" \"test_data_fixed_width\" , .tabname=\"fww\", .field_names=(char*[]){\"number\", \"text\", \"foat\"}, .field_ends=(int[]){3,6});\n    assert(apop_query_to_float(\"select number from fww where number<0\")==-21);\n    assert(apop_query_to_float(\"select foat from fww where text=' BC'\")==2.71828);\n    unlink(\"nantest\");\n}\n\n#include <sys/wait.h> \nstatic void test_printing(){\n    //This compares printed output to the printed output in the attached file. 
\n    char outfile[] = \"print_test.out\";\n\n    if (!apop_table_exists(\"nandata\"))\n        test_nan_data();\n    apop_opts.db_name_column = \"head\";\n    apop_data *m  = apop_query_to_data(\"select * from nandata\");\n    apop_matrix_print(m->matrix, .output_name=outfile, .output_append='w');\n\napop_system(\"cp %s xxx\", outfile);\n\n    if (!apop_table_exists(\"d\"))\n        db_to_text();\n    apop_data *d = apop_query_to_mixed_data (\"tvttmmwt\", \"select * from d\");\n    FILE *f = fopen(outfile, \"a\");\n    fprintf(f, \"\\nand a full vector+matrix+text+weights data set, formatted for computer reading:\\n\");\n    strcpy(apop_opts.output_delimiter, \"\\t| \");\n    apop_name_add(d->names, \"Some SNPS\", 'h');\n    apop_data_print(d, .output_pipe =f);\n\n    fprintf(f, \"\\nand just the names:\\n\");\n    fclose(f);\n    //need to redirect stdout.\n    int status;\n    if (fork() == 0){\n        freopen(outfile, \"a\", stdout);\n        apop_name_print(d->names);\n        fclose(stdout);\n        exit(0);\n    }\n\n    wait(&status);\n    f = fopen(outfile, \"a\");\n\n    fprintf(f, \"\\nand just the weights vector:\\n\");\n    strcpy(apop_opts.output_delimiter, \"\\t\");\n    apop_vector_print(d->weights, .output_type='p', .output_pipe=f);\n    fclose(f);\n    int has_diffs = apop_system( \"diff -b %s/printing_sample %s\", DATADIR, outfile);\n    assert(!has_diffs);\n    //apop_system(\"rm %s\", outfile);\n    unlink(outfile);\n    unlink(\"xxx\");\n}\n\nvoid test_crosstabbing() {\n    apop_db_close(); //gotta test it somewhere\n    if (!apop_table_exists(\"snps\"))\n        apop_text_to_db( DATADIR \"/\" \"data-mixed\" , \"snps\", 0, 1);\n    apop_table_exists(\"snp_ct\", 'd');\n    apop_query(\"create table snp_ct as \"\n                 \" select a_allele, b_allele, count(*) as ct \"\n                 \" from snps group by a_allele, b_allele \");\n    apop_data *d = apop_db_to_crosstab(\"snp_ct\", \"a_allele\", \"b_allele\", \"ct\");\n    
assert(apop_data_get(d, .rowname=\"A\", \"G\")==5);\n    assert(apop_data_get(d, .rowname=\"C\", \"G\")==1);\n\n    apop_data *ct = apop_text_alloc(apop_data_alloc(3,1),3,1);\n    apop_data_set(ct, 0, 0, 1); apop_text_set(ct, 0, 0, \"first\");\n    apop_data_set(ct, 1, 0, 2); apop_text_set(ct, 1, 0, \"second\");\n    apop_data_set(ct, 2, 0, 3); apop_text_set(ct, 2, 0, \"third\");\n    apop_table_exists(\"ct\", 'd');\n    apop_crosstab_to_db(ct, \"ct\", \"r\", \"c\", \"val\");\n    if (apop_opts.db_engine=='s'){\n        assert(!strcmp(**(apop_query_to_text(\"select val from ct where r='r0' and c='t0'\")->text), \"first\"));\n        assert(!strcmp(**(apop_query_to_text(\"select val from ct where r='r2' and c='t0'\")->text), \"third\"));\n    }\n    assert(apop_query_to_float(\"select val from ct where r='r1' and c='c0'\")==2);\n}\n\n#define do_test(text, fn) {if (verbose) printf(\"%s:\", text); \\\n                          fflush(NULL);                      \\\n                          fn;                                \\\n                          if (verbose) printf(\"\\nPASS.  \");} \n\nint main(){\n    do_test(\"test data to db\", test_data_to_db());\n    do_test(\"db_to_text\", db_to_text());\n    do_test(\"test queries returning empty tables\", test_blank_db_queries());\n    do_test(\"NaN handling\", test_nan_data());\n    do_test(\"test printing\", test_printing());\n    do_test(\"test db to crosstab\", test_crosstabbing());\n    apop_db_close();\n}\n"
  },
  {
    "path": "tests/distribution_tests.c",
    "content": "/* These are tests of distributions. The basic idea is to \n --assume a true set of parameters\n --generate a fake data set via a few thousand draws from your preferred model.\n --estimate the parameters of a new model using the fake data\n --assert that the estimated parameters are within epsilon of the true parameters.\n*/\n#include <apop.h>\n#include <unistd.h>\n#ifdef _OPENMP\n#include <omp.h>\n#endif\n\n#define Diff(L, R, eps) Apop_assert(fabs((L)-(R))<(eps), \"%g is too different from %g (arbitrary limit=%g).\", (double)(L), (double)(R), eps);\n\n#define Print_dot if(verbose){printf(\".\");fflush(NULL);}\n#define is_t(d) !strcmp((d)->name, \"t distribution\")\n#define is_bernie(d) !strcmp((d)->name, \"Bernoulli distribution\")\n#define is_binom(d) !strcmp((d)->name, \"Binomial distribution\")\n#define is_beta(d) !strcmp((d)->name, \"Beta distribution\")\n#define is_poisson(d) !strcmp((d)->name, \"Poisson distribution\")\n\nint verbose = 1;\n\n//The MLE of the t distribution may have a non-integer value (why not?)\n//Because we started with an integer value, we have to find the floor.\nvoid tfloor(apop_model *dce){\n    if (is_t(dce)) dce->parameters->vector->data[2] = floor(dce->parameters->vector->data[2]);\n}\n\nint estimate_model(apop_data *data, apop_model *dist, char *method, apop_data *true_params){\n    double *starting_pt;\n    if(is_bernie(dist))\n        starting_pt = (double[]){.5};\n    else starting_pt = (double[]) {1.6, 1.4, 10};\n\n    Apop_settings_add_group(dist, apop_mle, \n        .starting_pt = starting_pt,\n        .method       = method, .verbose   =0,\n        .step_size    = 1e-1,\n        .tolerance    = 1e-4,   .k         = 1.8,\n        .t_initial    = 1,      .t_min     = .5\n        );\n    //Apop_model_add_group(dist, apop_parts_wanted);\n\n    if((is_bernie(dist) || is_beta(dist))\n       && !strcasecmp(method, \"Newton hybrid\"))\n        return 0;\n    apop_model *e = apop_estimate(data, dist);\n    tfloor(e);\n   
 Diff(0.0, apop_vector_distance(apop_data_pack(true_params), apop_data_pack(e->parameters)), 1e-1); \n    //if (is_poisson(dist)) Apop_settings_add(dist, apop_parts_wanted, covariance, 'y');\n    Print_dot\n    e = apop_estimate_restart(e);\n    tfloor(e);\n    Diff(0.0, apop_vector_distance(apop_data_pack(true_params),apop_data_pack(e->parameters)), 1e-1); \n\n        if (!strcmp(e->name, \"Dirichlet distribution\")\n            || !strcmp(e->name, \"Gamma distribution\") //just doesn't work.\n            ||(is_bernie(e) && !strcasecmp(method, \"Newton hybrid\"))\n            ||(is_t(e)) //requires several restarts to work.\n            ||(!strcmp(e->name, \"Exponential distribution\")) //imprecise\n            || !strcmp(e->name, \"Yule distribution\")){\n            //cycle takes all day.\n            return 0;\n        }\n\n    apop_model *dc = apop_model_copy(dist);\n    Apop_settings_add(dc, apop_mle, tolerance, 1e-4);\n    Apop_settings_add(dc, apop_mle, dim_cycle_tolerance, fabs(apop_log_likelihood(data, e))/200.); //within .5%.\n    Print_dot\n    apop_model *dce = apop_estimate(data, dc);\n    Print_dot\n    Diff(0.0, apop_vector_distance(apop_data_pack(true_params),apop_data_pack(dce->parameters)), 1e-2); \n    return 0;\n}\n\n/*Produce random data, then try to recover the original params */\nvoid test_one_distribution(gsl_rng *r, apop_model *model, apop_model *true_params){\n    long int runsize = 1e5;\n    //generate.\n    apop_data *data = apop_data_calloc(runsize, model->dsize);\n    if (!strcmp(model->name, \"Wishart distribution\")){\n        data = apop_data_calloc(runsize,4);\n        true_params->parameters->vector->data[0] = runsize-4;\n        //Use Apop_r to get one row's data and fill it with a draw\n        for (size_t i=0; i< runsize; i++){\n            gsl_vector *v = Apop_rv(data, i);\n            true_params->draw(v->data, r, true_params);\n            assert(!isnan(apop_sum(v)));\n        }\n    } else {\n        for (size_t i=0; i< 
runsize; i++){\n            gsl_vector *v = Apop_rv(data, i);\n            true_params->draw(v->data, r, true_params);\n            assert(!isnan(apop_sum(v)));\n        }\n    }\n    if (model->estimate) estimate_model(data, model, \"\", true_params->parameters);\n    else { //try all the MLEs.\n        estimate_model(data, model, \"NM simplex\", true_params->parameters);\n        if(is_t(model)) return; //t distribution still v. slow to converge.\n        estimate_model(data, model, \"PR cg\", true_params->parameters);\n        estimate_model(data, model, \"Newton Hybrid\", true_params->parameters);\n    }\n    apop_data_free(data);\n}\n\nvoid test_cdf(gsl_rng *r, apop_model *m){//m is parameterized\n    //Make random draws from the dist, then find the CDF at that draw\n    //That should generate a uniform distribution.\n    if (!m->cdf || is_bernie(m) || is_binom(m))\n        return;\n    int drawct = 1e4;\n    apop_data *draws = apop_data_alloc(drawct, m->dsize);\n    apop_data *cdfs = apop_data_alloc(drawct);\n    for (int i=0; i< drawct; i++){\n        Apop_stopif(apop_draw(Apop_r(draws, i)->matrix->data, r, m), abort(), 0, \"bad draw.\");\n        apop_data_set(cdfs, i, -1, apop_cdf(Apop_r(draws, i), m));\n    }\n    apop_model *cdf = apop_estimate(apop_data_sort(cdfs), apop_pmf);\n    apop_model *u01 = apop_model_set_parameters(apop_uniform, 0, 1);\n    apop_data *ktest = apop_test_kolmogorov(cdf, u01);\n    //apop_data_show(ktest);\n    double maxdist = apop_data_get(ktest, .rowname=\"max distance\");\n    assert(maxdist < .03); //the K-S test has high confidence of rejection with large N\n    apop_data_free(ktest); apop_data_free(draws); apop_data_free(cdfs);\n    apop_model_free(u01);  apop_model_free(cdf);\n}\n\ndouble true_parameter_v[] = {1.82,2.1};\n\nvoid test_distributions(gsl_rng *r){\n    if (verbose) printf(\"\\n\");\n    apop_model* true_params;\n    apop_model *null_model = &(apop_model){\"the null model\"};\n\n#define model_no_est(base) \\\n  
  apop_model * base ## _no_est = apop_model_copy(apop_##base);\\\n    base ## _no_est->estimate=NULL;\n\n    model_no_est(beta);\n    model_no_est(bernoulli);\n    model_no_est(gamma);\n    model_no_est(exponential);\n    model_no_est(poisson);\n\n    apop_t_distribution->estimate=NULL; //find df by MLE, not observation count.\n    apop_model *dist[] = {\n            apop_bernoulli, bernoulli_no_est, \n            apop_beta, beta_no_est,\n            apop_binomial, apop_dirichlet,\n            apop_exponential, exponential_no_est,\n            apop_gamma, gamma_no_est,\n            apop_lognormal, apop_multinomial, \n            apop_multivariate_normal,\n            apop_normal, apop_poisson, poisson_no_est,\n            apop_t_distribution, apop_uniform,\n            apop_yule, apop_zipf, /*apop_wishart,*/\n            null_model};\n\n    for (int i=0; strcmp(dist[i]->name, \"the null model\"); i++){\n        if (verbose) {printf(\"%s: \", dist[i]->name); fflush(NULL);}\n        true_params = apop_model_copy(dist[i]);\n        true_params->parameters = apop_data_fill_base(apop_data_alloc(dist[i]->vsize==1 ? 
1 : 2), true_parameter_v);\n        if (is_beta(dist[i]))\n            true_params->parameters = apop_data_falloc((2), .5, .2);\n        if (is_bernie(dist[i]))\n            true_params->parameters = apop_data_falloc((1), .1);\n        if (is_binom(dist[i])){\n            true_params->parameters = apop_data_falloc((2), 15, .2);\n            dist[i]->dsize=2;\n        }\n        if (!strcmp(dist[i]->name, \"Dirichlet distribution\"))\n            dist[i]->dsize=2;\n        if (!strcmp(dist[i]->name, \"Multivariate normal distribution\")){\n            true_params->parameters = apop_data_falloc((2, 2, 2), 15, .5, .2,\n                                                                   3, .2, .5);\n            dist[i]->dsize=2;\n        }\n        if (!strcmp(dist[i]->name, \"Multinomial distribution\")){\n            true_params->parameters = apop_data_falloc((4), 15, .5, .2, .1);\n            dist[i]->dsize=4;\n        }\n        if (apop_regex(dist[i]->name, \"gamma distribution\"))\n            true_params->parameters = apop_data_falloc((2), 1.5, 2.5);\n        if (is_t(dist[i]))\n            true_params->parameters = apop_data_falloc((3), 1, 3, 16);\n        if (!strcmp(dist[i]->name, \"Wishart distribution\")){\n            true_params->parameters = apop_data_falloc((2, 2, 2), 996, .2, .1,\n                                                                    0, .1, .2);\n            apop_vector_realloc(true_params->parameters->vector, 1);\n        }\n        test_one_distribution(r, dist[i], true_params);\n        test_cdf(r, true_params);\n        if (verbose) {printf(\"\\nPASS.   \"); fflush(NULL);}\n    }\n}\nstatic void got_bored(){ exit(0); }\n\nint main(int argc, char **argv){\n#ifdef _OPENMP\n    if (omp_get_num_procs()==1) omp_set_num_threads(2); //always at least 2 threads\n#endif\n    int c;\n    char opts[] = \"sqt:\";\n    if (argc==1)\n        printf(\"\\tDistribution tests. 
Each dot is an optimization run, including some methods known to be inefficient.\\n\\tFor quieter output, use -q. Default is two threads; change with -t1, -t3, ...\\n\");\n    while((c = getopt(argc, argv, opts))!=-1)\n        if (c == 'q')      verbose  --;\n#ifdef _OPENMP\n        else if (c == 't')  omp_set_num_threads(atoi(optarg));\n#endif\n\n    gsl_rng *r = apop_rng_alloc(213452);\n    signal(SIGINT, got_bored);\n    test_distributions(r);\n}\n"
  },
  {
    "path": "tests/error_test.c",
    "content": "#include <apop.h>\n#if _POSIX_C_SOURCE >= 200809L\n#define HAVE_FMEMOPEN\n#endif\nFILE *fmemopen(void *buf, size_t size, const char *mode); //POSIX.\n\n/* Set up an error, and check for the presence of:\n  --the correct error flag in the data set\n  --the name of the function in the error log.\n\nI'm only checking for the function name in the error log because I don't want to rewrite this every time \nthe wording in the error message changes.\n\nIf fmemopen is missing, don't even bother with the error log stuff.\n\n*/\nchar errorbuff[10000];\n\nvoid check_log(char*fn_to_check, char*msg){\n#ifdef HAVE_FMEMOPEN\n    fflush(NULL);\n    Apop_stopif (!apop_regex(errorbuff, fn_to_check), abort(), 0, \"%s\", msg);\n#endif\n}\n\nvoid check_data_error(apop_data *in, char should_be, char *fn_to_check, char *msg){\n    Apop_stopif (in->error != should_be, abort(), 0, \"Didn't set %s.\", msg);\n    check_log(fn_to_check, msg);\n}\n\nvoid reset_log(){\n#ifdef HAVE_FMEMOPEN\n    if (apop_opts.log_file) fclose(apop_opts.log_file);\n    apop_opts.log_file = fmemopen(errorbuff, 10000, \"w\");\n#endif\n}\n\nint main(){\n    apop_opts.db_engine='s';\n    printf(\"test error checking (some systems may print error messags here)\\n\");\n    reset_log();\n    apop_data *d = apop_data_alloc();\n    apop_data *d2 = apop_data_add_page(d, apop_data_alloc(), \"newp\");\n    d2->more = d2;\n    apop_data_free(d);\n    check_data_error(d, 'c', \"apop_data_free_base\", \"circular error code\");\n    check_data_error(d2, 'c', \"apop_data_free_base\", \"circular error code\");\n\n    reset_log();\n    apop_data *c = apop_data_copy(d);\n    check_data_error(c, 'c', \"apop_data_copy\", \"circular error code\");\n    check_data_error(c->more, 'c', \"apop_data_copy\", \"circular error code\");\n\n    reset_log();\n    apop_data *d3 = apop_data_alloc(2,2);\n    apop_data_memcpy(d, d3);\n    check_data_error(d, 'p', \"apop_data_memcpy\", \"missing part code\");\n\n    reset_log();\n    
apop_data *dbig = apop_data_alloc(2e8,2e8);\n    check_data_error(dbig, 'a', \"apop_data_alloc\", \"misallocation code \"\n                                \"(or you are using a big mother of a computer).\");\n\n    reset_log();\n    check_data_error(apop_query_to_data(\"stelect 8 from data\"), 'q', \"(apop_query_to_data|<unknown>)\", \"query error\");\n    reset_log();\n    check_data_error(apop_query_to_mixed_data(\"dd\", \"stelect 8 from data\"), 'q', \"(apop_sqlite_multiquery|<unknown>)\", \"query error\");\n\n    reset_log();\n    apop_data *fefail = apop_data_falloc((2,2), 0, 0, -1, -1);\n    apop_data *exact = apop_test_fisher_exact(fefail);\n    check_data_error(exact, 'p', \"apop_test_fisher_exact\", \"fexact internal processing code\");\n\n    reset_log();\n    apop_data *d44 = apop_data_alloc(4,4);\n    check_data_error(apop_dot(d44, d3), 'd', \"apop_dot\", \"dot product dimension error\");\n\n    apop_multivariate_normal->parameters = apop_data_calloc(2, 2, 2);\n    assert(apop_log_likelihood(fefail, apop_multivariate_normal) == -INFINITY);\n    check_log(\"apop_multinormal_ll\", \"Failed to not take the determinant of a zero matrix.\");\n\n    if (apop_opts.log_file) fclose(apop_opts.log_file);\n    apop_opts.log_file = NULL;\n}\n"
  },
  {
    "path": "tests/factors.c",
    "content": "#include <apop.h>\n\n/* We promise users that they can copy factors from one data set to the next. But if the\nsecond data set has values that do not appear in the first data set, then they have\nto be added to the factor list.\n*/\n\nint main(){\n    apop_data *d1 = apop_text_alloc(NULL, 5, 1);\n    apop_data *d2 = apop_text_alloc(NULL, 5, 1);\n    apop_text_fill(d1, \"A\", \"B\", \"C\", \"B\", \"B\");\n    apop_text_fill(d2, \"B\", \"A\", \"D\", \"B\", \"B\");\n    apop_data_to_factors(d1);\n    apop_data_show(d1);\n    d2->more = apop_data_copy(apop_data_get_factor_names(d1, 0, 't'));\n    printf(\"-----\\n\");\n    apop_data_to_dummies(d2, .append='y');\n    apop_data_show(d2);\n\n\n    //some spot checks.\n    assert(apop_data_get(d1, 2)==2);\n    assert(apop_data_get(d2, 0, 0)==1);\n    assert(apop_data_get(d2, 2, 0)==0);\n    assert(apop_data_get(d2, 2, 1)==0);\n    assert(apop_data_get(d2, 2, 2)==1);\n    assert(apop_data_get(d2, 3, 0)==1);\n}\n"
  },
  {
    "path": "tests/faith.data",
    "content": "#The \"old faithful\" data, pulled from R. If you have a copy of R on hand, use ?faithful for the full documentation.\n#eruptions=length of eruption (minutes, rounded somehwat)\n#waiting=time between eruptions.\nid,eruptions,waiting\n1,3.6,79\n2,1.8,54\n3,3.333,74\n4,2.283,62\n5,4.533,85\n6,2.883,55\n7,4.7,88\n8,3.6,85\n9,1.95,51\n10,4.35,85\n11,1.833,54\n12,3.917,84\n13,4.2,78\n14,1.75,47\n15,4.7,83\n16,2.167,52\n17,1.75,62\n18,4.8,84\n19,1.6,52\n20,4.25,79\n21,1.8,51\n22,1.75,47\n23,3.45,78\n24,3.067,69\n25,4.533,74\n26,3.6,83\n27,1.967,55\n28,4.083,76\n29,3.85,78\n30,4.433,79\n31,4.3,73\n32,4.467,77\n33,3.367,66\n34,4.033,80\n35,3.833,74\n36,2.017,52\n37,1.867,48\n38,4.833,80\n39,1.833,59\n40,4.783,90\n41,4.35,80\n42,1.883,58\n43,4.567,84\n44,1.75,58\n45,4.533,73\n46,3.317,83\n47,3.833,64\n48,2.1,53\n49,4.633,82\n50,2,59\n51,4.8,75\n52,4.716,90\n53,1.833,54\n54,4.833,80\n55,1.733,54\n56,4.883,83\n57,3.717,71\n58,1.667,64\n59,4.567,77\n60,4.317,81\n61,2.233,59\n62,4.5,84\n63,1.75,48\n64,4.8,82\n65,1.817,60\n66,4.4,92\n67,4.167,78\n68,4.7,78\n69,2.067,65\n70,4.7,73\n71,4.033,82\n72,1.967,56\n73,4.5,79\n74,4,71\n75,1.983,62\n76,5.067,76\n77,2.017,60\n78,4.567,78\n79,3.883,76\n80,3.6,83\n81,4.133,75\n82,4.333,82\n83,4.1,70\n84,2.633,65\n85,4.067,73\n86,4.933,88\n87,3.95,76\n88,4.517,80\n89,2.167,48\n90,4,86\n91,2.2,60\n92,4.333,90\n93,1.867,50\n94,4.817,78\n95,1.833,63\n96,4.3,72\n97,4.667,84\n98,3.75,75\n99,1.867,51\n100,4.9,82\n101,2.483,62\n102,4.367,88\n103,2.1,49\n104,4.5,83\n105,4.05,81\n106,1.867,47\n107,4.7,84\n108,1.783,52\n109,4.85,86\n110,3.683,81\n111,4.733,75\n112,2.3,59\n113,4.9,89\n114,4.417,79\n115,1.7,59\n116,4.633,81\n117,2.317,50\n118,4.6,85\n119,1.817,59\n120,4.417,87\n121,2.617,53\n122,4.067,69\n123,4.25,77\n124,1.967,56\n125,4.6,88\n126,3.767,81\n127,1.917,45\n128,4.5,82\n129,2.267,55\n130,4.65,90\n131,1.867,45\n132,4.167,83\n133,2.8,56\n134,4.333,89\n135,1.833,46\n136,4.383,82\n137,1.883,51\n138,4.933,86\n139,2.033,53\n140,3.733
,79\n141,4.233,81\n142,2.233,60\n143,4.533,82\n144,4.817,77\n145,4.333,76\n146,1.983,59\n147,4.633,80\n148,2.017,49\n149,5.1,96\n150,1.8,53\n151,5.033,77\n152,4,77\n153,2.4,65\n154,4.6,81\n155,3.567,71\n156,4,70\n157,4.5,81\n158,4.083,93\n159,1.8,53\n160,3.967,89\n161,2.2,45\n162,4.15,86\n163,2,58\n164,3.833,78\n165,3.5,66\n166,4.583,76\n167,2.367,63\n168,5,88\n169,1.933,52\n170,4.617,93\n171,1.917,49\n172,2.083,57\n173,4.583,77\n174,3.333,68\n175,4.167,81\n176,4.333,81\n177,4.5,73\n178,2.417,50\n179,4,85\n180,4.167,74\n181,1.883,55\n182,4.583,77\n183,4.25,83\n184,3.767,83\n185,2.033,51\n186,4.433,78\n187,4.083,84\n188,1.833,46\n189,4.417,83\n190,2.183,55\n191,4.8,81\n192,1.833,57\n193,4.8,76\n194,4.1,84\n195,3.966,77\n196,4.233,81\n197,3.5,87\n198,4.366,77\n199,2.25,51\n200,4.667,78\n201,2.1,60\n202,4.35,82\n203,4.133,91\n204,1.867,53\n205,4.6,78\n206,1.783,46\n207,4.367,77\n208,3.85,84\n209,1.933,49\n210,4.5,83\n211,2.383,71\n212,4.7,80\n213,1.867,49\n214,3.833,75\n215,3.417,64\n216,4.233,76\n217,2.4,53\n218,4.8,94\n219,2,55\n220,4.15,76\n221,1.867,50\n222,4.267,82\n223,1.75,54\n224,4.483,75\n225,4,78\n226,4.117,79\n227,4.083,78\n228,4.267,78\n229,3.917,70\n230,4.55,79\n231,4.083,70\n232,2.417,54\n233,4.183,86\n234,2.217,50\n235,4.45,90\n236,1.883,54\n237,1.85,54\n238,4.283,77\n239,3.95,79\n240,2.333,64\n241,4.15,75\n242,2.35,47\n243,4.933,86\n244,2.9,63\n245,4.583,85\n246,3.833,82\n247,2.083,57\n248,4.367,82\n249,2.133,67\n250,4.35,74\n251,2.2,54\n252,4.45,83\n253,3.567,73\n254,4.5,73\n255,4.15,88\n256,3.817,80\n257,3.917,71\n258,4.45,83\n259,2,56\n260,4.283,79\n261,4.767,78\n262,4.533,84\n263,1.85,58\n264,4.25,83\n265,1.983,43\n266,2.25,60\n267,4.75,75\n268,4.117,81\n269,2.15,46\n270,4.417,90\n271,1.817,46\n272,4.467,74\n"
  },
  {
    "path": "tests/lognormal_test.c",
    "content": "#include <apop.h>\n#define Diff(L, R, eps) {double left=(L), right=(R); Apop_stopif(isnan(left-right) || fabs((left)-(right))>(eps), abort(), 0, \"%g is too different from %g (abitrary limit=%g).\", (double)(left), (double)(right), eps);}\n\nvoid test_lognormal(gsl_rng *r){\n    apop_model *source = apop_model_copy(apop_normal);\n    apop_model_clear(NULL, source);\n    double mu = gsl_ran_flat(r, -1, 1);\n    double sigma = gsl_ran_flat(r, .01, 1);\n    int n = gsl_ran_flat(r,1,8e5);\n    apop_data *data = apop_data_alloc(0,1,n);\n    gsl_vector_set(source->parameters->vector, 0, mu);\n    gsl_vector_set(source->parameters->vector, 1, sigma);\n    for (int j=0; j< n; j++){\n        double *k   = gsl_matrix_ptr(data->matrix, 0, j);\n        apop_draw(k, r, source);\n        *k = exp(*k);\n    }\n    apop_model_free(source);\n    apop_model *out = apop_estimate(data, apop_lognormal);\n    double muhat = apop_data_get(out->parameters, 0,-1);\n    double sigmahat = apop_data_get(out->parameters, 1,-1);\n    //if (verbose) printf(\"mu: %g, muhat: %g, var: %g, varhat: %g\\n\", mu, muhat,  sigma,sigmahat);\n    Diff(mu, muhat, 1e-2);\n    Diff(sigma, sigmahat, 1e-2);\n    apop_model_free(out);\n\n    apop_model *for_mle= apop_model_copy(apop_lognormal);\n    for_mle->estimate=NULL;\n    apop_model *out2 = apop_estimate(data, for_mle);\n    apop_model_free(for_mle);\n    muhat = apop_data_get(out2->parameters, 0,-1);\n    sigmahat = apop_data_get(out2->parameters, 1,-1);\n    Diff(mu, muhat, 1e-2);\n    Diff(sigma, sigmahat, 1e-2);\n    apop_model_free(out2);\n    apop_data_free(data);\n}\n\nint main(){ test_lognormal(apop_rng_alloc(24)); }\n"
  },
  {
    "path": "tests/nist_tests.c",
    "content": "/* These are stats tests from NIST. See http://www.itl.nist.gov/div898/strd/\nNotice that I use various levels of tolerance, so this gives you an idea\nof the relative accuracies of various operations. */\n\n#ifdef Datadir\n#define DATADIR Datadir\n#else\n#define DATADIR \".\"\n#endif\n\n#include <apop.h>\n#include <unistd.h>\n\n#define TOL 1e-15\n#define TOL2 1e-5\n#define TOL3 1e-9\n\n#define Diff(L, R, eps) Apop_stopif(isnan(L-R) || fabs((L)-(R))>(eps), abort(), 0, \"%g is too different from %g (abitrary limit=%g).\", (double)(L), (double)(R), eps);\n\nvoid pontius(){\n    apop_text_to_db( DATADIR \"/\" \"pontius.dat\" ,\"pont\", .delimiters=\" \");\n    apop_data *d = apop_query_to_data(\"select y, x, pow(x,2) as p from pont\");\n    apop_model *est =  apop_estimate(d, apop_ols);\n\n    Diff(apop_data_get(est->parameters, 0, -1), 0.673565789473684E-03, TOL3);\n    Diff(apop_data_get(est->parameters, 1, -1), 0.732059160401003E-06, TOL);\n    Diff(apop_data_get(est->parameters, 2, -1), -0.316081871345029E-14, TOL);\n    apop_data *cov = apop_data_get_page(est->parameters, \"<covariance>\");\n    Diff(apop_data_get(cov, 0, 0), pow(0.107938612033077E-03,2), TOL2);\n    Diff(apop_data_get(cov, 1, 1), pow(0.157817399981659E-09,2), TOL2);\n    Diff(apop_data_get(cov, 2, 2), pow(0.486652849992036E-16,2), TOL2);\n    Diff(apop_data_get(est->info, .rowname=\"R squared\"), 0.999999900178537, TOL);\n    Diff(apop_data_get(est->info, .rowname=\"SSR\"), 15.6040343244198, TOL3);\n}\n\nvoid wampler1(){\n    apop_text_to_db( DATADIR \"/\" \"wampler1.dat\" ,\"w1\", .delimiters=\" \");\n    apop_data *d = apop_query_to_data(\"select y, x, pow(x,2) as p2, \\\n                                pow(x,3) as p3, pow(x,4) as p4, pow(x,5) as p5 from w1\");\n    apop_model *est = apop_estimate(d, apop_ols);\n    for (int i=0; i<6; i++)\n        Diff(apop_data_get(est->parameters, i, -1) ,1 , 1e-3);\n    apop_data *cov = apop_data_get_page(est->parameters, 
\"<covariance>\");\n    for (int i=0; i<6; i++)\n        Diff(apop_data_get(cov, i, i), 0, TOL2);\n    Diff(apop_data_get(est->info, .rowname=\"R squared\"), 1, TOL);\n}\n\nvoid numacc4(){\n    apop_data *d  = apop_text_to_data( DATADIR \"/\" \"numacc4.dat\" );\n    gsl_vector *v = Apop_cv(d, 0);\n    Diff(apop_vector_mean(v), 10000000.2, 1e-5);\n    Diff(apop_vector_var(v)*(v->size -1)/v->size, 0.01, TOL3);\n    //I don't do this yet:\n    //Sample Autocorrelation Coefficient (lag 1) r(1):   -0.999     (exact)\n}\n\nint main(){\n    pontius();\n    wampler1();\n    numacc4();\n}\n"
  },
  {
    "path": "tests/printing_sample",
    "content": "    2\t    4\t    6\t    8\n    3\t    7\t  nan\t    6\n    3\t  nan\t    1\t    7\n    9\t    0\t    8\t    1\n\nand a full vector+matrix+text+weights data set, formatted for computer reading:\n\tSome SNPS\n\nchr  \t| aa\t| ab\t| rsid\t| a_allele\t| b_allele\t| treatment\n    1\t| \t|     0\t|    37\t| rs1933024\t| A\t| G\t| cases\t|  1830\n    1\t| \t|     0\t|    39\t| rs11497407\t| A\t| G\t| cases\t|  1828\n    1\t| \t|     2\t|   190\t| rs12565286\t| C\t| G\t| cases\t|  1674\n    1\t| \t|     1\t|   179\t| rs'11804171\t| A\t| T\t| cases\t|  1686\n    1\t| \t|  1836\t|    31\t| rs2977656\t| C\t| T\t| cases\t|     0\n    1\t| \t|     0\t|   201\t| rs12138618\t| A\t| G\t| cases\t|  1660\n    1\t| \t|  1230\t|   570\t| rs3094315\t| A\t| G\t| cases\t|    60\n    1\t| \t|    33\t|   437\t| rs17160906\t| A\t| G\t| cases\t|  1396\n    1\t| \t|     0\t|   133\t| rs2519016\t| C\t| T\t| cases\t|  1731\n\nand just the names:\ntitle: Some SNPS\nvector:\tchr\ncolumn:\taa\tab\ntext:\trsid\ta_allele\tb_allele\ttreatment\n\nand just the weights vector:\n 1830\t 1828\t 1674\t 1686\t    0\t 1660\t   60\t 1396\t 1731\n"
  },
  {
    "path": "tests/rake_test.c",
    "content": "#include <apop.h>\n\n#define Diff(L, R, eps) Apop_assert_n(fabs((L)-(R))<(eps), \"%g is too different from %g (abitrary limit=%g).\", (double)(L), (double)(R), eps);\n\nvoid rake_check(apop_model *base, apop_model *fitted){\n    Diff(apop_query_to_float(\"select sum(weights) from raked where first=1\"), 12, 1e-4);\n    Diff(apop_query_to_float(\"select sum(weights) from raked where first=2\"), 20, 1e-4);\n    Diff(apop_query_to_float(\"select sum(weights) from raked where second=1\"), 25, 1e-4);\n    Diff(apop_query_to_float(\"select sum(weights) from raked where second=2\"), 7, 1e-4);\n    /* Raking minimizes KL divergence given the margin constraints. So nudging the table \n       in a manner that fits the constraints should raise KLdiv. */\n    double kl1= apop_kl_divergence(base, fitted);\n    *gsl_vector_ptr(fitted->data->weights, 0) += 0.05;\n    *gsl_vector_ptr(fitted->data->weights, 1) += -0.05;\n    *gsl_vector_ptr(fitted->data->weights, 2) += -0.05;\n    *gsl_vector_ptr(fitted->data->weights, 3) += 0.05;\n    assert(kl1 < apop_kl_divergence(base, fitted));\n}\n\n//these work by checking that K-L divergence shrunk, and that individual margins are correct.\nvoid test_raking_further(){\n    apop_table_exists(\"rake_test\", 'd');\n    apop_query(\"create table rake_test (first, second, weights);\"\n            \"insert into rake_test values(1, 1, 10);\"\n            \"insert into rake_test values(1, 2, 2);\"\n            \"insert into rake_test values(2, 1, 15);\"\n            \"insert into rake_test values(2, 2, 5);\"\n            );\n\n    //Synthetic data, starting at all ones.\n    apop_data_print(\n            apop_rake(.margin_table=\"rake_test\", .count_col=\"weights\", \n                .contrasts=(char*[]){\"first\", \"second\"}, .contrast_ct=2),\n        .output_name=\"raked\", .output_type='d');\n    apop_model *fitted= apop_estimate(apop_query_to_mixed_data(\"mmw\", \"select * from raked\"), apop_pmf);\n    
rake_check(apop_estimate(apop_query_to_mixed_data(\"mmw\", \"select first, second, 1 from rake_test\"), apop_pmf), fitted);\n\n        //With an alternate init table\n    apop_table_exists(\"raked\", 'd');\n    apop_query(\"create table rakeinit (first, second, weights);\"\n            \"insert into rakeinit values(1, 1, 32);\"\n            \"insert into rakeinit values(1, 2, 289);\"\n            \"insert into rakeinit values(2, 1, 19);\"\n            \"insert into rakeinit values(2, 2, 5447);\"\n            );\n    apop_data_print(\n            apop_rake(.margin_table=\"rake_test\", .count_col=\"weights\", \n                .contrasts=(char*[]){\"first\", \"second\"}, .contrast_ct=2, .init_table=\"rakeinit\", .init_count_col=\"weights\"),\n        .output_name=\"raked\", .output_type='d');\n    //apop_data_show(apop_query_to_data(\"select * from raked\"));\n\n    apop_model *base= apop_estimate(apop_query_to_mixed_data(\"mmw\", \"select * from rakeinit\"), apop_pmf);\n    fitted= apop_estimate(apop_query_to_mixed_data(\"mmw\", \"select * from raked\"), apop_pmf);\n    rake_check(base, fitted);\n}\n\n\n/* Some OK tests on the raking procedure. We assert that a regression on the raw data \nand a set of dummies is equivalent to a regression on the base data. 
*/\n\ndouble weights_are_one(apop_data *in){assert(gsl_vector_get(in->weights, 0)==1); return 0;}\ndouble equal_or_absent(apop_data *in){\n    assert(apop_data_get(in, .colname=\"a\") != 3);\n    assert(apop_data_get(in, .colname=\"b\") != 7);\n    assert(gsl_vector_get(in->weights, 0)==100./81);\n    return 0;\n}\n\ndouble compare_results(apop_data *in, void *other, int index){\n    double other_val= apop_data_get(((apop_data *)other), .row=index, .col=-1);\n    assert(fabs(gsl_vector_get(in->vector, 0)-other_val)< 1e-3); \n    return 0;\n}\n    \nint main(){\n    //trivial case: if all margins are equal, MLE is to give equal weights.\n    int a, b, c;\n    apop_query(\"create table equals (a,b,c)\");\n    for (a=0; a < 10; a++)\n        for (b=0; b < 10; b++)\n            for (c=0; c < 10; c++)\n                apop_query(\"insert into equals values (%i, %i, %i)\", a,b,c);\n    apop_data *equal_weights = apop_rake(.margin_table=\"equals\", .init_table=\"equals\");\n    apop_map(equal_weights, .fn_r=weights_are_one);\n\n    //structural zeros should be missing from the output table.\n    apop_data *with_zeros = apop_rake(\"equals\", .structural_zeros=\"a+0.0==3 or b+0.0==7\");\n    apop_map(with_zeros, .fn_r=equal_or_absent);\n\n    //Not-trivial case.\n    //Regression parameters on the inequal weights should match regression parameters on the raw data.\n    apop_query(\"create table inequals (a,b,c, weights)\");\n    gsl_rng *r = apop_rng_alloc(25);\n    for (a=0; a < 3; a++)\n        for (b=0; b < 4; b++)\n            for (c=0; c < 10; c++)\n                apop_query(\"insert into inequals values (%i, %i, %i, %g)\", a,b,c, a/2.+gsl_rng_uniform(r));\n    char *contrasts[] ={\"a|b\"};\n    apop_data *inequal_weights = apop_rake(\"inequals\", .var_list=(char*[]){\"a\", \"b\", \"c\"}, .var_ct=3, .contrasts=contrasts, .count_col=\"weights\", .contrast_ct=1, .tolerance=1e-8);\n\n\n    //regress using the estimates from the raking\n    
apop_data_rm_columns(inequal_weights, (int[]){0, 0, 1});\n    apop_data_pmf_compress(inequal_weights);\n    apop_data_to_dummies(inequal_weights, .type='d', .append='y', .remove='y'); //first A column\n    apop_data_to_dummies(inequal_weights, .col=apop_name_find(inequal_weights->names, \"b\", 'c'), \n                                          .type='d', .append='y', .remove='y');\n    inequal_weights->vector = inequal_weights->weights;\n    inequal_weights->weights = NULL;\n    apop_model *raked_ols = apop_estimate(inequal_weights, apop_ols);  \n\n   \n    //regress using the original data\n    apop_data *t = apop_query_to_mixed_data(\"vmm\", \"select sum(weights), a, b from inequals group by a, b\");//force linear, not affine.\n    apop_data_to_dummies(t, .type='d', .append='y', .remove='y');\n    apop_data_to_dummies(t, .col=apop_name_find(t->names, \"b\", 'c'), .type='d', .append='y', .remove='y');\n    apop_model *ols_out = apop_estimate(t, apop_ols);\n\n    apop_map(ols_out->parameters, .fn_rpi=compare_results, .param= raked_ols->parameters);\n\n    test_raking_further();\n}\n"
  },
  {
    "path": "tests/sort_example.c",
    "content": "#ifdef Datadir\n#define DATADIR Datadir\n#else\n#define DATADIR \".\"\n#endif\n\n#include <apop.h>\n#include <unistd.h>\n#ifdef Testing\n#include \"sort_tests.c\" //For Apophenia's test suite, some tedious checks that the sorts worked\n#endif\n\n//get_distance is for the sort-by-Euclidian distance example below.\ndouble get_distance(gsl_vector *v) {return apop_vector_distance(v);}\n\nint main(){\n    apop_text_to_db( DATADIR \"/\" \"amash_vote_analysis.csv\" );\n    apop_data *d = apop_query_to_mixed_data(\"mntmtm\", \"select 1,id,party,contribs/1000.0,vote,ideology from amash_vote_analysis \" );\n\n    //use the default order of columns for sorting\n    apop_data *sorted = apop_data_sort(d, .inplace='n');\n#ifndef Testing\n    apop_data_print(sorted);\n#else\n    check_sorting1(sorted);\n#endif\n\n    //set up a specific column order\n    apop_data *perm = apop_data_copy(Apop_r(d, 0));\n    perm->vector = NULL;\n    apop_data_fill(perm, 5, 3, 4);\n    apop_text_set(perm, 0, 0, \"2\");\n    apop_text_set(perm, 0, 1, \"1\");\n\n    apop_data_sort(d, perm);\n#ifndef Testing\n    apop_data_print(d);\n#else\n    check_sorting2(d);\n#endif\n\n    //sort a list of names\n    apop_data *blank = apop_data_alloc();\n    apop_data_add_names(blank, 'r', \"C\", \"E\", \"A\");\n    apop_data_sort(blank);\n    assert(*blank->names->row[0] == 'A');\n    assert(*blank->names->row[1] == 'C');\n    assert(*blank->names->row[2] == 'E');\n\n    //take each row of the matrix as a vector; store the Euclidian distance to the origin in the vector;\n    //sort in descending order.\n    apop_data *rowvectors = apop_text_to_data( DATADIR \"/\" \"test_data\" );\n    apop_map(rowvectors, .fn_v=get_distance, .part='r', .inplace='y');\n    apop_data *arow = apop_data_copy(Apop_r(rowvectors, 0));\n    arow->matrix=NULL; //sort only by the distance vector\n    apop_data_sort(rowvectors, arow, .asc='d');\n#ifndef Testing\n    apop_data_print(rowvectors);\n#else\n    double prev = 
INFINITY;\n    for (int i=0; i< rowvectors->vector->size; i++){\n        double this = apop_data_get(rowvectors, i, -1);\n        assert(this < prev);\n        prev = this;\n    }\n#endif\n}\n"
  },
  {
    "path": "tests/sort_tests.c",
    "content": "//for inclusion into sort_example.c\n\nvoid check_sorting1(apop_data *d){\n    double last_val = -INFINITY;\n    double last_val2 = -INFINITY;\n    for (int i=0; i < d->matrix->size1; i++){\n        //Check for correct sort two columns deep.\n        //Col 0 is all ones, so skip it.\n        double this_val = apop_data_get(d, i, 1);\n        double this_val2 = apop_data_get(d, i, 2);\n        assert(this_val >= last_val);\n        if (this_val == last_val)\n            assert(this_val2 >= last_val2);\n        last_val = this_val;\n        last_val2 = this_val2;\n    }\n}\n\nvoid check_sorting2(apop_data *d){\n    double last_val = -INFINITY;\n    double last_val2 = -INFINITY;\n    char *last_str = \"Aye\";\n    char *last_str2 = \"Dem\";\n    for (int i=0; i < *d->textsize; i++){\n        //Check for correct sort two columns deep.\n        //Col 0 is all ones, so skip it.\n        char *this_str = d->text[i][1];\n        char *this_str2 = d->text[i][0];\n        double this_val = apop_data_get(d, i, 1);\n        double this_val2 = apop_data_get(d, i, 2);\n        assert(strcasecmp(this_str, last_str) >=0);\n        if (!strcasecmp(this_str, last_str)){\n            assert(strcasecmp(this_str2, last_str2) >=0);\n            if (!strcasecmp(this_str2,last_str2)){\n                assert(this_val >= last_val);\n                if(this_val == last_val)\n                    assert(this_val2 >= last_val2);\n            }\n        }\n        last_str = this_str;\n        last_str2 = this_str2;\n        last_val = this_val;\n        last_val2 = this_val2;\n    }\n}\n"
  },
  {
    "path": "tests/test_apop.c",
    "content": "/*\nHere are assorted unit tests, some mechanical and some much more computation-intensive.\n\nThe mechanical tests often do things in a convoluted manner for the purpose of touching\nas much of the code base as possible. If you want examples for using Apophenia, please \ndon't look in the tests---try the eg directory in this distribution or online. \n\nFor an example of the statistical tests, let us say that we wish to verify the results\nof a regression. Some systems have a canned screenshot of the 'correct' regression\nresults that ships with the test suite, and compare a screenshot of the run to the\ncanned version. I don't get much confidence from this---what if the canned screenshot is\nwrong? Better would be to know something about the regression results (like the relation\nbetween the common F statistic, SSR, and SSE) and check that the fact always holds.\n\nThose claims are true as N goes to infinity; for finite N the routines have to strike\na balance. How many draws should I make, and how much user time should I waste, before\nmeasuring the error, and what error tolerance should I set? This is a difficult balance,\nand is to some extent the key problem behind all of numeric computing.\n\nThere are two types of error bounds here. One is tighter, and therefore more prone\nto false alarms, but really forces us to write better numeric code. The other is\nmuch more permissive, and just tells us whether the computation failed to go in the\nright direction. Users who run 'make check' will be running the second type of test,\nbecause I (BK) got sick of people sending me bug reports that a test failed\nbecause it reported an error of 1e-5 when it should have been 1e-8. There is always\nroom for better numeric precision; we all know this without reminders from the\npost-install tests.  
*/\n\n#ifdef Datadir\n#define DATADIR Datadir\n#else\n#define DATADIR \".\"\n#endif\n\n#define _GNU_SOURCE\n#include <apop.h>\n#include <unistd.h>\n\n#ifdef _OPENMP\n#include <omp.h>\n#endif\n\n#ifdef FULL_TOLERANCE\ndouble tol6 = 1e-6;\ndouble tol5 = 1e-5;\ndouble tol3 = 1e-3;\ndouble tol2 = 1e-2;\ndouble tol1 = 1e-1;\n#else\ndouble tol6 = 1e-3;\ndouble tol5 = 1e-3;\ndouble tol3 = 1e-2;\ndouble tol2 = 1e-1;\ndouble tol1 = 1e-1;\n#endif\n\n//assertions never return a value.\n#undef Apop_assert\n#define Apop_assert(expr, ...) {if (!(expr)) {fprintf(stderr, __VA_ARGS__); abort();}}\n\n#define Diff(L, R, eps) {double left=(L), right=(R); Apop_stopif(isnan(left-right) || fabs((left)-(right))>(eps), abort(), 0, \"%g is too different from %g (arbitrary limit=%g).\", (double)(left), (double)(right), eps);}\n\n//A NULL-tolerant strcmp, which used to be a fn and has been deleted.\n#define apop_strcmp(a, b) (((a)&&(b) && !strcmp((a), (b))) || (!(a) && !(b)))\n\nint len = 8000;\nint verbose = 1;\n\nvoid test_nan_data();\nvoid db_to_text();\n\nvoid v_pow10(double *in){ *in = pow(10,*in);}\ndouble log_for_map(gsl_vector *v){apop_vector_log(v); return apop_sum(v);}\ndouble log_by_val(double x){return x;}\n\nstatic void log_and_exp(gsl_rng *r){\n    apop_data *d = apop_data_alloc(100,2);\n    apop_data_add_names(d, 'c', \"10\", \"e\");\n    for (int i=0; i< 100; i++){\n        apop_data_set(d, i, .colname=\"10\", .val=gsl_rng_uniform(r)*10);\n        double *eval = apop_data_ptr(d, .row=i, .colname=\"e\");\n        *eval = gsl_rng_uniform(r)*10;\n    }\n    apop_data *d2 = apop_data_copy(d);\n    Apop_col_tv(d, \"10\", tencol);\n    apop_vector_log10(tencol);\n    apop_vector_apply(tencol, v_pow10);\n    gsl_vector *o_tencol = Apop_cv(d2, 0);\n    assert(apop_vector_distance(tencol, o_tencol) < 1e-3);\n    \n    Apop_col_tv(d, \"e\", ecol);\n    apop_vector_log(ecol);\n    apop_vector_exp(ecol);\n    gsl_vector *o_ecol = Apop_cv(d2, 1);\n    assert(apop_vector_distance(ecol, 
o_ecol) < 1e-3);\n\n    apop_data *d5 = apop_data_alloc(5,5);\n    for (int i=0; i< 5; i++)\n        for (int j=0; j< 5; j++)\n            apop_data_set(d5, i, j, gsl_rng_uniform(r));\n    double log_sum_by_v = apop_matrix_map_sum(d5->matrix, log_for_map);\n    //d5 is now all logs.\n    double log_sum = apop_matrix_map_all_sum(d5->matrix, log_by_val);\n    Diff(log_sum_by_v, log_sum, 1e-5);\n    Diff(apop_matrix_mean(d5->matrix)*25., log_sum, 1e-5);\n    apop_data_free(d); apop_data_free(d2);\n    apop_data_free(d5);\n}\n\nvoid test_percentiles(){\n    gsl_vector *v = gsl_vector_alloc(307);\n    for (size_t i=0; i< 307; i++)\n        gsl_vector_set(v, i, i);\n    double *pcts_up    = apop_vector_percentiles(v, 'u');\n    double *pcts_down  = apop_vector_percentiles(v, 'd');\n    double *pcts_avg   = apop_vector_percentiles(v, 'a');\n    for (size_t i=0; i< 101; i++){\n        assert(pcts_up[i] >= pcts_down[i]);\n        assert(pcts_up[i] >= pcts_avg[i]);\n        assert(pcts_avg[i] >= pcts_down[i]);\n    }\n    assert(pcts_up[100] == pcts_down[100] && pcts_avg[100] == pcts_down[100]);\n    assert(pcts_up[0] == pcts_down[0] && pcts_avg[0] == pcts_down[0]);\n    assert(pcts_avg[50] == (pcts_down[50] + pcts_up[50])/2);\n}\n\nvoid test_score(){\n    int len = 1e5;\n    gsl_rng *r = apop_rng_alloc(123);\n    apop_data *data = apop_data_alloc(len,1);\n    apop_model *source = apop_model_set_parameters(apop_normal, \n                                gsl_ran_flat(r, -5, 5), gsl_ran_flat(r, .01, 5));\n    for (size_t j=0; j< len; j++)\n        apop_draw(gsl_matrix_ptr(data->matrix, j, 0), r, source);\n    apop_model *estme = apop_model_copy(apop_normal);\n    Apop_model_add_group(estme, apop_mle, .method= \"annealing\");\n    apop_prep(data, estme);\n    apop_maximum_likelihood(data, estme);\n\n    apop_model *straight_est = apop_estimate(data, apop_normal);\n    Diff (straight_est->parameters->vector->data[0], source->parameters->vector->data[0], tol1);\n    Diff 
(straight_est->parameters->vector->data[1], source->parameters->vector->data[1], tol1);\n    apop_model_free(straight_est); \n\n    double sigsqn = gsl_pow_2(estme->parameters->vector->data[1])/len;\n    apop_data *cov = apop_data_get_page(estme->parameters, \"cov\", 'r');\n    Diff (apop_data_get(cov, 0, 0), sigsqn , tol3);\n    Diff (apop_data_get(cov, 1, 1), sigsqn/2 , tol3);\n    double *cov1 = apop_data_ptr(estme->parameters, .page=\"<covariance>\", .row=1, .col=1);\n    Diff (*cov1 ,sigsqn/2 , tol3);\n    Diff(apop_data_get(cov, 0,1) + apop_data_get(cov, 0,1), 0, tol3);\n    apop_model_free(estme);\n    apop_model_free(source); \n    apop_data_free(data);\n}\n\nvoid test_normalizations(gsl_vector *v){\n    //let's check out normalizations, while we have a vector for it\n    gsl_vector_scale(v, 23);\n    gsl_vector_add_constant(v, 8);\n    apop_data dv = (apop_data){.matrix=apop_vector_to_matrix(v)};\n    apop_data_transpose(&dv);\n    for (int i=0; i< dv.matrix->size1; i++)\n        apop_vector_normalize(Apop_rv(&dv, i), NULL, 's');\n    apop_data *dvagain = apop_data_transpose(&dv, .inplace='n');\n    apop_data *sum = apop_data_summarize(dvagain);\n    apop_data_free(dvagain);\n    Diff(apop_data_get(sum, .colname=\"mean\"), 0, 1e-5);\n    Diff(apop_data_get(sum, .colname=\"std dev\"), 1, 1e-5);\n    Diff(apop_data_get(sum, .colname=\"variance\"), 1, 1e-5);\n    apop_data_free(sum);\n    gsl_matrix_free(dv.matrix);\n}\n\n//This tests the database-side functions. 
Only for sqlite.\nvoid test_skew_and_kurt(gsl_rng *r){\n    gsl_vector *v;\n    if (apop_opts.db_engine=='s'){\n        apop_table_exists(.remove='d', .name=\"t\");\n        apop_query(\"create table t(vals)\");\n        for(int i=0; i<1e4; i++)\n            apop_query(\"insert into t values(%g)\", gsl_rng_uniform(r));\n        v = apop_query_to_vector(\"select * from t\");\n        Diff (apop_var(v), apop_query_to_float(\"select var(vals) from t\"), tol6);\n        Diff (apop_vector_skew(v), apop_query_to_float(\"select skew(vals) from t\"), tol6);\n        Diff (apop_vector_kurtosis(v), apop_query_to_float(\"select kurt(vals) from t\"), tol5);\n        apop_table_exists(\"t\", 'd');\n    } else {\n        v = gsl_vector_alloc(1e4);\n        for(int i=0; i<1e4; i++) gsl_vector_set(v, i,  gsl_rng_uniform(r));\n    }\n    test_normalizations(v);\n    gsl_vector_free(v);\n}\n\nvoid test_listwise_delete(){\n  apop_data *t1 = apop_data_calloc(10,10);\n  apop_text_alloc(t1, 10, 10);\n  for (int i=0; i< 10; i++)\n      for (int j=0; j< 10; j++)\n          apop_text_set(t1, i, j, \"%i\", i*j);\n  //no NaNs yet\n  apop_data *t1c = apop_data_listwise_delete(t1);\n  assert(t1c->matrix->size1==10);\n  assert(t1c->matrix->size2==10);\n  assert(atoi(t1c->text[3][4])==12);\n  //Now kill two rows\n  asprintf(&(t1c->text[3][4]), \"nan\");\n  asprintf(&(t1c->text[9][9]), \"NaN\");\n  t1c = apop_data_listwise_delete(t1c);\n  assert(t1c->matrix->size1==8);\n  assert(t1c->matrix->size2==10);\n  assert(atoi(t1c->text[3][4])==16);\n  //check the vector\n  apop_data *t2 = apop_data_calloc(10); //check on this form of calloc.\n  t1->vector = t2->vector;\n  apop_data_set(t1, 4,-1, GSL_NAN);\n  apop_data *t2c = apop_data_listwise_delete(t1);\n  assert(t2c->matrix->size1==9);\n  apop_data_set(t1, 4,-1, GSL_NAN);\n  apop_data_set(t1, 7,-1, GSL_NAN);\n  apop_data *t3c = apop_data_listwise_delete(t1);\n  assert(atoi(t3c->text[3][4])==12);\n  assert(atoi(t3c->text[4][4])==20);\n  
assert(atoi(t3c->text[7][4])==36);\n  assert(t3c->matrix->size1==8);\n  //return NULL if every row has missing data.\n  Apop_col_v(t1, 7, v)\n  gsl_vector_set_all(v, GSL_NAN);\n  assert(!apop_data_listwise_delete(t1));\n\n    //btw, check non-square transpose with blanks.\n    t1 = apop_data_calloc();\n    apop_text_alloc(t1, 12, 10);\n    for (int i=0; i< 10; i++)\n        for (int j=0; j< 9; j++)\n            apop_text_set(t1, i, j, \"%i\", i*j);\n    t2 = apop_data_copy(t1);\n    apop_data_transpose(t1);\n    assert(!strlen(t1->text[7][11]));\n\n    //come back\n    apop_data_transpose(t1);\n    assert(!strlen(t1->text[11][7]));\n    for (int i=0; i< 10; i++)\n        for (int j=0; j< 9; j++)\n            assert(!strcmp(t1->text[i][j], t2->text[i][j]));\n\n    apop_data_free(t1);\n    t1 = apop_text_alloc(NULL, 10, 12);\n    for (int i=0; i< 9; i++)\n        for (int j=0; j< 10; j++)\n            apop_text_set(t1, i, j, \"%i\", i*j);\n    apop_data *t4 = apop_data_transpose(t1, .inplace='n');\n    assert(!strlen(t4->text[11][7]));\n    assert(atoi(t4->text[9][8])==72);\n    assert(t4->textsize[0]==12);\n    assert(t4->textsize[1]==10);\n}\n\n\nstatic void wmt(gsl_vector *v, gsl_vector *v2, gsl_vector *w, gsl_vector *av, gsl_vector *av2, double mean){\n    assert(apop_vector_mean(av) == apop_vector_mean(v,w));\n    assert(apop_vector_mean(v, w) == mean);\n    Diff (apop_vector_var(av), apop_vector_var(v, w), tol5);\n    Diff (apop_vector_cov(av, av2), apop_vector_cov(v, v2, w), tol5);\n    Diff (apop_vector_skew_pop(av), apop_vector_skew_pop(v, w), tol5);\n    Diff (apop_vector_kurtosis_pop(av), apop_vector_kurtosis_pop(v, w), tol5);\n}\n\nvoid test_weigted_moments(){\n  //double        data[]      = {1,2,3};//checking vector_fill with this and w2\n  double        alldata[]   = {1,2,3};\n  double        data3[]     = {3,2,1};\n  double        alldata3[]  = {3,2,1};\n  double        weights[]   = {1,1,1};\n  gsl_vector    *v          = 
apop_vector_fill(gsl_vector_alloc(3), 1, 2, 3);\n  gsl_vector    *v2         = apop_array_to_vector(data3, 3);\n  gsl_vector    *w          = apop_array_to_vector(weights, 3);\n  gsl_vector    *av         = apop_array_to_vector(alldata, 3);\n  gsl_vector    *av2        = apop_array_to_vector(alldata3, 3);\n    wmt(v,v2,w,av,av2,2);\n  double data2[]       = {0,1,2,3,4};\n  double alldata2[]    = {0,0,0,0,1,1,1,2,2,3};\n  double data4[]       = {0,1,3,2,4};\n  double alldata4[]    = {0,0,0,0,1,1,1,3,3,2};\n  //double weights2[]    = {4,3,2,1,0};\n    v             = apop_array_to_vector(data2, 5);\n    v2            = apop_array_to_vector(data4, 5);\n    av            = apop_array_to_vector(alldata2, 10);\n    av2           = apop_array_to_vector(alldata4, 10);\n    gsl_vector *w2 = gsl_vector_alloc(5);\n    apop_vector_fill(w2, 4, 3, 2, 1, 0);\n    wmt(v, v2, w2, av, av2, 1);\n}\n\nvoid test_split_and_stack(gsl_rng *r){\n    apop_data *d1 = apop_data_alloc(10,10,10);\n    int i,j, tr, tc;\n    apop_data **splits, *dv2;\n    for(i=-1; i< 10; i++)\n        for(j=0; j< 10; j++)\n            apop_data_set(d1, j, i, gsl_rng_uniform(r));\n\n    d1->weights = gsl_vector_alloc(10);\n    for(j=0; j< 10; j++)\n        gsl_vector_set(d1->weights, j, gsl_rng_uniform(r));\n\n    //vector_stacking NULLs:\n    gsl_vector *orig=apop_vector_copy(d1->vector);\n    gsl_vector *cp=apop_vector_stack(NULL, d1->vector);\n    apop_vector_stack(d1->vector, NULL);\n    assert(d1->vector->size==10);\n        for(j=0; j< 10; j++){\n            assert(gsl_vector_get(d1->vector, j)==gsl_vector_get(orig, j));\n            assert(gsl_vector_get(cp, j)==gsl_vector_get(orig, j));\n        }\n    assert(!apop_vector_stack(NULL, NULL));\n    assert(!apop_data_stack(NULL, NULL));\n    gsl_vector_free(orig);\n    gsl_vector_free(cp);\n\n    for(i=-1; i< 13; i++){\n        splits  = apop_data_split(d1, i, 'r');\n        if (i>0 && i< 10)\n            assert(splits[0]->matrix->size1 == i);\n        else 
if (i < 0)\n            assert(!splits[0]);\n        else if (i >= 10)\n            assert(splits[0]->matrix->size1 == 10);\n        dv2 = apop_data_stack(splits[0], splits[1], 'r');\n        for(j=0; j< 50; j++){\n            tr  = (int) gsl_rng_uniform(r)*10;\n            tc  = (int) gsl_rng_uniform(r)*11-1;\n            assert(apop_data_get(dv2, tr, tc) == apop_data_get(d1,tr,tc));\n            assert(gsl_vector_get(dv2->weights, tr) == gsl_vector_get(d1->weights,tr));\n            if(tr < i){\n                assert(apop_data_get(splits[0], tr, tc)==apop_data_get(d1, tr, tc));\n                assert(apop_data_get(splits[0], tr, tc)==apop_data_get(d1, tr, tc));\n                assert(gsl_vector_get(splits[0]->weights, tr) == gsl_vector_get(d1->weights,tr));\n            } else{\n                int start=i < 0 ? 0 : i;\n                assert(apop_data_get(splits[1], tr-start, tc)==apop_data_get(d1, tr, tc));\n                assert(apop_data_get(splits[1], tr-start, tc)==apop_data_get(d1, tr, tc));\n                assert(gsl_vector_get(splits[1]->weights, tr-start) == gsl_vector_get(d1->weights,tr));\n            }\n        }\n        apop_data_free(dv2);\n        apop_data_free(splits[0]);\n        apop_data_free(splits[1]);\n    }\n    for(i=-1; i< 13; i++){\n        splits  = apop_data_split(d1, i, 'c');\n        if (i>0 && i< 10){\n            assert(splits[0]->matrix->size2 == i);\n            assert(splits[0]->vector->size == 10);\n            assert(!splits[1]->vector);\n        }\n        else if (i < 0){\n            assert(!splits[0]);\n            assert(!splits[0]);\n            assert(splits[1]->vector->size == 10);\n        }\n        else if (i >= 10){\n            assert(splits[0]->matrix->size1 == 10);\n            assert(splits[0]->vector->size == 10);\n            assert(!splits[1]);\n        }\n        dv2 = apop_data_stack(splits[0], splits[1], 'c');\n        for(j=0; j< 50; j++){\n            tr  = (int) gsl_rng_uniform(r)*10;\n         
   tc  = (int) gsl_rng_uniform(r)*11-1;\n            assert(apop_data_get(dv2, tr, tc) == apop_data_get(d1,tr,tc));\n            assert(gsl_vector_get(dv2->weights, tr) == gsl_vector_get(d1->weights,tr));\n            if (splits[0]) assert(gsl_vector_get(splits[0]->weights, tr) == gsl_vector_get(d1->weights,tr));\n            if (splits[1]) assert(gsl_vector_get(splits[1]->weights, tr) == gsl_vector_get(d1->weights,tr));\n            if(tc < i){\n                assert(apop_data_get(splits[0], tr, tc)==apop_data_get(d1, tr, tc));\n                assert(apop_data_get(splits[0], tr, tc)==apop_data_get(d1, tr, tc));\n            } else{\n                int start=i < 0 ? 0 : i;\n                assert(apop_data_get(splits[1], tr, tc-start)==apop_data_get(d1, tr, tc));\n                assert(apop_data_get(splits[1], tr, tc-start)==apop_data_get(d1, tr, tc));\n            }\n        }\n        apop_data_free(splits[0]);\n        apop_data_free(splits[1]);\n        apop_data_free(dv2);\n    }\n\n    apop_data **notsplits = apop_data_split(d1, 800, 'c');\n    assert(notsplits[0]->matrix->size1 == d1->matrix->size1);\n    assert(notsplits[0]->matrix->size2 == d1->matrix->size2);\n    assert(notsplits[0]->vector->size == d1->vector->size);\n    assert(notsplits[1] == NULL);\n    //let's try a NULL data set \n    apop_data *nulldata = apop_data_alloc();\n    apop_data *onespot = apop_data_alloc(1,1);\n    apop_data_set(onespot, 0, 0, 12);\n    apop_data_stack(nulldata, onespot, .posn='c', .inplace='y');\n    assert(nulldata->matrix->size1==1);\n    assert(nulldata->matrix->size2==1);\n    assert(apop_data_get(nulldata, 0, 0) == 12);\n    apop_data_free(nulldata);\n    apop_data_free(onespot);\n\n    //text\n    apop_data *txt = apop_text_alloc(NULL, 3,3);\n    apop_data *txt2 = apop_text_alloc(NULL, 3,3);\n    for (int i=0; i< 3; i++)\n        for (int j=0; j< 3; j++){\n            apop_text_set(txt, i, j, \"(%i, %i)\", i, j);\n            apop_text_set(txt2, i, j, \"[%i, 
%i]\", i, j);\n        }\n\n    apop_data *rbound = apop_data_stack(txt, txt2, .posn='r');\n    assert(rbound->textsize[0] == 6);\n    assert(rbound->textsize[1] == 3);\n    assert(!strcmp(rbound->text[2][2], \"(2, 2)\"));\n    assert(!strcmp(rbound->text[5][2], \"[2, 2]\"));\n\n    apop_data_stack(txt, txt2, .posn='c', .inplace='y');\n    assert(txt->textsize[0] == 3);\n    assert(txt->textsize[1] == 6);\n    assert(!strcmp(txt->text[2][2], \"(2, 2)\"));\n    assert(!strcmp(txt->text[2][5], \"[2, 2]\"));\n    apop_data_free(txt);\n    apop_data_free(txt2);\n\n    apop_data **txtsplits = apop_data_split(rbound, 3, 'r');\n    apop_data_free(rbound);\n    assert(txtsplits[0]->textsize[0] ==txtsplits[0]->textsize[1]);\n    assert(txtsplits[0]->textsize[0] == 3);\n    assert(txtsplits[1]->textsize[0] ==txtsplits[1]->textsize[1]);\n    assert(txtsplits[1]->textsize[0] == 3);\n    assert(!strcmp(txtsplits[0]->text[1][1], \"(1, 1)\"));\n    assert(!strcmp(txtsplits[1]->text[1][1], \"[1, 1]\"));\n    apop_data_free(d1);\n    apop_data_free(onespot);\n    apop_data_free(nulldata);\n}\n\n/** I claim that the mean residual is near zero, and that the predicted\n  value is \\f$X'\\beta\\f$.\n  */\nvoid test_predicted_and_residual(apop_model *est){\ngsl_vector  v,\n            *prediction = gsl_vector_alloc(est->data->matrix->size1);\ngsl_matrix  *m          = gsl_matrix_alloc(est->data->matrix->size1,est->data->matrix->size2);\n    //prep an affine data matrix.\n    gsl_matrix_memcpy(m, est->data->matrix);\n    v   = gsl_matrix_column(m, 0).vector;\n    gsl_vector_set_all(&v, 1);\n\n    apop_data *predict_tab = apop_data_get_page(est->info, \"predict\", 'r');\n    v   = gsl_matrix_column(predict_tab->matrix, apop_name_find(predict_tab->names, \"residual\", 'c')).vector;\n    assert(fabs(apop_mean(&v)) < tol5);\n\n    Apop_col_tv(predict_tab, \"predicted\", vv);\n    gsl_blas_dgemv(CblasNoTrans, 1, m, est->parameters->vector, 0, prediction);\n    gsl_vector_sub(prediction, 
vv);\n    assert(fabs(apop_vector_sum(prediction)) < tol5);\n}\n\nvoid test_OLS(gsl_rng *r){\n    apop_data *set = apop_data_alloc(0, len, 2);\n    for(int i=0; i< len; i++){\n        apop_data_set(set, i, 1, 100*(gsl_rng_uniform(r)-0.5));\n        apop_data_set(set, i, 0, -1.4 + apop_data_get(set,i,1)*2.3);\n    }\n    apop_data *bkup = apop_data_copy(set);\n    apop_model *out = apop_estimate(set, apop_ols);\n    Diff (apop_data_get(out->parameters, 0,-1) , -1.4 , tol5);\n    Diff (apop_data_get(out->parameters, 1,-1) , 2.3 , tol5);\n    apop_model_free(out);\n\n    gsl_vector *w = gsl_vector_alloc(set->matrix->size1);\n    gsl_vector_set_all(w, 14);\n    bkup->weights  = w;\n    out = apop_estimate(bkup, apop_ols);\n    Diff (apop_data_get(out->parameters, 0,-1) , -1.4 , tol5);\n    Diff (apop_data_get(out->parameters, 1,-1) , 2.3 , tol5);\n    apop_data_free(bkup);\n    apop_data_free(set);\n    apop_model_free(out);\n}\n\n#define INVERTSIZE 100\nvoid test_inversion(gsl_rng *r){\n    gsl_matrix *invme = gsl_matrix_alloc(INVERTSIZE, INVERTSIZE);\n    gsl_matrix *inved;\n    gsl_matrix *inved_back;\n    apop_data *four = apop_data_alloc(1);\n    apop_data_set(four, 0, -1, 4);\n    apop_model *fourp = apop_model_copy(apop_zipf);\n    fourp->parameters = four;\n    for(int i=0; i<INVERTSIZE; i++)\n        for(int j=0; j<INVERTSIZE; j++)\n            apop_zipf->draw(gsl_matrix_ptr(invme, i,j),r,  fourp);\n    apop_det_and_inv(invme, &inved, 0, 1);\n    apop_det_and_inv(inved, &inved_back, 0, 1);\n    apop_model_free(fourp);\n    double error = 0;\n    for(int i=0; i<INVERTSIZE; i++)\n        for(int j=0; j<INVERTSIZE; j++)\n            error += gsl_matrix_get(invme, i,j) - gsl_matrix_get(inved_back, i,j);\n    assert (error < 1e-5);\n    gsl_matrix_free(invme);\n    gsl_matrix_free(inved);\n    gsl_matrix_free(inved_back);\n}\n\nvoid test_summarize(){\n    apop_table_exists(\"td\", 'd');\n    apop_text_to_db( DATADIR \"/\" \"test_data\" , .has_row_names= 0,1, 
.tabname = \"td\");\n    apop_data *m = apop_query_to_data(\"select * from td\");\n    apop_data *s = apop_data_summarize(m);\n    apop_data_free(m);\n    double t = gsl_matrix_get(s->matrix, 1,0);\n    assert (t ==3);\n    t = gsl_matrix_get(s->matrix, 2, 1);\n    double v = sqrt((2*2 +3*3 +3*3 +4.*4.)/3.);\n    assert (t == v);\n    apop_data_free(s);\n}\n\nvoid test_dot(){\napop_data *d1   = apop_text_to_data(.text_file= DATADIR \"/\" \"test_data2\" ,0,1); // 55 x 2\napop_data *d2   = apop_text_to_data( DATADIR \"/\" \"test_data2\" ); // 55 x 2\napop_data *d3   = apop_dot(d1, d2, .form2='t');\ngsl_vector v1 = gsl_matrix_row(d1->matrix, 0).vector;\napop_data *dv   = &(apop_data){.vector=&v1}; // 2 x 1\napop_data *d7   = apop_dot(dv, dv);\n    assert(apop_data_get(d7, 0, -1) == apop_data_get(d3, 0,0));\napop_data *d8   = apop_dot(d1, dv);\napop_data *d9   = apop_dot(dv, d1, .form2='t');\n    gsl_vector_sub(d8->vector, d9->vector);\n    assert(!apop_vector_sum(d8->vector));\n\n    int verbosity = apop_opts.verbose;\n    apop_opts.verbose = -1;\n    apop_data *d10 = apop_dot(d1, d2);\n    assert(d10->error == 'd');\n    apop_data *d11 = apop_dot(dv, d1);\n    assert(d11->error == 'd');\n    apop_opts.verbose = verbosity;\n    apop_data_free(d1); apop_data_free(d2); apop_data_free(d3); \n    apop_data_free(d7); apop_data_free(d8);\n    apop_data_free(d9); apop_data_free(d10); apop_data_free(d11);\n}\n \nstatic void fill_p(apop_data *d, gsl_rng *r){\n    int j, k;\n    if (d->vector)\n        for (j=0; j< d->vector->size; j++)\n            gsl_vector_set(d->vector, j, gsl_rng_uniform(r));\n    if (d->matrix)\n        for (j=0; j< d->matrix->size1; j++)\n            for (k=0; k< d->matrix->size2; k++)\n                gsl_matrix_set(d->matrix, j, k, gsl_rng_uniform(r));\n    if (d->weights)\n        for (j=0; j< d->weights->size; j++)\n            gsl_vector_set(d->weights, j, gsl_rng_uniform(r));\n}\n\nstatic void check_p(apop_data *d, apop_data *dout){\n    int j, 
k;\n    if (d->vector)\n        for (j=0; j< d->vector->size; j++)\n            assert(dout->vector->data[j] == d->vector->data[j]);\n    if (d->weights)\n        for (j=0; j< d->weights->size; j++)\n            assert(dout->weights->data[j] == d->weights->data[j]);\n    if (d->matrix)\n        for (j=0; j< d->matrix->size1; j++)\n            for (k=0; k< d->matrix->size2; k++)\n                assert(gsl_matrix_get(d->matrix, j, k) == gsl_matrix_get(dout->matrix, j, k));\n}\n\nvoid apop_pack_test(gsl_rng *r){\n  int i, v, m1,m2, w;\n  apop_data *d, *dout, *p2, *outp2;\n  gsl_vector *mid;\n    for (i=0; i< 10; i++){\n        v   = gsl_rng_uniform(r) > 0.5 ? gsl_rng_uniform(r)*100 : 0;\n        m1  = gsl_rng_uniform(r)*100;\n        m2  = gsl_rng_uniform(r) > 0.5 ? gsl_rng_uniform(r)*100 : 0;\n        w   = gsl_rng_uniform(r) > 0.5 ? gsl_rng_uniform(r)*100 : 0;\n        if (!v && !w && (!m1 || !m2))\n            continue; //I actually get this unlucky draw.\n        d   = apop_data_alloc(v, m1, m2);\n        dout    = apop_data_alloc(v, m1, m2);\n        if (w) {d->weights = gsl_vector_alloc(w);\n                dout->weights = gsl_vector_alloc(w);}\n        fill_p(d, r);\n        int second_p = i %2;\n        if (second_p){\n            p2 = apop_data_add_page(d, apop_data_alloc(v, m1, m2), \"second p\");\n            outp2 = apop_data_add_page(dout, apop_data_alloc(v, m1, m2), \"second p\");\n            if (w) {p2->weights = gsl_vector_alloc(w);\n                    outp2->weights = gsl_vector_alloc(w);}\n            fill_p(p2, r);\n        }\n        mid     = apop_data_pack(d, .more_pages= second_p ? 
'y' : 'n');\n        apop_data_unpack(mid, dout);\n        check_p(d, dout);\n        if (second_p)\n            check_p(d->more, dout->more);\n        if (mid) gsl_vector_free(mid); \n        apop_data_free(d); apop_data_free(dout);\n        }\n}\n\nvoid test_model_fix_parameters(gsl_rng *r){\n    size_t ct = 1000;\n    apop_data *d = apop_data_alloc(0,ct,2);\n    double draw[2];\n    apop_multivariate_normal->vsize =\n    apop_multivariate_normal->msize1 =\n    apop_multivariate_normal->msize2 = 2;\n    apop_model *pp = apop_model_set_parameters(apop_multivariate_normal,\n                                        8, 1, 0.5,\n                                        2, 0.5, 1);\n    for(int i=0; i< ct; i++){\n        apop_multivariate_normal->draw(draw, r, pp);\n        apop_data_set(d, i, 0, draw[0]);\n        apop_data_set(d, i, 1, draw[1]);\n    }\n\n    apop_data *pcopy = apop_data_copy(pp->parameters);\n    gsl_matrix_set_all(pp->parameters->matrix, GSL_NAN);\n    apop_model *mep1  = apop_model_fix_params(pp);\n    Apop_settings_add(mep1, apop_mle, starting_pt, ((double[]){1.5, .25, .25, 1.5}));\n    apop_model *e1 = apop_estimate(d, mep1);\n    gsl_vector_sub(e1->parameters->vector, pcopy->vector);\n    assert(apop_vector_sum(e1->parameters->vector) < 1e-1);\n    apop_model_free(e1);\n\n    double start2[] = {7,3};\n    pp->parameters = apop_data_copy(pcopy);\n    gsl_vector_set_all(pp->parameters->vector, GSL_NAN);\n    apop_model *mep2  = apop_model_fix_params(pp);\n    apop_model_free(pp);\n    Apop_settings_add(mep2, apop_mle, starting_pt, start2);\n    Apop_settings_add(mep2, apop_mle, method, \"PR cg\");\n\n    apop_model *e2 = apop_estimate(d, mep2);\n    apop_model_free(mep2);\n    gsl_matrix_sub(e2->parameters->matrix, pcopy->matrix);\n    assert(apop_matrix_sum(e2->parameters->matrix) < 1e-2);\n    apop_model_free(e2);\n}\n\nvoid test_linear_constraint(){\n  gsl_vector *beta      = gsl_vector_alloc(2);\n    gsl_vector_set(beta, 0, 7);\n    
gsl_vector_set(beta, 1, 7);\n  apop_data *contrasts  = apop_data_calloc(1,1,2);\n    apop_data_set(contrasts, 0, 0, -1);\n    apop_data_set(contrasts, 0, 1, -1);\n    Diff (apop_linear_constraint(beta, contrasts, 0) , sqrt(2*49) , tol5);\n    assert(!apop_vector_sum(beta));\n    gsl_vector_set(beta, 0, 0);\n    gsl_vector_set(beta, 1, 7);\n    Diff (apop_linear_constraint(beta, contrasts, 0) , sqrt(49/2.) , tol5);\n    assert(!apop_vector_sum(beta));\n    assert(gsl_vector_get(beta,0)==-7/2.);\n    //inside corner: find the corner\n  gsl_vector *beta2     = gsl_vector_alloc(3);\n    gsl_vector_set(beta2, 0, 7);\n    gsl_vector_set(beta2, 1, 7);\n    gsl_vector_set(beta2, 2, 7);\n  apop_data *contrasts2  = apop_data_calloc(3,3,3);\n    apop_data_set(contrasts2, 0, 0, -1);\n    apop_data_set(contrasts2, 1, 1, -1);\n    apop_data_set(contrasts2, 2, 2, -1);\n    Diff (apop_linear_constraint(beta2, contrasts2, 0) , sqrt(3*49) , tol5);\n    assert(apop_vector_sum(beta2)==0);\n    //sharp corner: go to one wall.\n    gsl_vector_set(beta2, 0, 7);\n    gsl_vector_set(beta2, 1, 7);\n    gsl_vector_set(beta2, 2, 7);\n    apop_data_set(contrasts2, 0, 1, 1);\n    Diff(apop_linear_constraint(beta2, contrasts2, 0) , sqrt(2*49), tol5);\n    assert(gsl_vector_get(beta2,0)==7);\n    assert(gsl_vector_get(beta2,1)==0);\n    assert(gsl_vector_get(beta2,2)==0);\n}\n\nstatic void broken_est(apop_data *d, apop_model *m){\n    static gsl_rng *r; if (!r) r = apop_rng_alloc(1);\n    if (gsl_rng_uniform(r) < 1./100.) {\n        gsl_vector_set_all(m->parameters->vector, GSL_NAN);\n        return;\n    }\n    apop_normal->estimate(d, m);\n}\n\nstatic void super_broken_est(apop_data *d, apop_model *m){\n    static gsl_rng *r; if (!r) r = apop_rng_alloc(1);\n    if (gsl_rng_uniform(r) < 3./4.) {\n        gsl_vector_set_all(m->parameters->vector, GSL_NAN);\n        return;\n    }\n    apop_normal->estimate(d, m);\n}\n\n//In my inattention, I wrote two jackknife tests: this one and eg/jack.c. 
So you get double the checks.\nvoid test_jackknife(gsl_rng *r){\n    double pv[] = {3.09,2.8762};\n    int len = 2000;\n    apop_model *m = apop_model_copy(apop_normal);\n    apop_data *d = apop_data_alloc(0, len, 1);\n    apop_data *p = apop_data_alloc();\n    p->vector = apop_array_to_vector(pv, 2);\n    apop_model*pp = apop_model_copy(m);\n    pp->parameters = p;\n    for (size_t i =0; i< len; i++)\n        m->draw(apop_data_ptr(d, i, 0), r, pp); \n    apop_data *out = apop_jackknife_cov(d, m);\n    //Notice that the jackknife just ain't a great estimator here.\nassert ((fabs(apop_data_get(out, 0,0) - gsl_pow_2(pv[1])/len)) < tol2 \n            && fabs(apop_data_get(out, 1,1) - gsl_pow_2(pv[1])/(2*len)) < tol2*100);\n    apop_data *out2 = apop_bootstrap_cov(d, m, .keep_boots='y');\n    assert (fabs(apop_data_get(out2) - gsl_pow_2(pv[1])/len) < tol2\n                && fabs(apop_data_get(out2, 1,1) - gsl_pow_2(pv[1])/(2*len)) < tol2);\n    apop_data_free(out2);\n\n    //bootstrap should recover gracefully from a small number of NaNs...\n    m->estimate = broken_est;\n    out2 = apop_bootstrap_cov(d, m, .ignore_nans='y');\n    assert (fabs(apop_data_get(out2) - gsl_pow_2(pv[1])/len) < tol2\n                && fabs(apop_data_get(out2, 1,1) - gsl_pow_2(pv[1])/(2*len)) < tol2);\n\n\n    //...but not from a large number of NaNs.\n    int vvv= apop_opts.verbose;\n    apop_opts.verbose = -1;\n    m->estimate= super_broken_est;\n    apop_data *out3 = apop_bootstrap_cov(d, m, .ignore_nans='y');\n    assert(out3->error);\n    apop_opts.verbose = vvv;\n\n    apop_data_free(d);\n    apop_data_free(out);\n    apop_data_free(out2);\n    apop_data_free(out3);\n    apop_model_free(m);\n}\n\nvoid test_multivariate_normal(){\n    int len = 5e5;\n    double params[] = {1, 3, 0,\n                       2, 0, 1};\n    apop_data *p = apop_data_fill_base(apop_data_alloc(2, 2, 2), params);\n    apop_model *mv = apop_model_copy(apop_multivariate_normal);\n    mv->parameters=p;\n    
mv->dsize=2;\n    apop_data *rdraws = apop_model_draws(mv, .count=len);\n    mv->parameters=NULL;\n    apop_model_free(mv);\n    apop_model *est =apop_estimate(rdraws, apop_multivariate_normal);\n    double error = fabs(est->parameters->vector->data[0] - p->vector->data[0])\n                  +fabs(est->parameters->vector->data[1] - p->vector->data[1])\n                  +fabs(est->parameters->matrix->data[0] - p->matrix->data[0])\n                  +fabs(est->parameters->matrix->data[1] - p->matrix->data[1])\n                  +fabs(est->parameters->matrix->data[2] - p->matrix->data[2])\n                  +fabs(est->parameters->matrix->data[3] - p->matrix->data[3]);\n    Diff(error, 0, 4e-2); //yes, unimpressive, but we don't wanna be here all day.\n    apop_model_free(est);\n    apop_data_free(rdraws);\n}\n\nstatic void common_binomial_bit(apop_model *out, int n, double p){\n    /*double phat = apop_data_get(out->parameters, 1,-1);\n    double nhat = apop_data_get(out->parameters, 0,-1);\n    if (verbose) printf(\"n: %i, p: %g, nhat: %g, phat: %g\\n\", n, p, phat, nhat);*/\n    assert(apop_data_get(out->parameters, 0,-1) == n);\n    assert(apop_data_get(out->parameters, 1,-1) - p < 1e-2);\n}\n\nvoid test_binomial(gsl_rng *r){\n    double p = gsl_rng_uniform(r);\n    int n     = gsl_ran_flat(r,1,1e5);\n    apop_data *d = apop_data_falloc((1,2), n*(1-p), n*p);\n    apop_model *out = apop_estimate(d, apop_binomial);\n    apop_model *outm = apop_estimate(d, apop_multinomial);\n    common_binomial_bit(out, n, p);\n    common_binomial_bit(outm, n, p);\n    apop_model_free(outm);\n    apop_data_free(d);\n    apop_model_free(out);\n}\n\nvoid test_rownames(){\n    apop_data *d = apop_data_falloc((2, 2), 0, 1, \n                                            2, 3);\n    apop_data_add_names(d, 'r', \"zero\", \"one\");\n    apop_data_add_names(d, 'c', \"C zero\", \"C one\");\n    assert(apop_data_get(d, .rowname=\"zero\", .col=0) == 0);\n    assert(apop_data_get(d, 
.rowname=\"zero\", .colname=\"C zero\") == 0);\n    assert(apop_data_get(d, .rowname=\"one\", .col=0) == 2);\n\n    double *oneone = apop_data_ptr(d, .rowname=\"one\", .col=1);\n    *oneone= 27;\n    assert(apop_data_get(d, .rowname=\"one\", .colname=\"C one\")==27);\n\n    apop_data_set(d, .rowname=\"one\", .colname=\"C zero\", .val=33);\n    double *onezero = apop_data_ptr(d, .rowname=\"one\", .col=0);\n    assert(*onezero == 33);\n\n    apop_data_set(d, .rowname=\"zero\", .col=1, .val=10);\n    double *zeroone = apop_data_ptr(d, .rowname=\"zero\", .colname=\"C one\");\n    assert(*zeroone == 10);\n}\n\nint get_factor_index(apop_data *flist, char *findme){\n    for (int i=0; i< flist->textsize[0]; i++)\n        if (apop_strcmp(flist->text[i][0], findme))\n            return i;\n    return -2;\n}\n\n//If the dummies are a separate matrix, offset=0;\n//If the dummies are an addendum to main, offset=original_data->matrix->size2;\nstatic void check_for_dummies(apop_data *d, apop_data *dum, int offset){\n    int n;\n    apop_data *factorlist = apop_data_get_factor_names(d, 0, 't');\n    for(int i=0; i < d->textsize[0]; i ++)\n        if ((n = get_factor_index(factorlist, d->text[i][0]))>0){\n            for(int j=0; j < factorlist->textsize[0]-1; j ++)\n                if (j==n-1)\n                    assert(apop_data_get(dum, i, j+offset));\n                else\n                    assert(!apop_data_get(dum, i, j+offset));\n        } else\n            for(int j=0; j < factorlist->textsize[0]-1; j ++)\n                assert(!apop_data_get(dum, i, j+offset));\n}\n\nvoid dummies_and_factors(){\n    apop_text_to_db( DATADIR \"/\" \"data-mixed\" , \"genes\");\n    apop_data *d = apop_query_to_mixed_data(\"mmmt\", \"select aa, bb, 1, a_allele from genes\");\n    apop_data *dum = apop_data_to_dummies(d, 0, 't', 0);\n    check_for_dummies(d, dum, 0);\n    apop_data_to_factors(d, 't', 0, 2);\n    for(int i=0; i < d->textsize[0]; i ++) //the set is only As and Cs.\n        
if (!strcmp(d->text[i][0], \"A\"))\n            assert(apop_data_get(d, i, 2) == 0);\n        else\n            assert(apop_data_get(d, i, 2) == 1);\n    //test combination routines\n    apop_data *d2 = apop_query_to_mixed_data(\"mmmt\", \"select aa, bb, 1, a_allele from genes\");\n    apop_data_to_dummies(d2, 0,  't', .append='y');\n    check_for_dummies(d2, d2, 3);\n}\n\nvoid test_vector_moving_average(){\n  int   i;\n  gsl_vector *v = apop_vector_realloc(NULL, 100); //using realloc as an alloc\n    for(i=0; i < 100; i ++)\n        gsl_vector_set(v, i, i);\n    gsl_vector *unsmooth = apop_vector_moving_average(v, 1);\n    for(i=0; i < 100; i ++)\n        assert(gsl_vector_get(v, i) == gsl_vector_get(unsmooth, i));\n    gsl_vector *slightly_smooth = apop_vector_moving_average(v, 2);\n    //With evenly-spaced data, a moving average returns the original,\n    //with tails missing:\n    for(i=0; i < 98; i ++)\n        assert(gsl_vector_get(v, i+1) == gsl_vector_get(slightly_smooth, i));\n}\n\nvoid test_transpose(){\n    apop_data *t = apop_text_to_data( DATADIR \"/\" \"test_data\" , 0, 1);\n    apop_data *tt = apop_data_transpose(t, .inplace='n');\n    assert(apop_data_get(tt, 0, 3) == 9);\n    assert(apop_data_get(tt, 1, 0) == 4);\n    assert(!strcmp(tt->names->row[2], \"c\"));\n    assert(!strcmp(tt->names->row[3], \"d\"));\n    assert(!tt->names->colct);\n    apop_data_transpose(t);\n    assert(apop_data_get(t, 0, 3) == 9);\n    assert(apop_data_get(t, 1, 0) == 4);\n    assert(!strcmp(t->names->row[2], \"c\"));\n    assert(!strcmp(t->names->row[3], \"d\"));\n    assert(!t->names->colct);\n}\n\napop_data *generate_probit_logit_sample (gsl_vector* true_params, gsl_rng *r, apop_model *method){\n  int i, j;\n  double val;\n  int samples = 8e4;\n  apop_data *data = apop_data_alloc(samples, true_params->size);\n        //generate a random vector of data X, then set the outcome to one if the probit/logit condition holds\n        for (i = 0; i < samples; i++){\n           
 apop_data_set(data, i, 0, 1);\n            for (j = 1; j < true_params->size; j++)\n                apop_data_set(data, i, j, (gsl_rng_uniform(r)-0.5) *2);\n            Apop_row_v(data, i, asample);\n            gsl_blas_ddot(asample, true_params, &val);\n            if (method == apop_probit)\n                apop_data_set(data, i, 0, (gsl_ran_gaussian(r, 1) > -val));\n            else   //Logit:   p(act) = e(xb) / (1+e(xb));\n                apop_data_set(data, i, 0, gsl_rng_uniform(r) < exp(val)/(1+exp(val)));\n        }\n    return data;\n}\n\nvoid test_unique_elements(){\n    double d[] = {0, -3, 1.2, 2.4, -2, -3, 0.1, -0.1, 1.2, -2};\n    gsl_vector *dv = apop_array_to_vector(d, sizeof(d)/sizeof(double));\n    gsl_vector *distinct = apop_vector_unique_elements(dv);\n    assert(distinct->size == 7);\n    assert(gsl_vector_get(distinct, 2) == -.1);\n    assert(gsl_vector_get(distinct, 3) == 0);\n    assert(gsl_vector_get(distinct, 4) == .1);\n\n    apop_data *t = apop_text_alloc(NULL, 9, 7);\n    apop_text_set(t, 0, 0, \"Hi,\");\n    apop_text_set(t, 1, 0, \"there\");\n    apop_text_set(t, 2, 0, \".\");\n    apop_text_set(t, 3, 0, \"This\");\n    apop_text_set(t, 4, 0, \"there\");\n    apop_text_set(t, 5, 0, \"is\");\n    apop_text_set(t, 6, 0, \"dummy\");\n    apop_text_set(t, 7, 0, \"text\");\n    apop_text_set(t, 8, 0, \".\");\n    apop_data *dt = apop_text_unique_elements(t, 0);\n    assert(dt->textsize[0] == 7);\n    assert(!strcmp(\".\", dt->text[0][0]));\n    assert(!strcmp(\"Hi,\", dt->text[1][0]));\n    assert(!strcmp(\"text\", dt->text[5][0]));\n}\n\nvoid test_probit_and_logit(gsl_rng *r){\n    int param_ct = gsl_rng_uniform(r)*7 + 1; //up to seven params.\n    gsl_vector *true_params = gsl_vector_alloc(param_ct);\n    for (int j = 0; j < param_ct; j++)\n        gsl_vector_set(true_params, j, (gsl_rng_uniform(r)-0.5)*2);\n\n    //Logit\n    apop_data* data = generate_probit_logit_sample(true_params, r, apop_logit);\n    
Apop_model_add_group(apop_logit, apop_mle, .tolerance=1e-5);\n    Apop_model_add_group(apop_logit, apop_parts_wanted);\n    apop_model *m = apop_estimate(data, apop_logit);\n    assert(apop_vector_distance(Apop_cv(m->parameters, 0), true_params) < 0.07);\n    apop_data_free(data);\n    apop_model_free(m);\n\n    //Probit\n    apop_data* data2 = generate_probit_logit_sample(true_params, r, apop_probit);\n    Apop_model_add_group(apop_probit, apop_mle);\n    Apop_model_add_group(apop_probit, apop_parts_wanted);\n    m = apop_estimate(data2, apop_probit);\n    assert(apop_vector_distance(Apop_cv(m->parameters, 0), true_params) < 0.07);\n    gsl_vector_free(true_params);\n    apop_model_free(m);\n    apop_data_free(data2);\n}\n\nvoid test_resize(){\n    //This is the multiplication table from _Modeling with Data_\n    //with a +.1 to distinguish columns from rows.\n    gsl_matrix *m = apop_matrix_realloc(NULL, 20,15);//check using realloc as an alloc\n    gsl_matrix_set_all(m, 1);\n    for (int i=0; i< m->size1; i++){\n        Apop_matrix_row(m, i, one_row);\n        gsl_vector_scale(one_row, i+1);\n    }\n    for (int i=0; i< m->size2; i++){\n        Apop_matrix_col(m, i, one_col);\n        gsl_vector_scale(one_col, i+1);\n        gsl_vector_add_constant(one_col, (i+1)/10.);\n    }\n    apop_matrix_realloc(m, 11, 17);\n    assert(gsl_matrix_get(m, 3, 5) == 4*6+.6);\n    apop_matrix_realloc(m, 10, 10);\n    Diff (apop_matrix_sum(m) , 55 * 56 , tol6);\n    gsl_vector *v = gsl_vector_alloc(20);\n    for (int i=0; i< 20; i++)\n        gsl_vector_set(v, i, i);\n    apop_vector_realloc(v, 38);\n    for (int i=0; i< 20; i++)\n        assert(gsl_vector_get(v, i) == i);\n    apop_vector_realloc(v, 10);\n    assert(apop_vector_sum(v) == 45);\n}\n\nvoid test_mvn_gamma(){\n    assert(apop_multivariate_gamma(10, 1)==gsl_sf_gamma(10));\n    assert(apop_multivariate_lngamma(10, 1)==gsl_sf_lngamma(10));\n}\n\nvoid test_default_rng(gsl_rng *r) {\n    gsl_vector *o = 
gsl_vector_alloc(2e5);\n    apop_model *ncut = apop_model_set_parameters(apop_normal, 1.1, 1.23);\n    ncut->draw = NULL; //forced to use the default.\n    for(size_t i=0; i < 2e5; i ++)\n        apop_draw(o->data+i, r, ncut);\n    apop_model *back_out = apop_estimate(&(apop_data){.vector=o}, apop_normal);\n    Diff(back_out->parameters->vector->data[0] , 1.1 , tol2);\n    Diff(back_out->parameters->vector->data[1] , 1.23 , tol2);\n    gsl_vector_free(o);\n}\n\ndouble ran_uniform(double in, void *r){ return gsl_rng_uniform(r);}\ndouble negate(double in){ return -in;}\n\nvoid test_posdef(gsl_rng *r){\n    for(size_t j=0; j < 30; j ++){\n        int size = gsl_rng_uniform(r) *10+1;\n        apop_data *d = apop_data_alloc(size, size);\n        apop_map(d, .fn_dp=ran_uniform, .param=r, .inplace='y', .part='m');\n        apop_matrix_to_positive_semidefinite(d->matrix); \n        assert(apop_matrix_is_positive_semidefinite(d->matrix));\n\n        //start over, from the negation of where you just were \n        //(guaranteed neg definite, no?)\n        gsl_matrix * neg = apop_matrix_map_all(d->matrix, negate);\n        apop_matrix_to_positive_semidefinite(neg);\n        assert(apop_matrix_is_positive_semidefinite(neg));\n        gsl_matrix_free(neg);\n        apop_data_free(d);\n    }\n}\n\nstatic double set_to_index(double in, int index){ return index;}\nstatic double is_even(double in){ return !((int)in%2);}\nstatic double is_odd(double in){ return (int)in%2;}\nstatic double nan_even(double in){ return is_even(in) ? 
GSL_NAN : in; }\n\nvoid row_manipulations(){\n    apop_data *test= apop_data_alloc(10);\n    apop_map(test, .fn_di=set_to_index, .part='v', .inplace='y');\n    int rm[10] = {0,1,0,1,0,1,0,1,0,1};\n    apop_data_rm_rows(test, rm);\n    assert (!apop_map_sum(test, .fn_d=is_odd, .part='v'));\n\n    apop_data *test2= apop_data_alloc(10,0,0);\n    apop_map(test2, .fn_di=set_to_index, .part='v', .inplace='y');\n    apop_map(test2, .fn_d=nan_even, .part='v', .inplace='y');\n    apop_data_listwise_delete(test2, 'y');\n    assert (5== apop_map_sum(test2, .fn_d=is_odd, .part='v'));\n    assert (!apop_map_sum(test2, .fn_d=is_even, .part='v'));\n}\n\n\nvoid test_pmf(){\n    double x[] = {0, 0.2, 0 , 0.4, 1, .7, 0 , 0, 0};\n    gsl_rng *r = apop_rng_alloc(1234);\n    apop_data *d = apop_data_alloc();\n    d->weights = apop_array_to_vector(x, 9);\n    apop_model *mc = apop_model_copy(apop_pmf);\n    Apop_model_add_group(mc, apop_pmf, .draw_index= 'y');\n    mc->dsize=0;\n    apop_model *m = apop_estimate(d, mc);\n    gsl_vector *v = gsl_vector_calloc(d->weights->size);\n    for (size_t i=0; i< 1e5; i++){\n        double out;\n        apop_draw(&out, r, m);\n        (*gsl_vector_ptr(v, out))++;\n    }\n    apop_vector_normalize(d->weights);\n    apop_vector_normalize(v);\n    for (size_t i=0; i < v->size; i ++)\n        Diff(d->weights->data[i], v->data[i], 1e-2);\n    apop_model_free(m);\n    apop_data_free(d);\n    gsl_vector_free(v);\n}\n\nvoid test_arms(gsl_rng *r){\n    gsl_vector *o = gsl_vector_alloc(3e5);\n    apop_model *ncut = apop_model_set_parameters(apop_normal, 1.1, 1.23);\n    ncut->draw = NULL; //testing the default.\n    for(size_t i=0; i < 3e5; i ++)\n        apop_draw(o->data+i, r, ncut);\n    apop_data *ov = &(apop_data){.vector=o};\n    apop_model *back_out = apop_estimate(ov, apop_normal);\n    Diff(back_out->parameters->vector->data[0] , 1.1, 1e-2)\n    Diff(back_out->parameters->vector->data[1] , 1.23, 1e-2)\n\n    apop_opts.verbose ++;\n    apop_model 
*bcut = apop_model_set_parameters(apop_beta, 0.4, 0.43); //bimodal\n    Apop_model_add_group(bcut, apop_arms, .model=bcut, .xl=1e-5, .xr=1-1e-5);\n    bcut->draw = NULL; //testing the default.\n    for(size_t i=0; i < 3e5; i ++)\n        apop_draw((o->data)+i, r, bcut);\n    ov = &(apop_data){.vector=o};\n    apop_model *back_outb = apop_estimate(ov, apop_beta);\n    gsl_vector_free(o);\n    Diff(back_outb->parameters->vector->data[0] , 0.4, 1e-2)\n    Diff(back_outb->parameters->vector->data[1] , 0.43, 1e-2)\n    apop_opts.verbose --;\n    apop_model *test_copying = apop_model_copy(back_outb);\n    apop_model_free(ncut);\n    apop_model_free(back_out);\n    apop_model_free(back_outb);\n    apop_model_free(test_copying);\n}\n\nvoid test_pmf_compress(gsl_rng *r){\n    apop_data *d = apop_data_alloc();\n    apop_text_alloc(d, 9, 1);\n    d->vector = apop_array_to_vector((double []){12., 1., 2., 2., 1., 1., 2., 2., NAN}, 9);\n    apop_text_fill(d, \"Dozen\", \"Single\", \"Pair\", \"Pair\",\n                      \"Single\", \"Single\", \"Pair\", \"Pair\",\n                      \"Nada\");\n\n\n    apop_data_pmf_compress(d);\n\n    assert(d->vector->data[0]==12);\n    assert(d->vector->data[1]==1);\n    assert(d->vector->data[2]==2);\n    assert(gsl_isnan(d->vector->data[3]));\n    assert(d->weights->data[0]==1);\n    assert(d->weights->data[1]==3);\n    assert(d->weights->data[2]==4);\n    assert(d->weights->data[3]==1);\n    assert(apop_strcmp(d->text[0][0], \"Dozen\"));\n    assert(apop_strcmp(d->text[1][0], \"Single\"));\n    assert(apop_strcmp(d->text[2][0], \"Pair\"));\n    assert(apop_strcmp(d->text[3][0], \"Nada\"));\n\n    apop_data *b = apop_data_alloc();\n    b->vector = apop_array_to_vector((double []){1.1, 2.1, 2, 1, 1}, 5);\n    apop_data *spec = apop_data_copy(Apop_r(b, 0));\n    gsl_vector_set_all(spec->vector, 1);\n    apop_data *c = apop_data_to_bins(b, .binspec=spec);\n    apop_data_free(b);\n    gsl_vector *should_be = apop_data_falloc((5), 1, 2, 2, 
1, 1)->vector;\n    assert(!apop_vector_distance(should_be, c->vector, 'd'));\n    apop_data_free(c);\n\n    //I assert that if I use the default binspec returned by a call to apop_data_to_bins,\n    //then re-binning with the binspec explicitly stated will give identical results.\n    int dcount = 10000;\n    apop_data *draws = apop_data_alloc(dcount);\n    apop_model *norm = apop_model_set_parameters(apop_normal, 0, 1);\n    for (int i=0; i<dcount; i++)\n        apop_draw(draws->vector->data+i, r, norm);\n    apop_data_sort(draws);\n    apop_data *drawcopy = apop_data_copy(draws);\n    apop_data *binned = apop_data_to_bins(draws);\n    apop_data *binnedc = apop_data_to_bins(drawcopy, .binspec=apop_data_get_page(draws, \"<binspec>\"), .close_top_bin='y');\n    for (int i=0; i< binned->vector->size; i++)\n        assert(binned->vector->data[i] == binnedc->vector->data[i]);\n}\n\nvoid test_vtables(){\n    //run an updating to make sure that the vtable has been generated.\n    apop_model *n = apop_model_set_parameters(apop_normal, 0, 1);\n    apop_data *d = apop_model_draws(n);\n    apop_model *out = apop_update(d, n, apop_normal);\n    //did it use the vtable to see this is a Normal distribution?\n    assert(out->log_likelihood == apop_normal->log_likelihood);\n    //updating a distribution with data from itself:\n    Diff(apop_data_get(n->parameters), apop_data_get(out->parameters), 1e-1);\n\n    assert(apop_update_vtable_drop(apop_normal, apop_normal)==0);\n    assert(apop_update_vtable_drop(apop_beta, apop_binomial)==0);\n}\n\nvoid test_weighted_regression(apop_data *d, apop_model *e){\n    //pretty rudimentary: set all weights to equal and see if we get the same result.\n    apop_data *cp = apop_data_copy(d);\n    cp->weights = gsl_vector_alloc(d->matrix->size1);\n    gsl_vector_set_all(cp->weights, 3);\n    apop_model *e2 = apop_estimate(cp, apop_ols);\n    assert( apop_vector_distance(e2->parameters->vector, e->parameters->vector) < tol5);\n}\n\nvoid 
test_ols_offset(gsl_rng *r){\n    //A thing we know about OLS: an offset in the variables should make almost no\n    //difference. Here we fit OLS with data that is of the form Y = 3*Y+eps\n    //and then fit with the same data but offset: Y=3*(Y+20)+eps\n    //The coefficient of the constant term moves to absorb the shift;\n    //The p-value for that coefficient changes;\n    //everything else should remain constant.\n    int i, size1 = 1000;\n    apop_data *useme = apop_data_alloc(size1, 2);\n    for (i =0; i< size1; i++){\n        apop_data_set(useme, i, 0, i);\n        apop_data_set(useme, i, 1, 3*i+gsl_ran_gaussian(r, 2));\n    }\n    apop_data *cp = apop_data_copy(useme);\n    apop_model *zero_off = apop_estimate(useme, apop_ols);\n    gsl_vector_add_constant(Apop_cv(cp, 1), 20);\n    apop_model *way_off = apop_estimate(cp, apop_ols);\n    assert(apop_vector_distance(zero_off->info->vector, way_off->info->vector) < 1e-4);\n    gsl_vector *zcov = apop_data_pack(apop_data_get_page(zero_off->parameters, \"<covariance>\"), .use_info_pages='y');\n    gsl_vector *wcov = apop_data_pack(apop_data_get_page( way_off->parameters, \"<covariance>\"), .use_info_pages='y');\n    assert(apop_vector_distance(zcov, wcov) < 1e-4);\n    assert(apop_data_get(zero_off->parameters, 0, -1) - (apop_data_get(way_off->parameters, 0, -1)+20./3)  < 5e-3);\n    assert(apop_data_get(zero_off->parameters, 1, -1) - apop_data_get(way_off->parameters, 1, -1)  < 1e-4);\n    apop_model_free(zero_off);\n    apop_model_free(way_off);\n    apop_data_free(cp);\n    apop_data_free(useme);\n}\n\n#define do_test(text, fn) {if (verbose) printf(\"%s:\", text); \\\n                          fflush(NULL);                      \\\n                          fn;                                \\\n                          if (verbose) printf(\"\\nPASS.  
\");} \n\nint main(int argc, char **argv){\n    int  slow_tests = 0;\n#ifdef _OPENMP\n    if (omp_get_num_procs()==1) omp_set_num_threads(2); //always at least 2 threads.\n#endif\n    int c;\n    char opts[]  = \"sqt:\";\n    if (argc==1)\n        printf(\"Sundry tests for Apophenia.\\nRunning relatively faster tests.  To run slower tests (primarily simulated annealing), use -s.\\nFor quieter output, use -q. To change thread count (default=2), use -t1, -t2, -t3, ...\\n\");\n    while((c = getopt(argc, argv, opts))!=-1)\n        if (c == 's')       slow_tests++;\n        else if (c == 'q')  verbose--;\n#ifdef _OPENMP\n        else if (c == 't')  omp_set_num_threads(atoi(optarg));\n#endif\n\n    //set up some global or common variables\n    gsl_rng *r = apop_rng_alloc(8); \n    apop_data *d = apop_text_to_data( DATADIR \"/\" \"test_data2\" ,0,1);\n    apop_model *an_ols_model = apop_model_copy(apop_ols);\n    Apop_model_add_group(an_ols_model, apop_lm, .want_expected_value= 1);\n    apop_model *e  = apop_estimate(d, an_ols_model);\n\n    do_test(\"vtables\", test_vtables());\n    do_test(\"test listwise delete\", test_listwise_delete());\n    do_test(\"rownames\", test_rownames());\n    do_test(\"apop_dot\", test_dot());\n    do_test(\"apop_jackknife\", test_jackknife(r));\n    do_test(\"test multivariate_normal\", test_multivariate_normal());\n    do_test(\"log and exponent\", log_and_exp(r));\n    do_test(\"split and stack test\", test_split_and_stack(r));\n    do_test(\"test probit and logit\", test_probit_and_logit(r));\n    do_test(\"test probit and logit again\", test_probit_and_logit(r));\n    do_test(\"test data compressing\", test_pmf_compress(r));\n    do_test(\"weighted regression\", test_weighted_regression(d,e));\n    do_test(\"offset OLS\", test_ols_offset(r));\n    do_test(\"default RNG\", test_default_rng(r));\n    do_test(\"test row set and remove\", row_manipulations());\n    do_test(\"test PMF\", test_pmf());\n    do_test(\"apop_pack/unpack test\", 
apop_pack_test(r));\n    do_test(\"test adaptive rejection sampling\", test_arms(r));\n    //do_test(\"test fix params\", test_model_fix_parameters(r));\n    do_test(\"positive definiteness\", test_posdef(r));\n    do_test(\"test binomial estimations\", test_binomial(r));\n    do_test(\"dummies and factors\", dummies_and_factors());\n    do_test(\"test vector/matrix realloc\", test_resize());\n    do_test(\"test_vector_moving_average\", test_vector_moving_average());\n    do_test(\"apop_estimate->dependent test\", test_predicted_and_residual(e));\n    do_test(\"OLS test\", test_OLS(r));\n    do_test(\"database skew, kurtosis, normalization\", test_skew_and_kurt(r));\n    do_test(\"test_percentiles\", test_percentiles());\n    do_test(\"weighted moments\", test_weigted_moments());\n    do_test(\"multivariate gamma\", test_mvn_gamma());\n    do_test(\"Inversion\", test_inversion(r));\n    do_test(\"apop_matrix_summarize\", test_summarize());\n    do_test(\"apop_linear_constraint\", test_linear_constraint());\n    do_test(\"transposition\", test_transpose());\n    do_test(\"test unique elements\", test_unique_elements());\n    if (slow_tests){\n        if (verbose) printf(\"\\tSlower tests:\\n\");\n        do_test(\"Test score (dlog likelihood) calculation\", test_score());\n    }\n    printf(\"\\nApophenia has passed all of the sundry tests. Yay.\\n\");\n    apop_db_close();\n}\n"
  },
  {
    "path": "tests/test_data",
    "content": "a, \"b\",c,\"d\"\n2,4,6,8\n3,7,1,6\n3,1,1,7\n9,0,8,1\n"
  },
  {
    "path": "tests/test_data2",
    "content": "visits, date\n12, 0\n18, 1\n20, 2\n18, 3\n11, 4\n11, 5\n14, 6\n17, 7\n11, 8\n10, 9\n15, 10\n13, 11\n13, 12\n20, 13\n24, 14\n19, 15\n22, 16\n25, 17\n17, 19\n22, 20\n11, 21\n13, 22\n17, 23\n20, 24\n16, 25\n16, 26\n25, 27\n31, 28\n15, 29\n17, 30\n10, 32\n13, 33\n18, 34\n17, 35\n12, 36\n23, 37\n11, 41\n16, 42\n18, 43\n21, 44\n26, 45\n17, 46\n22, 48\n25, 49\n14, 50\n19, 51\n13, 52\n11, 53\n33, 54\n35, 55\n20, 56\n30, 57\n16, 58\n28, 59\n19, 60\n"
  },
  {
    "path": "tests/test_data_fixed_width",
    "content": "123A#C3.14159\n-21 BC2.71828\n"
  },
  {
    "path": "tests/test_data_nans",
    "content": "a,b,c,d, head\n2,4,6,8, first\n3,7,nan,6, second\n3,\"NaN\",1,7, third\n9,0,8,1, fourth\n"
  },
  {
    "path": "tests/test_kernel_ll.c",
    "content": "/* The kernel log likelihood is somewhat convoluted, to retain numeric precision. This\ntest compares the result to the direct approach, in the case of a few data sets\nthat are small enough that we don't have to worry about underflow.\n*/\n#include <apop.h>\n\nlong double sum_of_parts(apop_data *d1, apop_data *target){\n    apop_data *row, *trow;\n    long double p=1;\n    for(int i=0; (trow=Apop_r(target, i)) && trow->vector; i++){\n        long double tprob=0;\n        for(int j=0; (row=Apop_r(d1, j)) && row->vector; j++)\n            tprob+= apop_p(trow, apop_model_set_parameters(apop_normal, *row->vector->data, 1));\n        p *=tprob/d1->vector->size;\n    }\n    return p;\n}\n\nvoid go(apop_data *d1, apop_data *d2){\n    apop_model *k = apop_model_set_settings(apop_kernel_density, .base_data=d1);\n    assert(fabs(apop_p(d2, k)- sum_of_parts(d1, d2)) < 1e-5);\n\n    apop_model *test_copying = apop_model_copy(k);\n    assert(fabs(apop_p(d2, test_copying)- sum_of_parts(d1, d2)) < 1e-8);\n    apop_model_free(k);\n    apop_model_free(test_copying);\n}\n\nint main(){\n    apop_data *d1= apop_data_falloc((4), 2,4,6,8);\n    go(d1, apop_data_falloc((4), 1,3,5,7));\n    go(d1, apop_data_falloc((4), 1,1,1,1));\n\n    apop_data *d2= apop_data_falloc((4, 4, 1), 2,1.1, 4,2.2, 6,3.1, 8,0);\n    go(d2, apop_data_falloc((4, 4, 1), 1, 0, 3,0, 5, 0, 7, 0));\n}\n"
  },
  {
    "path": "tests/update_via_rng.c",
    "content": "/* Test three methods for the entries in the conjugate prior table:\n\n--using the conjugate priors\n--using Metropolis-Hastings\n--using draws from the prior\n\nThere are many complications.\n\nSelecting the method:\nThe second method happens if the lookup in the update vtable fails, which is why the likelihood gets overwritten with fake_ll.\n\nThe third method happens when there is no p or log_likelihood at all.\n\nComparing to the `truth':\nThe parameters from updating on the c.p. table are very sharp---it is unlikely that\na search in the time tolerable for a unit test will get anywhere near them. However,\nthe CDF should still be about the same regardless of method. The deciles function goes through many slices of the CDF (more than ten as of this writing), and checks that the M-H CDF or the draw-based CDF are within 9% of the c.p. CDF.\n\nGenerating something for the CDF:\nThe apop_pmf method is limited at the momemt: if a point is not in the CMF, then\napop_cdf(intermeidate_point, the_pmf) returns zero. The rationale is that the data is\nnot necessarily ordered, but that seriously needs to be revised. In the mean time, I estimate a parameterized distribution from the PMF, and use that for the deciles tests.\n\nEstimating from a weighted CMF:\nNone of the estimation routines use the weights in a data set. 
So, the\napop_data_pmf_expand function replicates the data according to the weights.\n*/\n#include <apop.h>\n#include <assert.h>\n\nlong double fake_ll (apop_data *d, apop_model *m){\n    return ((apop_model*)(m->more))->log_likelihood(d, m);\n}\n\napop_data *apop_data_pmf_expand(apop_data *in, int factor){\n    apop_data *expanded = apop_data_alloc();\n    apop_vector_normalize(in->weights);\n    for (int i=0; i< in->weights->size;i++){\n        int wt = gsl_vector_get(in->weights, i)* factor;\n        if (wt){\n            apop_data *next = apop_data_alloc(wt);\n            gsl_vector_set_all(next->vector, apop_data_get(in, i));\n            apop_data_stack(expanded, next, .inplace='y');\n        }\n    }\n    if (expanded->vector) return expanded;\n    else return NULL;\n}\n\nvoid deciles(apop_model *m1, apop_model *m2, double max){\n    double width = 30;\n    for (double i=0; i< max; i+=1/width){\n        apop_data *x = apop_data_falloc((1), i);\n        double L = apop_cdf(x, m1);\n        double R = apop_cdf(x, m2);\n        assert(fabs(L-R) < 0.18); //wide, I know.\n    }\n}\n\nvoid betabinom(){\n    apop_model *beta = apop_model_set_parameters(apop_beta, 10, 5);\n\n    apop_model *drawfrom = apop_model_copy(apop_multinomial);\n    drawfrom->parameters = apop_data_falloc((2), 30, .4);\n    drawfrom->dsize = 2;\n    int draw_ct = 80;\n    apop_data *draws = apop_model_draws(drawfrom, draw_ct);\n\n    apop_model *betaup = apop_update(draws, beta, apop_binomial);\n    apop_model_show(betaup);\n\n    beta->more = apop_beta;\n    beta->log_likelihood = fake_ll;\n    apop_model *bi = apop_model_fix_params(apop_model_set_parameters(apop_binomial, 30, NAN));\n    apop_model *upd = apop_update(draws, beta, bi);\n    apop_model *betaed = apop_estimate(upd->data, apop_beta);\n    deciles(betaed, betaup, 1);\n\n    //now via a non-MCMC updating routine: draw from fixed prior, evaluate using likelihood.\n    beta->log_likelihood = NULL;\n    apop_model *upd_r = 
apop_update(draws, beta, bi);\n    betaed = apop_estimate(apop_data_pmf_expand(upd_r->data, 2000), apop_beta);\n    deciles(betaed, betaup, 1);\n}\n\nvoid gammafish(){\n    printf(\"gamma/poisson\\n\");\n    apop_model *gamma = apop_model_set_parameters(apop_gamma, 1.5, 2.2);\n\n    apop_model *drawfrom = apop_model_set_parameters(apop_poisson, 3.1);\n    int draw_ct = 90;\n    apop_data *draws = apop_model_draws(drawfrom, draw_ct);\n\n    apop_model *gammaup = apop_update(draws, gamma, apop_poisson);\n    apop_model_show(gammaup);\n\n    gamma->more = apop_gamma;\n    gamma->log_likelihood = fake_ll;\n    Apop_settings_add_group(gamma, apop_mcmc, .burnin=.1, .periods=1e4);\n    apop_model *upd = apop_update(draws, gamma, apop_poisson);\n    apop_model *gammafied = apop_estimate(upd->data, apop_gamma);\n    deciles(gammafied, gammaup, 5);\n\n    gamma->log_likelihood = NULL;\n    apop_model *upd_r = apop_update(draws, gamma, apop_poisson);\n    apop_model *gammafied2 = apop_estimate(apop_data_pmf_expand(upd_r->data, 2000), apop_gamma);\n    deciles(gammafied2, gammaup, 5);\n    deciles(gammafied, gammafied2, 5);\n}\n\nvoid make_draws(){\n    apop_model *multinom = apop_model_copy(apop_multivariate_normal);\n    multinom->parameters = apop_data_falloc((2, 2, 2), \n                                        1,  1, .1,\n                                        8, .1,  1);\n    multinom->dsize = 2;\n\n    apop_model *d1 = apop_estimate(apop_model_draws(multinom), apop_multivariate_normal);\n    for (int i=0; i< 2; i++)\n        for (int j=-1; j< 2; j++)\n            assert(fabs(apop_data_get(multinom->parameters, i, j)\n                    - apop_data_get(d1->parameters, i, j)) < .25);\n    multinom->draw = NULL; //so draw via MCMC\n    apop_model *d2 = apop_estimate(apop_model_draws(multinom, 10000), apop_multivariate_normal);\n    for (int i=0; i< 2; i++)\n        for (int j=-1; j< 2; j++)\n            assert(fabs(apop_data_get(multinom->parameters, i, j)\n               
     - apop_data_get(d2->parameters, i, j)) < .25);\n}\n\nint main(){\n    make_draws();\n    betabinom();\n    gammafish();\n}\n"
  },
  {
    "path": "tests/utilities_test.in",
    "content": "#!/bin/sh\n\nAPOP_DATA_DIR=@abs_top_srcdir@/tests\nAPOP_CMD_DIR=@top_builddir@/cmd\n\nBC=@BC@\nSQLITE3=@SQLITE3@\n\nAPOP_PLOT_QUERY=${APOP_CMD_DIR}/apop_plot_query\nAPOP_TEXT_TO_DB=${APOP_CMD_DIR}/apop_text_to_db\nAPOP_DB_TO_CROSSTAB=${APOP_CMD_DIR}/apop_db_to_crosstab\n\nDiff(){\nreturn `echo 'out=0\n    if (('\"$1 - $2\"')^2 < 1/1000) out=1\n    print out' | bc -l`\n}\n\nread_faith(){\n    rm -f ff.db\n    if ! $APOP_TEXT_TO_DB ${APOP_DATA_DIR}/faith.data faith ff.db ;\n    then return 1;\n    elif ! sed '1,/id/d'  ${APOP_DATA_DIR}/faith.data |  $APOP_TEXT_TO_DB -N\"number,len,delay\" - faith2 ff.db\n    then return 2;\n    elif Diff `$APOP_PLOT_QUERY -n ff.db \"select avg(eruptions) - avg(len)\n                from faith, faith2 where id==number\" | sed '/avg/d'` 0\n    then return 3;\n    else return 0;\n    fi\n}\n\ncrosstab(){\n    if [ ! `$APOP_DB_TO_CROSSTAB -eo ff.db \"faith group by round(eruptions), waiting\" waiting 'round(eruptions)' 'count(id)'|sed -n '/70/p' | cut -f 4` \\\n        -eq `$SQLITE3 ff.db \"select count(*) from faith where round(eruptions)=4 and waiting=70\"` ]\n        then echo crosstabbing failed.; return 1;\n    elif [ ! `$APOP_DB_TO_CROSSTAB -d'|' ff.db \"faith group by round(eruptions), waiting\" waiting 'round(eruptions)' 'count(id)'|sed -n '/70/p' | cut -d'|' -f 4` \\\n        -eq `$SQLITE3 ff.db \"select count(*) from faith where round(eruptions)=4 and waiting=70\"` ]\n        then echo crosstabbing with nonstandard delimiter failed.; return 2;\n    else\n        return 0;\n    fi\n}\n\nfixed_read(){\n    $APOP_TEXT_TO_DB -ed -f \"3,6\" -nc ${APOP_DATA_DIR}/test_data_fixed_width td td.db\n    $APOP_TEXT_TO_DB -ea -f \"3,6\" -nc ${APOP_DATA_DIR}/test_data_fixed_width td td.db\n    if $SQLITE3 td.db \"select col_2 from td where rowid=4\" | grep '2\\.71828' > /dev/null\n        then return 0\n        else return 1\n    fi\n}\n\n( #one big subshell, so exits don't kill the parent shell.\nif [ ! 
`echo 123 | $BC` ];\n    then echo \"Missing POSIX-standard bc; exiting without running tests.\"; exit 0; #not a fail.\nelif ! read_faith\n    then echo \"$APOP_TEXT_TO_DB test failed with code $?.\"; exit 1;\nelif ! fixed_read\n    then echo \"read of fixed data failed with code $?.\"; exit 2;\nfi\n)\n"
  },
  {
    "path": "transform/Makefile.am",
    "content": "\nnoinst_LTLIBRARIES = libapoptransform.la\n\nlibapoptransform_la_SOURCES = \\\n\tapop_dconstrain.c\\\n\tapop_fix_params.c \\\n\tapop_coordinate_transform.c \\\n\tapop_mixture.c \\\n\tapop_cross.c\n\nlibapoptransform_la_CFLAGS = \\\n\t-I $(top_srcdir) \\\n\t$(PTHREAD_CFLAGS) \\\n\t$(OPENMP_CFLAGS) \\\n\t$(SQLITE3_CFLAGS) \\\n\t$(GSL_CFLAGS)\n"
  },
  {
    "path": "transform/apop_coordinate_transform.c",
    "content": "#include \"apop_internal.h\"\n\n/* \\amodel apop_coordinate_transform Apply a coordinate transformation of the data to\nproduce a distribution over the transformed data space. This is sometimes called a Jacobian transformation.\n\nHere is an example that replicates the Lognormal distribution. Note the use of \\ref\napop_model_copy_set to set up a model with the given settings.\n\n\\include jacobian.c\n\n\\adoc Input_format The input data is sent to the first model, so use the input format for that model.\n\\adoc Settings   \\ref apop_coordinate_transform_settings\n*/\n\nApop_settings_init(apop_coordinate_transform,\n    Apop_stopif(!in.base_model, , 0, \"I need a .base_model.\");\n)\nApop_settings_copy(apop_coordinate_transform,)\nApop_settings_free(apop_coordinate_transform,)\n\n#define Get_cs(inmodel, outval) \\\n    apop_coordinate_transform_settings *cs = Apop_settings_get_group(inmodel, apop_coordinate_transform); \\\n    Apop_stopif(!cs, return outval, 0, \"At this point, I expect your model to\" \\\n            \"have an apop_coordinate_transform_settings group.\");\n\nstatic void jacobian_prep(apop_data *d, apop_model *m){\n    apop_coordinate_transform_settings *cs = Apop_settings_get_group(m, apop_coordinate_transform); \n    Apop_stopif(!cs, m->error='s', 0, \"missing apop_coordinate_transform_settings group. 
\"\n            \"Maybe initialize this with apop_model_coordinate_transform?\");\n    apop_prep(d, cs->base_model);\n    m->parameters=cs->base_model->parameters;\n    m->dsize=cs->base_model->dsize;\n}\n\nstatic long double ct_ll(apop_data *indata, apop_model* mj){\n    Get_cs(mj, GSL_NAN)\n    Apop_stopif(!cs->base_model, return GSL_NAN, 0, \"No base model to transform back to.\");\n    Apop_stopif(!cs->transformed_to_base, return GSL_NAN, 0, \"No reverse transformation function.\");\n    Apop_stopif(!cs->jacobian_to_base, return GSL_NAN, 0, \"No Jacobian for the reverse transformation function, \"\n                                                          \"and using numeric derivatives is not yet implemented.\");\n\n    apop_data *rev = cs->transformed_to_base(indata);\n    double ll = apop_log_likelihood(rev, cs->base_model);\n    apop_data_free(rev);\n    ll += log(cs->jacobian_to_base(indata));\n    return ll;\n}\n\napop_model *apop_coordinate_transform = &(apop_model){\"Jacobian-transformed model\", .log_likelihood=ct_ll, .prep=jacobian_prep};\n\ntypedef apop_data *(*d_to_d)(apop_data*);\ntypedef double (*d_to_f)(apop_data*);\n"
  },
  {
    "path": "transform/apop_cross.c",
    "content": "/* Cross product of distributions.\n Copyright (c) 2013 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  \n\n\\amodel apop_cross A cross product of models.  Generate via \\ref apop_model_cross .\n\nFor the case when you need to bundle two uncorrelated models into one larger model. For example, the prior for a multivariate normal (whose parameters are a vector of means and a covariance matrix) is a Multivariate Normal-Wishart pair.\n\n\\adoc    Input_format     There are two means of handling the input format. If the settings group attached to the data set has a non-\\c NULL \\c splitpage element, then \nappend the second data set as an additional page to the first data set, and name the second set with the name you listed in \\c splitpage; see the example.  \n\nIf \\c splitpage is \\c NULL, then I will send the same data set to both models.\n\n\\adoc    Settings   \\ref apop_cross_settings\n\n\\adoc    Parameter_format  \ncurrently \\c NULL; check the sub-models for their parameters.\n\n\\adoc    For an example, see \\ref apop_model_cross .\n*/\n\n#include \"apop_internal.h\"\n\nApop_settings_init(apop_cross, )\nApop_settings_copy(apop_cross, )\nApop_settings_free(apop_cross, )\n\n\n/* For (almost) all methods, the workings are:\n  --get the settings group; fail if missing\n  --find the first and second data sets. 
This may require changing a \n    ->more pointer in the input data set to split it into two parts\n  --call the submodels using our two data sets \n  --restore that ->more pointer, if needed.\n   */\n\ntypedef struct {\n    apop_data *d1, *d2, *dangly_bit;\n    _Bool need_to_free;\n} twop_s;\n\n\n/* A model must accept data in the form of each observation being a single row of\n   data---i.e., each row is what you'd get from apop_draw.\n    If the length of the data matrix is longer than the first model's stated msize2,\n    then we have to unpack the draw into two new apop_data sets.\n*/\ntwop_s unpack_a_draw(apop_data *d, apop_cross_settings *s){\n    twop_s out = (twop_s){\n        .d1 = (s->model1 ? apop_data_alloc(s->model1->vsize, s->model1->msize1, s->model1->msize2) : NULL),\n        .d2 = (s->model2 ? apop_data_alloc(s->model2->vsize, s->model2->msize1, s->model2->msize2) : NULL),\n        .need_to_free=1};\n    int len1 = s->model1->vsize + s->model1->msize1 * s->model1->msize2;\n    for (int i=0; i< d->matrix->size1; i++){\n        apop_data_unpack(Apop_subvector(Apop_rv(d, i), 0, len1), Apop_r(out.d1, i));\n        apop_data_unpack(Apop_subvector(Apop_rv(d, i), len1, d->matrix->size1), Apop_r(out.d2, i));\n      }\n    return out;\n}\n\nstatic twop_s get_second(apop_data *d, char *splitpage, apop_cross_settings *s){\n    twop_s out = {.d1=d, .d2=d};\n    if (splitpage) {\n        if (d->matrix && (d->matrix->size2 > s->model1->msize2))\n            return unpack_a_draw(d, s);\n        apop_data *ctr = d;\n        if (!ctr ||(ctr->names && ctr->names->title && !strcasecmp(ctr->names->title, splitpage))){\n            out.d1 = NULL;\n            out.d2 = d;\n            return out;\n        }\n        for ( ; ctr->more && (!ctr->more->names || !ctr->more->names->title || strcasecmp(ctr->more->names->title, splitpage)); ) \n            ctr = ctr->more; \n        out.d2 = ctr->more;\n        out.dangly_bit = ctr;\n        ctr->more = NULL; //the only change 
to the original data set.\n    }\n    return out;\n}\n\n//post-work cleanup: reattach 2nd data set, or if we built a newly-shaped data set, free it.\nstatic void repaste(twop_s dd){\n    if (dd.dangly_bit) dd.dangly_bit->more = dd.d2;\n    if (dd.need_to_free) {apop_data_free(dd.d1); apop_data_free(dd.d2);}\n}\n\n#define check_settings(ret) Apop_stopif(!s, m->error='s'; return ret, 0, \"This model wasn't set up right. Maybe use apop_model_cross to set up your pair of models.\");\n\nvoid cross_print(apop_model *m, FILE *out){\n    apop_cross_settings *s = Apop_settings_get_group(m, apop_cross);\n    check_settings( );\n    apop_model *m1 = s->model1;\n    apop_model *m2 = s->model2;\n    fprintf(out, \"Cross product of %s and %s models:\\n\", m1->name, m2->name);\n    apop_model_print(m1, out);\n    apop_model_print(m2, out);\n}\n    \n#define Preliminaries(ret)          \\\n    apop_cross_settings *s = Apop_settings_get_group(m, apop_cross);    \\\n    check_settings(ret);        \\\n    twop_s datas = get_second(d, s->splitpage, s);\n\nstatic void cross_est(apop_data *d, apop_model *m){\n    Preliminaries();\n\n    s->model1 = apop_estimate(datas.d1, s->model1);\n    s->model2 = apop_estimate(datas.d2, s->model2);\n\n    repaste(datas);\n}\n\nstatic long double cross_ll(apop_data *d, apop_model *m){\n    Preliminaries(GSL_NAN);\n\n    double out =  apop_log_likelihood(datas.d1, s->model1)\n                 +apop_log_likelihood(datas.d2, s->model2);\n    repaste(datas);\n    return out;\n}\n\nstatic long double cross_p(apop_data *d, apop_model *m){\n    Preliminaries(GSL_NAN)\n\n    double out =  apop_p(datas.d1, s->model1) *apop_p(datas.d2, s->model2);\n    repaste(datas);\n    return out;\n}\n\nstatic int cross_draw(double *d, gsl_rng *r, apop_model *m){\n    apop_cross_settings *s = Apop_settings_get_group(m, apop_cross);\n    check_settings(1); \n    Apop_stopif(apop_draw(d, r, s->model1), return 1, 0, \"draw from first model failed.\");\n    double *d2 = d+ 
s->model1->dsize;\n    Apop_stopif(apop_draw(d2, r, s->model2), return 1, 0, \"draw from second model failed.\");\n    return 0;\n}\n\napop_model *apop_cross = &(apop_model){\"Cross product of models\", .p=cross_p, .log_likelihood=cross_ll, \n    .estimate=cross_est, .draw=cross_draw\n};\n\napop_model *apop_model_cross_base(apop_model *mlist[]){\n    apop_model_print_vtable_add(cross_print, apop_cross);\n    //No inputs: hand back a blank model flagged with an error, rather than falling through.\n    Apop_stopif(!mlist[0], apop_model *oute = apop_model_copy(apop_cross); oute->error='n'; return oute, \n                            0, \"No inputs. Returning blank model with outmodel->error=='n'.\");\n    //One input: mlist[1] is NULL here, so copy mlist[0].\n    Apop_stopif(!mlist[1], return apop_model_copy(mlist[0]), 2, \"Only one model input; returning a copy of that model.\");\n    apop_model *m2 = mlist[2] ? apop_model_cross_base(mlist+1): mlist[1];\n    apop_model *out = apop_model_copy(apop_cross);\n    Apop_model_add_group(out, apop_cross, .model1=mlist[0], .model2=m2);\n    if (mlist[0]->dsize >=0 && m2->dsize >=0) out->dsize = mlist[0]->dsize + m2->dsize;\n    return out;\n}\n"
  },
  {
    "path": "transform/apop_dconstrain.c",
    "content": "#include \"apop_internal.h\"\n#include <stdbool.h>\n\n/* \\amodel apop_dconstrain A model that constrains the base model to within some\ndata constraint. E.g., truncate \\f$P(d)\\f$ to zero for all \\f$d\\f$ outside of a given\nconstraint. Generate using \\ref apop_model_dconstrain .\n\nThe log likelihood works by using the \\c base_model log likelihood, and then scaling\nit based on the part of the base model's density that is within the constraint. If you\nhave an easy means of specifying what that density is, please do, as in the example. If\nyou do not, the log likelihood will calculate it by making \\c draw_ct random draws from\nthe base model and checking whether they are in or out of the constraint. Because this\ndefault method is stochastic, there is some loss of precision.\n\nThe previous scaling is stored in the \\ref apop_dconstrain settings group. Get/set via:\n\\code\ndouble scale = Apop_settings_get(your_model, apop_dconstrain, scale);\nApop_settings_set(your_model, apop_dconstrain, scale, 0);\n\\endcode\nIf \\c scale is zero, because that is the default or because you set it as above, then\nI recalculate the scale.  If the value of the \\c parameters changed since \\c scale\nwas last calculated, I recalculate. If you made other relevant changes to the scale,\nthen you may need to manually zero out \\c scale so it can be recalculated.\n\nHere is an example that makes a few draws and estimations from data-constrained\nmodels. Note the use of \\ref apop_model_set_settings to prepare the constrained models.\n\n\\adoc Examples\n\\include dconstrain.c\n\n\\adoc Input_format That of the base model.\n\\adoc Parameter_format That of the base model. 
In fact, the \\c parameters element is a pointer to the base model \\c parameters, so both are modified simultaneously.\n\\adoc Settings   \\ref apop_dconstrain_settings\n*/\n\n#define Get_set(inmodel, outval) \\\n    apop_dconstrain_settings *cs = Apop_settings_get_group(inmodel, apop_dconstrain); \\\n    Apop_stopif(!cs, return outval, 0, \"At this point, I expect your model to \" \\\n            \"have an apop_dconstrain_settings group.\");\n\n//what fraction of the model density is inside the constraint?\nstatic double get_scaling(apop_model *m){\n    Get_set(m, GSL_NAN)\n    apop_data *d = apop_data_alloc(1, cs->base_model->dsize);\n    int tally = 0;\n    for (int i=0; i< cs->draw_ct; i++){\n        apop_draw(d->matrix->data, cs->rng, cs->base_model);\n        tally += !!cs->constraint(d, cs->base_model);\n    }\n    apop_data_free(d);\n    return (tally+0.0)/cs->draw_ct;\n}\n\nApop_settings_init(apop_dconstrain,\n    Apop_stopif(!in.base_model, , 0, \"I need a .base_model.\");\n    Apop_stopif(!in.constraint, , 0, \"I need a .constraint.\");\n    if (!in.draw_ct) out->draw_ct = 1e4;\n    if (!in.rng && !in.scaling) out->rng = apop_rng_alloc(apop_opts.rng_seed++);\n    if (!in.scaling) out->scaling = get_scaling;\n)\n\nApop_settings_copy(apop_dconstrain, in->refct++;)\nApop_settings_free(apop_dconstrain, in->refct--;\n        if(!in->refct) gsl_vector_free(in->last_params);\n)\n\nstatic void dc_prep(apop_data *d, apop_model *m){\n    apop_dconstrain_settings *cs = Apop_settings_get_group(m, apop_dconstrain); \n    Apop_stopif(!cs, m->error='s'; return, 0, \"missing apop_dconstrain_settings group. 
\"\n            \"Maybe initialize this with apop_model_dconstrain?\");\n    apop_prep(d, cs->base_model);\n    m->parameters=cs->base_model->parameters;\n    m->constraint=cs->base_model->constraint;\n    m->vsize = cs->base_model->vsize;\n    m->msize1 = cs->base_model->msize1;\n    m->msize2 = cs->base_model->msize2;\n    m->dsize=cs->base_model->dsize;\n}\n\n/* \\adoc RNG Draw from the base model; if the draw is outside the constraint, throw it out and try again. */\nstatic int dc_rng(double *out, gsl_rng *r, apop_model *m){\n    Get_set(m, 1)\n    gsl_matrix_view mv;\n    do {\n        apop_draw(out, r, cs->base_model);\n        mv = gsl_matrix_view_array(out, 1, cs->base_model->dsize);\n    } while (!cs->constraint(&(apop_data){.matrix=&(mv.matrix)}, cs->base_model));\n    return 0;\n}\n\nstatic double constr(apop_data *d, void *csin){\n    apop_dconstrain_settings* cs = csin;\n    return !cs->constraint(d, cs->base_model);\n}\n\nstatic bool is_stale(apop_dconstrain_settings *cs, apop_model *m){ //do I need to recalculate the scale?\n    bool stale = false;\n    if (!cs->last_params && !m->parameters)\n        stale=false;\n    else {\n        gsl_vector *params = apop_data_pack(m->parameters);\n        if (!cs->last_params){\n            cs->last_params = apop_vector_copy(params);\n            stale = true;\n        } else if (cs->last_params->size != params->size){\n            apop_vector_realloc(cs->last_params, params->size);\n            gsl_vector_memcpy(cs->last_params, params);\n            stale = true;\n        } else if (apop_vector_distance(cs->last_params, params)){\n            gsl_vector_memcpy(cs->last_params, params);\n            stale=true;\n        }\n        gsl_vector_free(params);\n    }\n    if (!cs->scale) stale = true; //but at this point, last_params is prepped.\n    return stale;\n}\n\nstatic long double dc_ll(apop_data *indata, apop_model* m){\n    Get_set(m, GSL_NAN)\n    Apop_stopif(!cs->base_model, return GSL_NAN, 0, \"No base 
model.\");\n    double any_outside = apop_map_sum(indata, .fn_rp=constr, .param=cs);\n    if (any_outside) return -INFINITY;\n\n    if (is_stale(cs, m)) cs->scale = cs->scaling((cs->scaling == get_scaling) ? m : cs->base_model);\n    Get_vmsizes(indata); //maxsize\n    return apop_log_likelihood(indata, cs->base_model) - log(cs->scale)*maxsize;\n}\n\napop_model *apop_dconstrain = &(apop_model){\"Data-constrained model\", .log_likelihood=dc_ll, .draw=dc_rng, .prep=dc_prep};\n\n/** \\def apop_model_dconstrain\nBuild an \\c apop_dconstrain model, q.v., which applies a data constraint to the data set. For example, this is how one would truncate a model to have data above zero.\n\n\\return An \\ref apop_model that is a copy of \\ref apop_dconstrain and is appropriately set up.\n\n\\li Uses the \\ref apop_dconstrain_settings group. This macro takes elements of that struct as inputs.\n\n\\li This function uses the \\ref designated syntax for inputs.\n*/\n"
  },
  {
    "path": "transform/apop_fix_params.c",
    "content": "/** \\file \n Set some of the parameters of a model to fixed values.*/\n\n/* There's only one public function here. Its header is in likelihoods.h\n \nCopyright (c) 2007, 2009, 2011 by Ben Klemens.  Licensed under the GPLv2; see COPYING.  */\n\n#include \"apop_internal.h\"\nstatic apop_model *fixed_param_model;\n\n//The model keeps a table of what the blanks should be filled in with.\n//This first section does the work for that part.\nstatic double find_nans(double in){ return isnan(in); }\n\nstatic void addin(apop_data *predict, size_t i, int j, size_t page){\n    int len;\n    if (!predict->matrix){\n        predict->matrix = gsl_matrix_alloc(1, 4); \n        len = 0;\n    } else \n        len = predict->matrix->size1;\n    apop_matrix_realloc(predict->matrix, len + 1, 4);\n    apop_data_set(predict, .row=len, .colname=\"row\", .val=i);\n    apop_data_set(predict, .row=len, .colname=\"col\", .val=j);\n    apop_data_set(predict, .row=len, .colname=\"page\", .val=page);\n}\n\nstatic void find_missing(const apop_data *data, apop_data *predict, size_t page){\n    //generate a list of fixed-parameter positions, and their paramvals.\n   apop_data * mask = apop_map((apop_data*)data, find_nans);\n    //find out where the NaNs are\n    for (size_t i=0; mask->vector && i< mask->vector->size; i++)\n        if (apop_data_get(mask, i, -1))\n            addin(predict, i, -1, page);\n    for (size_t i=0; mask->matrix && i< mask->matrix->size1; i++)\n        for (int j=0; j <mask->matrix->size2; j++)\n            if (apop_data_get(mask, i, j))\n                addin(predict, i, j, page);\n    apop_data_free(mask);\n}\n\nstatic apop_data *apop_predict_table_prep(apop_data *in){\n    apop_data *out = apop_data_alloc( );\n    apop_name_add(out->names, \"<fillins>\", 'h');\n    apop_name_add(out->names, \"row\", 'c');\n    apop_name_add(out->names, \"col\", 'c');\n    apop_name_add(out->names, \"page\", 'c');\n    apop_name_add(out->names, \"value\", 'c');\n    for 
(int page=0; in; page++){\n        find_missing(in, out, page);\n        in = in->more;\n    }\n    return out;\n}\n\n/* Take a \\c predict table and set the entries in the data set to the given predicted\n  value. Functions for prediction and imputation use this internally, and append to your\n  data a \\c predict table of the right form.  For example, \\c apop_ml_impute uses\n  this internally.\n  \n  I assume that the elements of the \\c predict table are ordered so that everything on the\n  first page comes first, then everything on the second, et cetera. \n\n\\param data The data set to be filled in. \n\\param predict The set of fillins.\n*/\nstatic void apop_data_predict_fill(apop_data *data, apop_data *predict){\n    if (!predict) return;\n    int this_page_ct = 0;\n    apop_data *this_page = data;\n    for (int i=0; i < predict->matrix->size1; i++){\n        int p = apop_data_get(predict, .row=i, .colname=\"page\");\n        while (p != this_page_ct){//entries are in sequential order, but may skip pages.\n            this_page_ct++;\n            this_page = this_page->more;\n        }\n        apop_data_set(this_page, .row= apop_data_get(predict, .row=i, .colname=\"row\"),\n                                 .col= apop_data_get(predict, .row=i, .colname=\"col\"),\n                                 .val= apop_data_get(predict, .row=i, .colname=\"value\"));\n    }\n}\n\n/////////End predict table machinery.\n\n/** \\cond doxy_ignore  Not in the apop.m4.h header --> not public. 
*/\ntypedef struct {\n    apop_model *base_model;\n    apop_data *predict;\n    int ct;\n} apop_fix_params_settings;\n/** \\endcond */ //End of Doxygen ignore.\n\nstatic void unpack(apop_data *out, apop_model *m){\n    //predict table --> real param set \n   apop_fix_params_settings *mset = Apop_settings_get_group(m, apop_fix_params);\n   Apop_col_tv(mset->predict, \"value\", p_in_tab);\n   gsl_vector_memcpy(p_in_tab, m->parameters->vector);\n   apop_data_predict_fill(out, mset->predict);\n}\n\nstatic void pack(apop_data *in, apop_model *m){\n    //real param set --> predict table \n   apop_fix_params_settings *mset = Apop_settings_get_group(m, apop_fix_params);\n   apop_data *predict = mset->predict;\n    for(int i =0; i< predict->matrix->size1; i++){\n        apop_data_set(predict, .row =i, .colname=\"value\", .val=apop_data_get(in, \n                                                        apop_data_get(predict, .row=i, .colname=\"row\"),\n                                                        apop_data_get(predict, .row=i, .colname=\"col\")));\n        if (i< mset->ct-1 && apop_data_get(predict, .row= i+1, .colname=\"page\") \n                                != apop_data_get(predict, .row= i, .colname=\"page\"))\n            in = in->more;\n    }\n   Apop_col_tv(mset->predict, \"value\", p_in_tab);\n   if (m->parameters) //empty only during set_starting_point\n       gsl_vector_memcpy(m->parameters->vector, p_in_tab);\n}\n\n//The macros generating the fixed_param_settings group's init/copy/free functions:\nApop_settings_init(apop_fix_params, \n    Apop_assert(in.base_model, \"I can't fix a NULL model's parameters.\");\n)\nApop_settings_copy(apop_fix_params, )\nApop_settings_free(apop_fix_params, )\n\nstatic long double fix_params_ll(apop_data *d, apop_model *fixed_model){\n    apop_model *base_model = Apop_settings_get(fixed_model, apop_fix_params, base_model);\n    unpack(base_model->parameters, fixed_model);\n    return apop_log_likelihood(d, 
base_model);\n}\n\nstatic long double fix_params_p(apop_data *d, apop_model *fixed_model){\n    apop_model *base_model = Apop_settings_get(fixed_model, apop_fix_params, base_model);\n    unpack(base_model->parameters, fixed_model);\n    return apop_p(d, base_model);\n}\n\nstatic long double fix_params_constraint(apop_data *data, apop_model *fixed_model){\n    apop_model *base_model = Apop_settings_get(fixed_model, apop_fix_params, base_model);\n    unpack(base_model->parameters, fixed_model);\n    long double out = base_model->constraint(data, base_model);\n    if (out) pack(base_model->parameters, fixed_model);\n    return out;\n}\n\nstatic int fix_params_draw(double *out, gsl_rng* r, apop_model *eps){\n    apop_model *base_model = Apop_settings_get(eps, apop_fix_params, base_model);\n    unpack(base_model->parameters, eps);\n    return base_model->draw(out, r, base_model);\n}\n\nstatic void fixed_est(apop_data * data, apop_model *params){\n    if (!data) data = params->data;\n    apop_maximum_likelihood(data, params);\n    apop_model *base_model = Apop_settings_get(params, apop_fix_params, base_model);\n    unpack(base_model->parameters, params);\n}\n\nstatic void fixed_param_show(apop_model *m, FILE *out){\n    apop_fix_params_settings *mset = Apop_settings_get_group(m, apop_fix_params);\n    fprintf(out, \"The fill-in table:\\n\");\n    apop_data_print(mset->predict, .output_pipe=out);\n    if (!m->parameters) fprintf(out, \"This copy of the model has not yet been estimated.\\n\");\n    else {\n        fprintf(out, \"The base model, after unpacking:\\n\");\n        unpack(mset->base_model->parameters, m);\n    }\n    apop_model_print(mset->base_model, out);\n}\n\nstatic void fixed_param_prep(apop_data *data, apop_model *params){\n    apop_model_print_vtable_add(fixed_param_show, fixed_param_model);\n    apop_model_clear(data, params);\n    //apop_model *base_model = Apop_settings_get(params, apop_fix_params, base_model);\n    //apop_prep(data, 
base_model);\n}\n\nstatic apop_model *fixed_param_model = &(apop_model){\"Fill me\", .estimate=fixed_est, .p = fix_params_p, \n            .log_likelihood=fix_params_ll, .constraint= fix_params_constraint, \n            .draw=fix_params_draw, .prep=fixed_param_prep};\n\n\nvoid set_starting_point(apop_data * in_params, apop_model * model_out, double *start, apop_data *predict_tab){\n    //reshape the starting_pt to the shape of typical params\n    apop_data *param_cp = apop_data_copy(in_params);\n    Get_vmsizes(in_params);//tsize\n    gsl_vector_view v = gsl_vector_view_array(start, tsize);\n    apop_data_unpack(&(v.vector), param_cp);\n\n    pack(param_cp, model_out); //write to the fill tab.\n    apop_data_free(param_cp);\n    Apop_col_tv(predict_tab, \"value\", ptv);\n    double *new_start = malloc(ptv->size * sizeof(double)); //leak!!\n    memcpy(new_start, ptv->data, ptv->size* sizeof(double));\n    Apop_settings_add(model_out, apop_mle, starting_pt, new_start);\n}\n\n/** Produce a model based on another model, but with some of the parameters fixed at a given value. \n  \nYou will send me the model whose parameters you want fixed, with the \\c parameters element\nset as follows. For the fixed parameters, simply give the values to which they will\nbe fixed. Set the free parameters to \\c NaN.\n\nFor example, here is a Binomial distribution with a fixed \\f$n=30\\f$ but \\f$p_1\\f$ allowed to float freely:\n\n\\code\napop_model *bi30 = apop_model_fix_params(apop_model_set_parameters(apop_binomial, 30, NAN));\nApop_model_add_group(bi30, apop_mle, .starting_pt=(double[]){.5}); // The Binomial doesn't like the\n                                                                   //  default starting point of 1.\napop_model *out = apop_estimate(your_data, bi30);\n\\endcode\n\nThe output is an \\c apop_model that can be estimated, Bayesian updated, et cetera.\n\n\\li Rather than using this model, you may simply want a now-filled-in copy of the\n    original model. 
Use \\ref apop_model_fix_params_get_base to retrieve the original model's parameters.\n\\li The \\c estimate method always uses an MLE, and it never calls the base model's \\c estimate method.\n\\li If the input model has an \\ref apop_mle_settings group attached, I'll use it for the \\c\n    estimate method. Otherwise, I'll set my own.\n\\li If the input model's \\ref apop_mle_settings group has a \\c starting_pt set, then I'll use\n    its values at the free parameters as the starting point for any MLE search; otherwise, the variables without\n    fixed values start from <b>1</b> as usual.\n\\li I do check the \\c more pointer of the \\c parameters for additional pages and <tt>NaN</tt>s on those pages.\n\nHere is a sample program. It produces a few thousand draws from a Multivariate Normal distribution,\nand then tries to recover the means given a var/covar matrix fixed at the correct variance.\n\n\\include fix_params.c\n  \n\\param model_in   The base model\n\\return a model that can be used like any other, with the given params fixed or free.\n*/\napop_model * apop_model_fix_params(apop_model *model_in){\n    Nullcheck_mp(model_in, NULL)\n    apop_model *model_out  = Apop_model_copy_set(fixed_param_model,\n                                apop_fix_params, .base_model = model_in);\n\n    apop_data *predict_tab; //Keep the predict tab as a data set and in the settings struct\n    predict_tab = apop_predict_table_prep(model_in->parameters);\n    Apop_stopif (!predict_tab || !predict_tab->matrix|| !predict_tab->matrix->size2,\n        apop_data_free(predict_tab);\n        apop_model_free(model_out);\n        return apop_model_copy(model_in);\n        , 1, \"No free parameters (which would be marked with a NaN). 
\"\n                \"Returning a copy of the input model.\"\n    );\n    apop_settings_set(model_out, apop_fix_params, predict, predict_tab);\n    model_out->vsize = predict_tab->matrix->size1;\n    model_out->dsize = model_in->dsize;\n\n    if (Apop_settings_get_group(model_in, apop_mle)){\n        apop_settings_copy_group(model_out, model_in, \"apop_mle\");\n        double *start = Apop_settings_get(model_in, apop_mle, starting_pt);\n        if (start) set_starting_point(model_in->parameters, model_out, start, predict_tab);\n    }\n    else Apop_model_add_group(model_out, apop_mle, .method=\"PR cg\",\n                                     .step_size=1, .tolerance=0.2);\n    if (Apop_settings_get_group(model_in, apop_parts_wanted))\n        apop_settings_copy_group(model_out, model_in, \"apop_parts_wanted\");\n\n\n    #define cut_if_missing(method) if (!model_in->method) model_out->method = NULL;\n    cut_if_missing(p);\n    cut_if_missing(draw);\n    cut_if_missing(constraint);\n    cut_if_missing(log_likelihood);\n    snprintf(model_out->name, 100, \"%s, with some params fixed\", model_in->name);\n    return model_out;\n}\n\n/** The \\ref apop_model_fix_params function produces a model that has only the non-fixed\n  parameters of the model. After estimation of the fixed-parameter model, this function\n  fills the \\c parameters element of the base model and returns a pointer to the\n  base model.\n*/\napop_model * apop_model_fix_params_get_base(apop_model *fixed_model){\n    apop_model *base_model = Apop_settings_get(fixed_model, apop_fix_params, base_model);\n    unpack(base_model->parameters, fixed_model);\n    return base_model;\n}\n"
  },
  {
    "path": "transform/apop_mixture.c",
    "content": "/** \\file \n */\n/*\n\\amodel apop_mixture The mixture model transformation: a linear combination of multiple models.  \n\nUse \\ref apop_model_mixture to produce one of these models. In the examples below, some are generated from unparameterized input models with a form like \n\n\\code\napop_model *mf = apop_model_mixture(apop_model_copy(apop_normal), apop_model_copy(apop_normal));\nApop_settings_set(mf, apop_mixture, find_weights, 'y');\nApop_model_add_group(mf, apop_mle, .starting_pt=(double[]){.5, .5, 50, 5, 80, 5},\n                                   .step_size=3, .tolerance=1e-6);\napop_model_show(apop_estimate(dd, mf));\n\\endcode\n\nOr, one can skip the estimation and use already-parameterized models as input to \\ref apop_model_mixture, e.g.:\n\n\\code\napop_model *r_ed = apop_model_mixture(apop_model_set_parameters(apop_normal, 54.6, 5.87),\n                       apop_model_set_parameters(apop_normal, 80.1, 5.87));\napop_data *wts = apop_data_falloc((2), 0.36, 0.64);\nApop_settings_add(r_ed, apop_mixture, weights, wts->vector);\nprintf(\"LL=%g\\n\", apop_log_likelihood(dd, r_ed));\n\\endcode\n\nNotice that the weights vector has to be added after the call to \\ref apop_model_mixture. If none is given, then equal weights are assigned to all components of the mixture.\n\nOne can think of the estimation in the un-parameterized case as a missing-data problem: each data point originated\nin one distribution or the other, and if we knew with certainty which data point\ncame from which distribution, then the estimation problem would be trivial:\njust generate the subsets and call <tt>apop_estimate(dataset1, model1)</tt>, ...,\n<tt>apop_estimate(datasetn, modeln)</tt> separately.  But the assignment of which\nelement goes where is unknown information, which we guess at using an expectation-maximization (EM) algorithm. 
The\nstandard algorithm starts with an initial set of parameters for the models, and assigns\neach data point to its most likely model. It then re-estimates the\nmodel parameters using their subsets. The standard algorithm, see e.g. <a\nhref=\"http://www.jstatsoft.org/v32/i06/paper\">this PDF</a>, repeats until it arrives\nat an optimum.\n\nThus, the log likelihood method for this model includes a step that allocates each data\npoint to its most likely model, and calculates the log likelihood of each observation\nusing its most likely model. [It would be a valuable extension to extend this to\nnot-conditionally IID models. Commit \\c 1ac0dd44 in the repository had some notes on\nthis, now removed.]  As a side-effect, it calculates the odds of drawing from each model\n(the vector λ). Following the above-linked paper, the probability for a given\nobservation under the mixture model is its probability under the most likely model\nweighted by the previously calculated \\f$\\lambda\\f$ for the given model.\n\nApophenia implements the EM algorithm as a constrained optimization(!). The constraint\ncheck repositions the vector of weights to that calculated at the last step, then the\nlog likelihood calculates the likelihood as above, including the expected value of\nthe weights vector for the next step. Thus, Apophenia casts the Expectation step as\na step repositioning the maximization's constraint and its associated penalties.\n\n<em>Estimations of mixture distributions can be sensitive to initial conditions.</em>\nYou are encouraged to try a sequence of random starting points for your model parameters.\nSome authors recommend plotting the data and eyeballing a guess as to the model parameters.\n\n\\li The default is to take the weight assigned to each distribution as fixed. 
If you\nwant to use the EM algorithm to move the weights as described above, specify this via\n\\code\nApop_settings_set(your_model, apop_mixture, find_weights, 'y');\n\\endcode\n\n\\li A kernel density is a mixture of a large number of homogeneous models, where each is typically centered around a point in your data. For such situations, \\ref apop_kernel_density will be easier to use.\n\n\\adoc    Input_format   The same data gets sent to each of the component models of the\nmixture. Each row is an observation, and the estimation routine assumes that models are\nconditionally IID (i.e., having chosen what component of the mixture the observation\ncomes from, its likelihood can be calculated independently of all other observations).\n\n\\adoc    Settings   \\ref apop_mixture_settings \n\n\\adoc    Parameter_format The parameters are broken out in a readable form in the\n    settings group, so your best bet is to use those. See the sample code for usage.<br>\n    The <tt>parameter</tt> element is a single vector piling up all elements, beginning\n    with the first \\f$n-1\\f$ weights, followed by an <tt>apop_data_pack</tt> of each model's\n    parameters in sequence. Because all elements are in a single vector, one could run a\n    maximum likelihood search for all components (including the weights) at once. 
\n    The <tt>log_likelihood</tt>, <tt>estimate</tt>, and other methods unpack\n    this vector into its component parts for you.\n\n\\adoc RNG Uses the weights to select a component model, then makes a draw from that component.\nThe model's \\c dsize (draw size) element is set when you set up the model in the\nmodel's \\c prep method (automatically called by \\ref apop_estimate, or call it directly)\niff all component models have the same \\c dsize.\n\n\\adoc   Examples\nThe first example uses a text file \\c faith.data, in the \\c tests directory of the distribution.\n\\include faithful.c\n\nThis example begins with a fixed mixture distribution, and makes assertions about the characteristics of draws from it.\n\n\\include hills2.c\n*/\n\n#include \"apop_internal.h\"\n\nApop_settings_copy(apop_mixture,\n    (*out->cmf_refct)++;\n    out->next_weights = apop_vector_copy(in->next_weights);\n)\n\nApop_settings_free(apop_mixture,\n    if (!(--(*in->cmf_refct))) { //cmf_refct is a pointer to a shared count; decrement the count, not the pointer.\n        apop_model_free(in->cmf);\n        free(in->cmf_refct);\n        free(in->model_list);\n    }\n    free(in->param_sizes);\n    gsl_vector_free(in->next_weights);\n) \n\nApop_settings_init(apop_mixture, \n    out->cmf_refct = calloc(1, sizeof(int));\n    (*out->cmf_refct)++;\n)\n\n//see apop_model_mixture in types.h\napop_model *apop_model_mixture_base(apop_model **inlist){\n    apop_model *out = apop_model_copy(apop_mixture);\n    int count=0, ctr=0;\n    for (apop_model **m = inlist; *m; m++) count++;\n\n    //inlist is stack-allocated; may disappear at any moment.\n    apop_model **inlist_copy = malloc((count+1) *sizeof(*inlist));\n    for (apop_model **m = inlist; *m; m++) inlist_copy[ctr++] = *m;\n    inlist_copy[ctr] = NULL;\n\n    apop_mixture_settings *ms =Apop_settings_add_group(out, apop_mixture, .model_list=inlist_copy,\n            .model_count=count, .param_sizes=malloc(sizeof(int)*count));\n    int dsize = inlist[0]->dsize;\n    for (int i=1; i< count && dsize > -99; i++) if (inlist[i]->dsize 
!= dsize) dsize = -100;\n    if (dsize > -99) out->dsize = dsize;\n\n    ms->weights = gsl_vector_alloc(ms->model_count);\n    gsl_vector_set_all(ms->weights, 1./count);\n    return out;\n}\n\nvoid mixture_show(apop_model *m, FILE *out){\n    apop_mixture_settings *ms = Apop_settings_get_group(m, apop_mixture);\n    if (m->parameters){\n        fprintf(out, \"Mixture of %i models, with weights:\\n\", ms->model_count);\n        apop_vector_print(ms->weights, .output_pipe=out);\n    } else fprintf(out, \"Mixture of %i models, with unspecified weights\\n\", ms->model_count);\n\n    for (int i=0; i< ms->model_count; i++){\n        fprintf(out, \"\\n\");\n        apop_model_print(ms->model_list[i], out);\n    }\n}\n\n\nstatic void mixture_prep(apop_data * data, apop_model *model){\n    apop_model_print_vtable_add(mixture_show, apop_mixture);\n    if (model->parameters) return;\n    apop_mixture_settings *ms = Apop_settings_get_group(model, apop_mixture);\n    model->parameters = apop_data_alloc();\n    model->parameters->vector = apop_vector_stack(model->parameters->vector, ms->weights, .inplace='y');\n\n    int i=0;\n    for (apop_model **m = ms->model_list; *m; m++){\n        if (!(*m)->parameters) apop_prep(data, *m);\n        gsl_vector *v = apop_data_pack((*m)->parameters);\n        ms->param_sizes[i++] = v ? v->size : 0;\n        model->parameters->vector = apop_vector_stack(model->parameters->vector, v, .inplace='y');\n        gsl_vector_free(v);\n    }\n    if (!model->dsize) model->dsize = (*ms->model_list)->dsize;\n    if (!model->vsize) model->vsize = model->parameters->vector->size;\n}\n\nvoid unpack(apop_model *min){\n    apop_mixture_settings *ms = Apop_settings_get_group(min, apop_mixture);\n    int i=0,  posn= (ms->weights ? 
ms->weights->size : 0);\n    if (!min->parameters) return; //Trusting that the user has added already-estimated models.\n\n    //here, pre-calced weights are overwritten.\n    if (posn && (ms->find_weights && ms->find_weights!='n' && ms->find_weights!='N') && ms->next_weights)\n        gsl_vector_memcpy(ms->weights, Apop_rs(min->parameters, 0, posn)->vector);\n    for (apop_model **m = ms->model_list; *m; m++){\n        if (!ms->param_sizes[i]) {i++; continue;} //NULL params; step past this model's slot.\n        gsl_vector v = gsl_vector_subvector(min->parameters->vector, posn, ms->param_sizes[i]).vector;\n        apop_data_unpack(&v, (*m)->parameters);\n        posn+=ms->param_sizes[i++];\n    }\n}\n\n#define weighted_sum(fn)                                                     \\\n    long double total=0;                                                     \\\n    long double total_weight = apop_sum(ms->weights);                        \\\n    size_t i=0;                                                              \\\n    for (apop_model **m = ms->model_list; *m; m++)                           \\\n        total += fn(d, *m) * gsl_vector_get(ms->weights, i++)/total_weight;\n\n//The output is a grid of log likelihoods.\napop_data* get_lls(apop_data *d, apop_model *m){\n    apop_mixture_settings *ms = Apop_settings_get_group(m, apop_mixture);\n    Get_vmsizes(d); //maxsize\n    apop_data *out = apop_data_alloc(maxsize, ms->model_count);\n\n    for (int i=0; i< maxsize; i++){\n        Apop_row(d, i, onepoint);\n        for (int j=0; j< ms->model_count; j++){\n            double this_val = apop_log_likelihood(onepoint, ms->model_list[j]);\n            apop_data_set(out, i, j, this_val);\n        }\n    }\n    return out;\n}\n\n/* The trick to summing exponents: subtract the max:\n\nlet ll_M be the max LL. then\nΣexp(ll) = exp(llM)*(exp(ll₁-llM)+exp(ll₂-llM)+exp(ll₃-llM))\n\nOne of the terms in the sum is exp(0)=1. The others are all less than one, and so we\nare guaranteed no overflow. 
If any of them underflow, then that term must not have\nbeen very important for the sum.\n*/\nstatic long double sum_exp_vector(gsl_vector const *onerow){\n    long double rowtotal = 0;\n    double best = gsl_vector_max(onerow);\n    for (int j=0; j<onerow->size; j++) rowtotal += exp(gsl_vector_get(onerow, j)-best);\n    rowtotal *= exp(best);\n    return rowtotal;\n}\n\nstatic long double mixture_log_likelihood(apop_data *d, apop_model *model_in){\n    apop_mixture_settings *ms = Apop_settings_get_group(model_in, apop_mixture);\n    Apop_stopif(!ms, model_in->error='p'; return GSL_NAN, 0, \"No apop_mixture_settings group. \"\n                                              \"Did you set this up with apop_model_mixture()?\");\n    if (model_in->parameters) unpack(model_in);\n    apop_data *lls = get_lls(d, model_in);\n\n    //reweight by last round's lambda \n    for (int i=0; i< lls->matrix->size2; i++){\n        Apop_col_v(lls, i, onecol);\n        gsl_vector_add_constant(onecol, gsl_vector_get(ms->weights, i));\n    }\n\n//OK, now we need the λs, and then the max ll for each observation\n\n/*\nDraw probabilities are p₁/Σp p₂/Σp p₃/Σp (see equation (2) of the above pdf.)\nBut I have logs, and want to stay in log-format for as long as possible, to prevent undeflows and loss of precision.\n*/\n    long double total_ll=0;\n    gsl_vector *ps = gsl_vector_alloc(lls->matrix->size2);\n    gsl_vector *cp = gsl_vector_alloc(lls->matrix->size2);\n    for (int i=0; i< lls->matrix->size1; i++){\n        Apop_row_v(lls, i, onerow);\n        total_ll += gsl_vector_max(onerow);\n        for (int j=0; j < onerow->size; j++){\n            gsl_vector_memcpy(cp, onerow);\n            gsl_vector_add_constant(cp, -gsl_vector_get(onerow, j));\n            gsl_vector_set(ps, j, 1./sum_exp_vector(cp));\n        }\n        gsl_vector_memcpy(onerow, ps);\n\n        Apop_stopif(fabs(apop_sum(onerow) - 1) > 1e-3, /*Warn user, but continue.*/, 0,\n                \"One of the probability 
calculations is off: the total for the odds of drawing \"\n                \"from the %i mixtures is %g but should be 1.\", \n                ms->model_count, apop_sum(onerow));\n    }\n    gsl_vector_free(cp);\n    gsl_vector_free(ps);\n\n    if (!ms->next_weights) ms->next_weights = gsl_vector_alloc(ms->weights->size);\n    for (int i=0; i< lls->matrix->size2; i++){\n        Apop_col_v(lls, i, onecol);\n        gsl_vector_set(ms->next_weights, i, apop_sum(onecol)/lls->matrix->size1);\n    }\n    return total_ll;\n}\n\nstatic int mixture_draw (double *out, gsl_rng *r, apop_model *m){\n    apop_mixture_settings *ms = Apop_settings_get_group(m, apop_mixture);\n    OMP_critical (mixdraw)\n    if (!ms->cmf){\n        ms->cmf = apop_model_copy(apop_pmf);\n        ms->cmf->data = apop_data_alloc();\n        ms->cmf->data->weights = apop_vector_copy(ms->weights);\n        Apop_model_add_group(ms->cmf, apop_pmf, .draw_index='y');\n    }\n    double index; \n    Apop_stopif(apop_draw(&index, r, ms->cmf), return 1, \n            0, \"Couldn't select a mixture element using the internal PMF over mixture elements.\");\n    return apop_draw(out, r, ms->model_list[(int)index]);\n}\n\nstatic long double mixture_cdf(apop_data *d, apop_model *model_in){\n    Nullcheck_m(model_in, GSL_NAN)\n    Nullcheck_d(d, GSL_NAN)\n    apop_mixture_settings *ms = Apop_settings_get_group(model_in, apop_mixture);\n    unpack(model_in);\n    weighted_sum(apop_cdf);\n    return total;\n}\n\nstatic long double mixture_constraint(apop_data *data, apop_model *model_in){\n    apop_mixture_settings *ms = Apop_settings_get_group(model_in, apop_mixture);\n    long double penalty = 0;\n    unpack(model_in);\n    //check all component models.\n    for (apop_model **m = ms->model_list; *m; m++)\n        penalty += (*m)->constraint(data, *m);\n    if (penalty){\n        int posn=0, i=0;\n        for (apop_model **m = ms->model_list; *m; m++){\n            if (!ms->param_sizes[i]) {i++; continue;} //NULL 
params\n            gsl_vector v = gsl_vector_subvector(model_in->parameters->vector, posn, ms->param_sizes[i]).vector;\n            apop_data_pack((*m)->parameters, &v);\n            posn+=ms->param_sizes[i++];\n        }\n    }\n    if (ms->next_weights && (ms->find_weights && ms->find_weights!='n' && ms->find_weights!='N')){\n        //correct the search to the weights from the EM algorithm.\n        gsl_vector *param_weights = Apop_rs(model_in->parameters, 0, ms->next_weights->size)->vector;\n        penalty += apop_vector_distance(ms->next_weights, param_weights);\n        gsl_vector_memcpy(param_weights, ms->next_weights);\n    } else {\n        //weights are all positive?\n        gsl_vector v = gsl_vector_subvector(model_in->parameters->vector, 0, ms->model_count).vector;\n        penalty += apop_linear_constraint(&v);\n    }\n    return penalty;\n}\n\napop_model *apop_mixture=&(apop_model){\"Mixture of models\", .prep=mixture_prep,\n    .constraint=mixture_constraint, .log_likelihood=mixture_log_likelihood,\n    .cdf=mixture_cdf, .draw=mixture_draw };\n"
  }
]