QRP | Alias(es) & related concepts | Definition | QRP umbrella term(s) | Example(s) | Potential harms | Preventive measures | Detectability | Clues | Sources
Planning
Choosing biased manipulations Selecting an unjustified manipulation to reach a misleading outcome.
  • Influencing participants
(1) Selecting sub-clinical drug doses to suggest no effect, or only deliberately high doses to suggest an adverse effect. (2) Using images or videos intended to elicit emotions even though the stimuli do not actually elicit them.
  • Inflated or deflated effect size estimates
  • Inflated type I or type II error
  • Compromised generalizability
  • Run pilot studies to investigate if the manipulation can elicit an effect
  • Use manipulation check questions
  • Use standard stimuli or dosages
Maybe
  • Lack of manipulation check
  • Using dosages outside of recommended values
  • Using stimuli that can elicit extreme responses
  • Using stimuli that were not previously tested and/or have no proven effect
  • Simmons, J., & Nelson, L. (2020, March 10). Data Replicada #4: The Problem of Hidden Confounds. Data Colada. https://datacolada.org/85
  • Marchetti, S., & Schellens, J. H. M. (2007). The impact of FDA and EMEA guidelines on drug development in relation to Phase 0 trials. British Journal of Cancer, 97(5), 577–581. https://doi.org/10.1038/sj.bjc.6603925
Choosing overlapping measures to find significant results Leveraging the jangle fallacy Exploiting the conceptual similarity between measures, while presenting them as distinct constructs, to get significant results. None (1) Researcher uses similar items in two seemingly different constructs and concludes that the constructs are highly correlated (see the sketch after this entry). (2) Researcher finds a positive relationship between depression and suicidality, but the positive relation is partly due to the depression scale already containing items about suicide. (3) A researcher estimates a positive relation between construct A and construct B in a meta-analysis, but the meta-analysis includes studies that used the same task to operationalize either construct.
  • Inflated effect size estimates
  • Inflated type I or type II error
  • Reduced replicability
  • Be transparent about similar items
  • Consider whether the measures used in a study distinctly operationalize different constructs
  • Account for covariance due to similar items
Yes
  • Items or tasks are similar for the correlated constructs
  • Flake, J. K., & Fried, E. I. (2020). Measurement schmeasurement: Questionable measurement practices and how to avoid them. Advances in Methods and Practices in Psychological Science, 3(4), 456-465. https://doi.org/10.1177/2515245920952393
  • Hodson, G. (2021). Construct jangle or construct mangle? Thinking straight about (nonredundant) psychological constructs. Journal of Theoretical Social Psychology, 5(4), 576–590. https://doi.org/10.1002/jts5.120
  • Wang, Y. A., & Eastwick, P. W. (2020). Solutions to the problems of incremental validity testing in relationship science. Personal Relationships, 27(1), 156-175. https://doi.org/10.1111/pere.12309
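A minimal simulation sketch (assuming Python with NumPy; the item counts are arbitrary) of example (1): two scales that share a few items correlate substantially even when their unique content is completely unrelated.

```python
# Two "different" scales that share items correlate even when the
# unique parts of the constructs are completely unrelated.
import numpy as np

rng = np.random.default_rng(11)
n = 500
shared = rng.normal(size=(n, 3))    # 3 items scored on both scales
unique_a = rng.normal(size=(n, 5))  # items unique to scale A
unique_b = rng.normal(size=(n, 5))  # items unique to scale B, independent of A

scale_a = np.hstack([shared, unique_a]).mean(axis=1)
scale_b = np.hstack([shared, unique_b]).mean(axis=1)

# The correlation is driven entirely by the overlapping items.
print(f"r(A, B) = {np.corrcoef(scale_a, scale_b)[0, 1]:.2f}")
```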
Performing inappropriate power analysis Selecting inappropriate parameters or methods for power analysis and/or misinterpreting/misusing power analysis results. None (1) A researcher chooses a small sample size to get null results in a study about the harmful effects of smoking. (2) A researcher uses default parameters for the power analysis to show that a power analysis had been performed, even though the analysis is uninformative (see the sketch after this entry).
  • Inflated confidence in the research
  • Inflated type II error
  • Reduced replicability
  • Be transparent about how power analysis was conducted
  • Use meaningful parameters that are based on field standards, literature, or common sense
Yes
  • Details of power analysis not reported or justified
  • Parameters of the power analysis are generic and do not fit the study
  • Relatively low sample size
  • The results of power analysis are misinterpreted
  • Heckman, M. G., Davis, J. M., 3rd, & Crowson, C. S. (2022). Post Hoc Power Calculations: An Inappropriate Method for Interpreting the Findings of a Research Study. The Journal of Rheumatology, 49(8), 867–870. https://doi.org/10.3899/jrheum.211115
  • Kovacs, M., van Ravenzwaaij, D., Hoekstra, R., & Aczel, B. (2022). SampleSizePlanner: A Tool to Estimate and Justify Sample Size for Two-Group Studies. Advances in Methods and Practices in Psychological Science, 5(1), 25152459211054059. https://doi.org/10.1177/25152459211054059
  • Lakens, D. (2022). Sample size justification. Collabra: Psychology, 8(1). https://doi.org/10.1525/collabra.33267
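A minimal sketch (assuming Python with statsmodels; the effect sizes are illustrative placeholders) of an a priori sample size calculation for a two-group comparison, contrasting a literature-based effect size with a generic default.

```python
# A priori sample size estimation for a two-group comparison.
# The effect sizes below are illustrative; in practice they should come
# from the literature, a pilot study, or a smallest effect size of interest.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Literature-based expectation (hypothetical): Cohen's d = 0.35
n_justified = analysis.solve_power(effect_size=0.35, alpha=0.05, power=0.80)

# Generic "medium effect" default: Cohen's d = 0.50
n_default = analysis.solve_power(effect_size=0.50, alpha=0.05, power=0.80)

# The smaller, literature-based effect size requires a substantially
# larger sample per group than the convenient default.
print(f"n per group (d = 0.35): {n_justified:.1f}")
print(f"n per group (d = 0.50): {n_default:.1f}")
```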
Preregistering after results are known (PARKing) Preregistering a hypothesis or analysis after knowing the outcome of the analysis.
  • Misusing open science practices
(1) Researcher realizes that their paper won’t be accepted without a preregistration, so they create one post-hoc and link it to their study.
  • Inflated confidence in the research
  • Be transparent about when the preregistration was done
  • Disclose the familiarity (if any) of the researcher with the data
  • Preregister before analyzing the data
Yes
  • Data collection occurred prior to the preregistration date
  • Date of preregistration is unrealistically close to the first submission without the paper being a registered report
Choosing biased measurements Selecting an instrument that is biased or invalid, to support a desired narrative.
  • Influencing participants
(1) Researcher uses loaded, leading, or suggestive questions, e.g., “Does the lack of respect schoolchildren have for their teachers, in your opinion, influence everyday teaching methods in schools?” (2) Researcher assesses a psychological construct using an ad hoc questionnaire with no proven validity. (3) Researcher uses a scale that does not capture the intended construct properly in order to support a desired narrative.
  • Inflated or deflated effect size estimates
  • Inflated type I or type II error
  • Reduced replicability
  • Frame questions in a neutral way
  • Get external opinions on questionnaires/study materials OR conduct a registered report
  • Use questionnaires that are unbiased and psychometrically sound
Yes
  • Ad hoc questionnaires are used instead of validated instruments
  • Measurement items use suggestive or biased language
  • Measurement is based on single-item questions
  • No discussion of the psychometric properties of the instruments
  • Flake, J. K., & Fried, E. I. (2020). Measurement Schmeasurement: Questionable Measurement Practices and How to Avoid Them. Advances in Methods and Practices in Psychological Science, 3(4), 456–465. https://doi.org/10.1177/2515245920952393
Data collection
Mixing pilot and main study data Double dipping, Retaining pilot data Including data from a pilot study if the results support the hypothesis.
  • Sample curation
(1) Researcher conducts a pilot study to check the protocol and analyzes pilot data. The pilot data and main study data are aggregated in the data analysis if pilot data results are in line with expectations.
  • Inflated or deflated effect size estimates
  • Inflated type I or type II error
  • Reduced replicability
  • Don’t include pilot data in the analyzed dataset
  • Report pilot data separately
Maybe
  • If data are shared, timestamps may fall into two distinct periods
  • Sample sizes reported throughout the paper might not match
  • Kravitz, D., & Mitroff, S. (2020). Quantifying, and correcting for, the impact of questionable research practices on false discovery rates in psychological science. https://doi.org/10.31234/osf.io/fu9gy
  • Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S. F., & Baker, C. I. (2009). Circular analysis in systems neuroscience: The dangers of double dipping. Nature Neuroscience, 12(5), 535–540. https://doi.org/10.1038/nn.2303
Optional stopping Peeking, Data peeking Monitoring hypothesis tests during data collection, and stopping when statistical inference is favorable, without controlling for sequential testing.
  • Sample curation
(1) Researcher tests the hypothesis after every new participant and stops collecting data as soon as significance is reached (see the simulation sketch after this entry).
  • Inflated type I error
  • Reduced replicability
  • Preregister stopping rules and adjustments for type I error-inflation
  • Preregister the estimated sample size
Maybe
  • Absence of preregistration
  • Low sample size
  • P-values are just below the significance threshold (usually 0.05)
  • Relatively large effect size compared to other studies in the field
  • Vague or absent reason for sample size
  • de Heide, R., & Grünwald, P. D. (2021). Why optional stopping can be a problem for Bayesians. Psychonomic Bulletin & Review, 28(3), 795–812. https://doi.org/10.3758/s13423-020-01803-x
  • Lakens, D. (2022). Sample size justification. Collabra: Psychology, 8(1). https://doi.org/10.1525/collabra.33267
  • Schönbrodt, F. D., & Perugini, M. (2013). At what sample size do correlations stabilize? Journal of Research in Personality, 47(5), 609–612. https://doi.org/10.1016/j.jrp.2013.05.009
  • Schönbrodt, F. D., Wagenmakers, E.-J., Zehetleitner, M., & Perugini, M. (2017). Sequential hypothesis testing with Bayes factors: Efficiently testing mean differences. Psychological Methods, 22, 322–339. https://doi.org/10.1037/met0000061
  • Wicherts, J. M., Veldkamp, C. L. S., Augusteijn, H. E. M., Bakker, M., van Aert, R. C. M., & van Assen, M. A. L. M. (2016). Degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology, 7, 1832. https://doi.org/10.3389/fpsyg.2016.01832
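A minimal simulation sketch (assuming Python with NumPy and SciPy) of how unadjusted peeking inflates the type I error rate: the null hypothesis is true in every simulated study, yet stopping at the first significant peek yields far more than 5% false positives.

```python
# Simulate optional stopping under the null hypothesis: test after every
# additional batch of participants and stop as soon as p < .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n_min, n_max, step = 2000, 10, 100, 5
false_positives = 0

for _ in range(n_sims):
    data = rng.normal(0, 1, n_max)  # the true effect is exactly zero
    for n in range(n_min, n_max + 1, step):
        if stats.ttest_1samp(data[:n], 0).pvalue < 0.05:
            false_positives += 1    # "significant" result found by peeking
            break

# With repeated unadjusted peeking the rate ends up well above the nominal 5%.
print(f"False positive rate: {false_positives / n_sims:.1%}")
```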
Selective sampling Biased sampling Collecting a sample in a way that biases the findings.
  • Sample curation
(1) Researcher tests the likeability of chocolate only on a group of children in order to find that everyone loves it. (2) Using non-comparable groups: the researcher tests whether men are more aggressive than women, but compares women from a university with men from a prison. (3) Picking a subsample of a panel dataset to find the desired results.
  • Inflated or deflated effect size estimates
  • Inflated type I or type II error
  • Compromised generalizability
  • Reduced replicability if sampling bias is not disclosed
  • Consider using statistical control for confounding variables
  • Make sure that the sample represents the population
  • Preregister the sampling process
  • Report the sampling process transparently
  • Use a sampling method that doesn’t bias the results
  • Use comparable (or matched) groups
Yes
  • Compared groups are from different populations
  • Convenience sample
  • Unclear rationale for sample selection
Placing undue influence on participants Affecting participants to make them give responses that support the desired narrative.
  • Influencing participants
(1) Researcher tells the participants that they believe the treatment will work. (2) Researcher uses a briefing document that tries to influence participants’ attitudes about the topic assessed in the study. (3) During data collection, the same organization that runs the study also runs a marketing campaign to influence public opinion on the same topic.
  • Inflated or deflated effect size estimates
  • Inflated type I or type II error
  • Reduced replicability
  • Avoid exposing participants to any cues that might influence their responses
  • Blind the experimenter where possible
  • Researchers interacting with participants should remain neutral and follow scripts during testing
  • Use automated research procedures (e.g. research presentation software) instead of human research assistants wherever possible
No
  • Absence of blinding
  • Absence of procedure scripts
  • Suggestive questions in the survey
  • McCambridge, J., de Bruin, M., & Witton, J. (2012). The effects of demand characteristics on research participant behaviours in non-laboratory settings: a systematic review. PloS One, 7(6), e39116. https://doi.org/10.1371/journal.pone.0039116
Data processing
Excluding data points Exclusion of data points or outliers without proper justification and transparent reporting.
  • P-hacking
(1) Removing individual reaction time trials based on post hoc criteria. (2) Trying different outlier cut-off criteria until an effect is statistically significant (see the sensitivity-analysis sketch after this entry).
  • Inflated or deflated effect size estimates
  • Inflated type I or type II error
  • Compromised generalizability
  • Reduced replicability
  • Reduced reproducibility
  • Perform blinded data analysis
  • Perform sensitivity analysis
  • Preregister the study
  • Report post hoc changes in exclusion criteria
Maybe
  • Absence of open data
  • Absence of preregistration
  • Unexplained discrepancy between the recruited and analyzed sample sizes and degrees of freedom
  • Bakker, M., & Wicherts, J. M. (2014). Outlier removal and the relation with reporting errors and quality of psychological research. PloS one, 9(7), e103360. https://doi.org/10.1371/journal.pone.0103360
  • Bakker, M., & Wicherts, J. M. (2014). Outlier removal, sum scores, and the inflation of the Type I error rate in independent samples t tests: the power of alternatives and recommendations. Psychological Methods, 19(3), 409–427. https://doi.org/10.1037/met0000014
  • Osborne, J. W., & Overbay, A. (2004). The power of outliers (and why researchers should always check for them). Practical Assessment, Research, and Evaluation, 9(1), 6. https://doi.org/10.7275/qf69-7k43
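A minimal sketch (assuming Python with NumPy and SciPy, and simulated data) of the sensitivity analysis recommended above: reporting the test under several plausible outlier cut-offs instead of only the cut-off that reaches significance.

```python
# Sensitivity analysis: run the same test under several plausible outlier
# cut-offs and report all of them, rather than selecting one post hoc.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
group_a = rng.normal(0.0, 1.0, 50)  # simulated data for illustration
group_b = rng.normal(0.2, 1.0, 50)

def trim(x, sd_cutoff):
    """Keep observations within sd_cutoff standard deviations of the mean."""
    z = np.abs((x - x.mean()) / x.std(ddof=1))
    return x[z < sd_cutoff]

for cutoff in (2.0, 2.5, 3.0, np.inf):  # np.inf means no exclusion
    t, p = stats.ttest_ind(trim(group_a, cutoff), trim(group_b, cutoff))
    print(f"cut-off = {cutoff}: t = {t:.2f}, p = {p:.3f}")
```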
Missing data hacking Favorable imputation Choosing the strategy to handle missing data based on the impact on the results.
  • P-hacking
(1) A researcher tries three ways of handling missing data, for example listwise deletion, multiple imputation, and inverse probability weighting. The expected results only appear with inverse probability weighting, so the researcher reports only this strategy in the paper and leaves out the results obtained with listwise deletion and multiple imputation (see the sketch after this entry). (2) The same problem can arise within a single method, particularly multiple imputation: the imputation model uses one or more variables to replace missing data, and the researcher can choose those variables based on their impact on the results rather than on statistical grounds.
  • Inflated or deflated effect size estimates
  • Inflated type I or type II error
  • Reduced reproducibility
  • Reduced replicability
  • Perform blinded data analysis
  • Perform sensitivity analysis
  • Preregister missing data approach
Maybe
  • Absence of open data
  • Lack of rationale or references supporting the missing data approach
  • No mention of the missing data approach or the missing data at all
  • Unexplained discrepancy between the recruited and analyzed sample sizes
  • Enders, C. K. (2010). Applied missing data analysis. Guilford Press.
  • Woods, A. D., Gerasimova, D., Van Dusen, B., Nissen, J., Bainter, S., Uzdavines, A., Davis-Kean, P. E., Halvorson, M., King, K. M., Logan, J. A. R., Xu, M., Vasilev, M. R., Clay, J. M., Moreau, D., Joyal-Desmarais, K., Cruz, R. A., Brown, D. M. Y., Schmidt, K., & Elsherif, M. M. (2023). Best practices for addressing missing data through multiple imputation. Infant and Child Development. https://doi.org/10.1002/icd.2407
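A minimal sketch (assuming Python with pandas and NumPy, and simulated data) of reporting the same estimate under more than one missing-data strategy; the two strategies shown are chosen for brevity, not as recommended methods.

```python
# Sensitivity check: estimate the same correlation under two different
# missing-data strategies and report both.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)
y[rng.random(n) < 0.25] = np.nan   # ~25% of y missing at random

df = pd.DataFrame({"x": x, "y": y})

r_listwise = df.dropna().corr().loc["x", "y"]          # listwise deletion
r_imputed = df.fillna(df.mean()).corr().loc["x", "y"]  # simple mean imputation

print(f"listwise deletion: r = {r_listwise:.2f}")
print(f"mean imputation:   r = {r_imputed:.2f}")
```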
Using ad hoc exclusion criteria for participants Exclusion of participants without proper justification and transparent reporting.
  • P-hacking
  • Sample curation
(1) Researcher finds that a correlation between two variables is not significant. After removing two participants - who should be included - the association becomes significant. Then the researcher comes up with post hoc exclusion criteria for those participants. (2) A researcher doesn’t find an expected association between perceived stress and personality. When looking only at the top 25% of perceived stress scores, the association is there. They go on to report the top 25% scores as their population of interest and do not disclose that they looked at the rest of the sample population.
  • Inflated or deflated effect size estimates
  • Inflated type I or type II error
  • Compromised generalizability
  • Reduced replicability
  • Reduced reproducibility
  • Perform blinded data analysis
  • Perform sensitivity analysis
  • Preregister the study
  • Report post hoc changes in exclusion criteria
Maybe
  • Absence of open data
  • Absence of preregistration
  • Sample too narrow for recruitment methods
  • Unexplained discrepancy between the recruited and analyzed sample sizes
  • Lang, S., Armstrong, N., Deshpande, S., Ramaekers, B., Grimm, S., de Kock, S., Kleijnen, J., & Westwood, M. (2019). Clinically inappropriate post hoc exclusion of study participants from test accuracy calculations: the ROMA score, an example from a recent NICE diagnostic assessment. Annals of clinical biochemistry, 56(1), 72–81. https://doi.org/10.1177/0004563218782722
  • Nüesch, E., Trelle, S., Reichenbach, S., Rutjes, A. W. S., Bürgi, E., Scherer, M., Altman, D. G., & Jüni, P. (2009). The effects of excluding patients from the analysis in randomised controlled trials: meta-epidemiological study. BMJ (Clinical Research Ed.), 339(sep07 1), b3244. https://doi.org/10.1136/bmj.b3244
Discretizing continuous variables Dichotomizing variables, Median split Taking a continuous variable and making it categorical without proper justification and transparent reporting.
  • P-hacking
(1) Researcher doesn’t find an association between depression and a continuous age variable, and recodes age into young and old categories. After that, the age groups show a significant association with depression, and an independent samples t-test is reported instead of a correlation (see the sketch after this entry).
  • Inflated or deflated effect size estimates
  • Inflated type I or type II error
  • Reduced replicability
  • Compromised generalizability
  • Perform blinded data analysis
  • Perform sensitivity analysis
  • Preregister the study
  • Use original measurement levels
Yes
  • Absence of open data
  • Scale or response options in materials or methods do not match how they are reported in the results
  • Test statistics do not match expected data analysis strategy
  • Cohen, J. (1983). The cost of dichotomization. Applied Psychological Measurement, 7, 247-253. https://doi.org/10.1177/014662168300700301
  • DeCoster, J., Gallucci, M., & Iselin, A.-M. R. (2011). Best practices for using median splits, artificial categorization, and their continuous alternatives. Journal of Experimental Psychopathology, 2(2), 197–209. https://doi.org/10.5127/jep.008310
  • MacCallum, R. C., Zhang, S., Preacher, K. J., & Rucker, D. D. (2002). On the practice of dichotomization of quantitative variables. Psychological Methods, 7(1), 19–40. https://doi.org/10.1037/1082-989X.7.1.19
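A minimal simulation sketch (assuming Python with NumPy and SciPy; all data are simulated) showing the same data analyzed with the continuous predictor and after a median split; the point is that discretization discards information, and reporting only whichever analysis turns out significant is the QRP.

```python
# The same simulated data analyzed with the continuous predictor and
# after an arbitrary median split into "young" and "old".
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 120
age = rng.uniform(18, 80, n)
depression = 0.02 * age + rng.normal(0, 1, n)  # weak true association

r, p_continuous = stats.pearsonr(age, depression)  # continuous analysis

old = age > np.median(age)                         # median split
t, p_split = stats.ttest_ind(depression[old], depression[~old])

print(f"continuous:   r = {r:.2f}, p = {p_continuous:.3f}")
print(f"median split: t = {t:.2f}, p = {p_split:.3f}")
```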
Modifying measurements Changing the properties of a measure/measurement to produce favorable results without proper justification and/or transparent reporting.
  • P-hacking
(1) Researcher uses only a portion of the items from a longer scale. (2) Researcher combines items from different scales into a single measure. (3) Researcher chooses which EEG electrodes to aggregate based on the results.
  • Reduced replicability
  • Reduced reproducibility
  • Reduced validity of the measure
  • Inflated or deflated reliability of the measure
  • Inflated type I or type II error
  • Inflated or deflated effect size estimates
  • Describe and justify any modifications on measurements
  • Perform blinded data analysis
  • Perform sensitivity analysis
  • Publish study materials
  • Preregister the study
  • Use conventional measurements/measures
Maybe
  • Absence of open data
  • Absence of open study materials
  • Discrepancy between the reported version of measurement/measure and original or conventional measure
  • Flake, J. K., & Fried, E. I. (2020). Measurement schmeasurement: Questionable measurement practices and how to avoid them. Advances in Methods and Practices in Psychological Science, 3(4), 456-465. https://doi.org/10.1177/2515245920952393
Redefining group membership rules Post hoc (re)definition of grouping criteria without proper justification and transparent reporting.
  • P-hacking
(1) Collapsing the multicategorical variable of sexual orientation into heterosexual and non-heterosexual. (2) Trying different age ranges in cross-sectional age comparisons to maximize group differences.
  • Inflated or deflated effect size estimates
  • Inflated type I or type II error
  • Reduced replicability
  • Reduced reproducibility
  • Compromised generalizability
  • Report post hoc changes in grouping rules and report results using original grouping rules as well
  • Preregister the study
  • Perform blinded data analysis
  • Perform sensitivity analysis
Maybe
  • Absence of open data
  • Absence of preregistration
  • Oversimplified sample description
  • Response options in materials/methods different than reported groups
  • Wicherts, J. M., Veldkamp, C. L. S., Augusteijn, H. E. M., Bakker, M., van Aert, R. C. M., & van Assen, M. A. L. M. (2016). Degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology, 7, 1832. https://doi.org/10.3389/fpsyg.2016.01832
Variable transformation fishing Selecting variable transformations that produce favorable results without proper justification and/or transparent reporting.
  • P-hacking
(1) Researcher runs a statistical test using several different transformations (e.g., changing levels of measurement, log-transformations, rescaling) of the outcome, and only reports the one that produces a significant result.
  • Inflated or deflated effect size estimates
  • Inflated type I or type II error
  • Reduced replicability
  • Reduced reproducibility
  • Describe and justify any variable transformations
  • Perform blinded data analysis
  • Perform sensitivity analysis
  • Preregister conditional transformations
Maybe
  • Absence of open data
  • Reported values are outside of regular range
  • Transformation is applied without justification
  • Using transformations that are unconventional for the measure
Data analysis
Choosing a poor model specification Overfitting or underfitting models, Bias-variance tradeoff Fitting a model that is too complex for the size of the dataset, so that it learns noise and random fluctuations instead of generalizable patterns, or fitting a model that is too simple to adequately describe the data (see the sketch after this entry).
  • P-hacking
(1) Overfitting: The researcher fits a regression model with 25 predictors on a sample of 100 participants. (2) Underfitting: The researcher uses linear regression to investigate a non-linear association.
  • Inflated or deflated effect size estimates
  • Inflated type I or type II error
  • Perform blinded data analysis
  • Perform sensitivity analysis
  • Use a theoretically justified model in confirmatory studies
  • Underfitting: Visualize data and the model
  • Use methods that prevent overfitting in exploratory research, e.g. use separate train and test datasets, use cross-validation resampling methods, use regularization or other feature selection methods
Yes
  • Improper predictor selection (e.g., no regularization)
  • No mention of holdout (or test) dataset or cross-validation
  • Overfitting: Very high R2 value (close to 1)
  • The number of included predictors in the model is large
  • The number of observations is low
  • Underfitting: Data visualization shows high model bias
  • Babyak, M. A. (2004). What you see may not be what you get: a brief, nontechnical introduction to overfitting in regression-type models. Psychosomatic medicine, 66(3), 411-421.
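A minimal sketch (assuming Python with scikit-learn and NumPy) of example (1): a regression with 25 predictors fit to 100 observations of pure noise shows a clearly nonzero in-sample R² even though there is no true signal, while cross-validation exposes the lack of predictive value.

```python
# Overfitting demonstration: 25 noise predictors, 100 observations,
# and no true relationship between X and y.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 25))  # predictors unrelated to the outcome
y = rng.normal(size=100)

model = LinearRegression().fit(X, y)
in_sample_r2 = model.score(X, y)                                  # optimistic
cv_r2 = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2").mean()

print(f"In-sample R2:       {in_sample_r2:.2f}")  # clearly above zero
print(f"Cross-validated R2: {cv_r2:.2f}")         # near zero or negative
```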
Choosing unjustified p-value adjustment Not adjusting or over-adjusting p-values Not adjusting or over-adjusting p-values when running multiple tests.
  • P-hacking
A researcher decides (1) whether or not to adjust for multiple tests (e.g., in an ANOVA), (2) which adjustment method to use, and (3) which (i.e., how many) comparisons to include, depending on the results obtained (see the sketch after this entry). (4) Researcher uses a Bonferroni correction when correlating several variables to prove that an association does not exist.
  • Inflated type I or type II error
  • Perform blinded data analysis
  • Perform sensitivity analysis
  • Preregister p-value adjustment plans
  • Preregister planned contrasts
Yes
  • Multiple tests are made that would require p-value adjustment
  • P-value adjustment not mentioned
  • Using p-value correction that is too strict (e.g., Bonferroni) without proper justification
  • Bender, R., & Lange, S. (2001). Adjusting for multiple testing—when and how?. Journal of clinical epidemiology, 54(4), 343-349. https://doi.org/10.1016/S0895-4356(00)00314-0
  • Cramer, A. O., van Ravenzwaaij, D., Matzke, D., Steingroever, H., Wetzels, R., Grasman, R. P., … & Wagenmakers, E. J. (2016). Hidden multiplicity in exploratory multiway ANOVA: Prevalence and remedies. Psychonomic bulletin & review, 23(2), 640-647. https://doi.org/10.3758/s13423-015-0913-5
  • Midway, S., Robertson, M., Flinn, S., & Kaller, M. (2020). Comparing multiple comparisons: practical guidance for choosing the best multiple comparisons test. PeerJ, 8, e10387. https://doi.org/10.7717/peerj.10387.
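A minimal sketch (assuming Python with statsmodels; the p-values are illustrative placeholders) of how the choice of correction method changes which tests are declared significant, which is why the method should be justified and preregistered rather than picked after seeing the results.

```python
# Apply different multiple-testing corrections to the same set of p-values.
from statsmodels.stats.multitest import multipletests

pvals = [0.004, 0.012, 0.030, 0.041, 0.210]  # illustrative placeholders

for method in ("bonferroni", "holm", "fdr_bh"):
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
    adjusted = [f"{p:.3f}" for p in p_adj]
    print(f"{method:>10}: adjusted p = {adjusted}, reject = {list(reject)}")
```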
Neglecting assumptions for statistical models Using statistical models although their assumptions are not met.
  • P-hacking
(1) Analyzing data with parametric tests such as t-tests although the data call for a non-parametric test (see the sketch after this entry). (2) Analyzing dependent data with a statistical model that does not account for the dependency.
  • Inflated type I or type II error
  • Reduced replicability
  • Reduced reproducibility
  • Perform and report necessary assumption checks
Yes
  • Evidence of assumption breaches (e.g., non-normality, non-independent data, largely different SDs by group)
  • Not reporting assumption checks
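A minimal sketch (assuming Python with NumPy and SciPy, and simulated data) of the assumption checks recommended above before an independent-samples t-test, together with commonly used robust alternatives if the checks fail.

```python
# Check normality and homogeneity of variance before a two-sample t-test,
# and fall back on robust alternatives if the assumptions look violated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
group_a = rng.normal(0, 1, 40)      # simulated data for illustration
group_b = rng.exponential(1, 40)    # deliberately non-normal group

print("Shapiro-Wilk (A):", stats.shapiro(group_a).pvalue)
print("Shapiro-Wilk (B):", stats.shapiro(group_b).pvalue)
print("Levene (equal variances):", stats.levene(group_a, group_b).pvalue)

# If checks fail, report that and use a more appropriate test, e.g.:
print("Welch's t-test:", stats.ttest_ind(group_a, group_b, equal_var=False).pvalue)
print("Mann-Whitney U:", stats.mannwhitneyu(group_a, group_b).pvalue)
```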
Using ad hoc covariates Selectively including control variables Addition or removal of covariates to influence the estimates or significance for the effect of interest.
  • P-hacking
(1) A researcher opportunistically decides which background variables (e.g., age, gender) to control for, without a causal theory or a preregistration. (2) A researcher decides whether to control for a baseline value in an experimental design depending on the results of statistical tests. (3) Researcher avoids the inclusion (or measurement) of theoretically justified moderators (e.g., severity of a condition, or socio-economic status) to be able to imply greater generalisability.
  • Inflated or deflated effect size estimates
  • Inflated type I or type II error
  • Compromised generalizability
  • Reduced replicability
  • Reduced reproducibility
  • Perform blinded data analysis
  • Preregister complete models/analytical plan
  • Report robustness checks
Maybe
  • Different covariates in different analysis steps are used
  • There is a lack of justification for the selection of the covariates
  • Becker, T. E., Atinc, G., Breaugh, J. A., Carlson, K. D., Edwards, J. R., & Spector, P. E. (2016). Statistical control in correlational studies: 10 essential recommendations for organizational researchers. Journal of Organizational Behavior, 37(2), 157-167. https://doi.org/10.1002/job.2053
  • Stefan, A., & Schönbrodt, F. D. (2022, March 16). Big little lies: A compendium and simulation of p-hacking strategies. https://doi.org/10.31234/osf.io/xy2dk
  • VanderWeele, T. J. (2019). Principles of confounder selection. European journal of epidemiology, 34(3), 211-219. https://doi.org/10.1007/s10654-019-00494-6
  • Wysocki, A. C., Lawson, K. M., & Rhemtulla, M. (2022). Statistical control requires causal justification. Advances in Methods and Practices in Psychological Science, 5(2), 25152459221095823. https://doi.org/10.1177/25152459221095823.
Selecting a favorable random number generator seed Resampling lottery Trying different random seeds until a favorable result is obtained, potentially in combination with a small number of replications (see the sketch after this entry).
  • P-hacking
(1) A researcher keeps on bootstrapping a confidence interval (e.g., for a mediation indirect effect) with different seeds until the 95% confidence interval just excludes 0.
  • Inflated or deflated effect size estimates
  • Inflated type I or type II error
  • Reduced replicability
  • Reduced reproducibility
  • Only report significance if the results are robust across random seeds
  • Use a large number of replications (e.g., bootstrap samples)
  • Perform blinded data analysis
Maybe
  • P-values just below the significance threshold (usually 0.05)
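A minimal sketch (assuming Python with NumPy, and simulated data) of why a small number of bootstrap replications invites seed fishing: with few resamples the confidence bound varies noticeably across seeds, whereas a large number of replications makes it stable.

```python
# Bootstrap confidence interval of a mean difference under different seeds,
# with a small versus a large number of replications.
import numpy as np

data_rng = np.random.default_rng(2024)
a = data_rng.normal(0.25, 1.0, 40)  # simulated groups for illustration
b = data_rng.normal(0.00, 1.0, 40)

def boot_ci(a, b, n_boot, seed):
    rng = np.random.default_rng(seed)
    diffs = [rng.choice(a, a.size).mean() - rng.choice(b, b.size).mean()
             for _ in range(n_boot)]
    return np.percentile(diffs, [2.5, 97.5])

for n_boot in (200, 10_000):
    lower_bounds = [boot_ci(a, b, n_boot, seed)[0] for seed in range(5)]
    print(f"n_boot = {n_boot:>6}: lower bound across 5 seeds =",
          np.round(lower_bounds, 3))
```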
Write-up
Selective test reporting Repeatedly testing a hypothesis in different ways until the desired result is found, and then selectively reporting the findings that support the desired conclusion.
  • P-hacking
  • Cherry-picking
Researcher analyzes the data using (1) multiple statistical methods (multiple t-tests, ANOVAs, different random-effects structures in LMEMs) and/or (2) multiple data eligibility specifications. Based on the results, they choose to present only the one analysis that gives a significant result.
  • Inflated confidence in the research
  • Inflated type I or type II error
  • Reduced reproducibility
  • Reduced replicability
  • Perform blinded data analysis
  • Perform specification curve analysis
  • Preregister data processing (e.g., missing data approach) and statistical analysis strategy
  • Report all performed hypothesis-tests
Maybe
  • Absence of preregistration
  • Arbitrary data processing steps and/or statistical methods
  • P-values just below the significance threshold (usually 0.05)
  • Chuard, P. J. C., Vrtílek, M., Head, M. L., & Jennions, M. D. (2019). Evidence that nonsignificant results are sometimes preferred: Reverse P-hacking or selective reporting? PLoS Biology, 17(1), e3000127. https://doi.org/10.1371/journal.pbio.3000127
  • Head, M. L., Holman, L., Lanfear, R., Kahn, A. T., & Jennions, M. D. (2015). The extent and consequences of p-hacking in science. PLoS Biology, 13(3), e1002106. https://doi.org/10.1371/journal.pbio.1002106
  • Olejnik, S. F., & Algina, J. (1987). Type I error rates and power estimates of selected parametric and nonparametric tests of scale. Journal of Educational Statistics, 12(1), 45-61. https://doi.org/10.3102/10769986012001045
  • Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. https://doi.org/10.1177/0956797611417632
  • Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14, 779–804. https://doi.org/10.3758/BF03194105
Hypothesizing after the results are known (HARKing) Texas sharpshooter fallacy, Post hoc ergo propter hoc Presenting a hypothesis that is based on observed results (post hoc or a posteriori) as if it had been formulated before obtaining the results (a priori). None (1) Researcher claims to have predicted an unexpected result. (2) Researcher originally has no hypotheses, forms hypotheses after exploring the data, and presents them as if they had been held from the beginning. (3) Researcher has a hypothesis (e.g., a mediation hypothesis) and tests it; if the results do not confirm the hypothesis but rather indicate an alternative pattern (e.g., a moderation), the researcher claims that this is what they hypothesized all along. (4) Post hoc directional hypotheses: the researcher presents a hypothesis as if it were unidirectional (i.e., group A’s mean is larger than group B’s, or a correlation will be positive), although the original hypothesis was bidirectional. This change can make the hypothesis test significant.
  • Inflated confidence in the research
  • Inflated or deflated effect size estimates
  • Inflated type I or type II error
  • Reduced replicability
  • Clearly separate exploratory and confirmatory findings
  • Form hypotheses before analyzing the data
  • Perform blinded data analysis
  • Preregister confirmatory hypotheses
  • Use robust exploratory research practices (e.g. holdout dataset, cross-validation, multiverse analysis, blinded data analysis, etc.)
Maybe
  • Absence of preregistration
  • Unexplained and unconventional choices in the methods and results section
  • Andrade, C. (2021). HARKing, Cherry-Picking, P-Hacking, Fishing Expeditions, and Data Dredging and Mining as Questionable Research Practices. The Journal of Clinical Psychiatry, 82(1). https://doi.org/10.4088/JCP.20f13804
  • Brookes, S.T., Whitley, E., Peters, T.J., Mulheran, P.A., Egger, M. & Davey Smith, G. (2001). Subgroup analysis in randomised controlled trials: Quantifying the risks of false-positives and false-negatives, Health Technology Assessment, 5(33), 1–56. https://doi.org/10.3310/hta5330
  • Kerr, N. L. (1998). HARKing: hypothesizing after the results are known. Personality and Social Psychology Review: An Official Journal of the Society for Personality and Social Psychology, Inc, 2(3), 196–217. https://doi.org/10.1207/s15327957pspr0203_4
  • Leung, K. (2011). Presenting post hoc hypotheses as a priori: Ethical and theoretical issues. Management and Organization Review, 7(3), 471–479. https://doi.org/10.1111/j.1740-8784.2011.00222.x
  • Weston, S. J., Ritchie, S. J., Rohrer, J. M., & Przybylski, A. K. (2019). Recommendations for increasing the transparency of analysis of preexisting data sets. Advances in Methods and Practices in Psychological Science, 2(3), 214–227. https://doi.org/10.1177/2515245919848684
Making unsupported conclusions Interpreting research findings or their implications in a way that is not backed by evidence. None (1) Researcher concludes that a treatment is effective for groups and contexts that were not considered in the study. (2) Researcher concludes that a treatment worked, although the treatment effect did not differ from the effect of the control condition (or no control condition was used). (3) Researcher implies causality based on a research design that does not allow causal inference. NA
  • Make it clear that the evidence is limited to certain contexts
  • Make sure that every interpretation is properly supported by evidence
  • Use conditional statements where evidence is weak or the researcher uses extrapolation
Yes
  • Causal claims are made without the methodology or analysis allowing causal inference
  • Results are generalized to contexts outside of the study’s scope
  • The chosen methodology and statistical analysis cannot answer the research question or test the hypothesis
  • The statistical results do not match the conclusions
  • Simons, D. J., Shoda, Y., & Lindsay, D. S. (2017). Constraints on Generality (COG): A proposed addition to all empirical papers. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 12(6), 1123–1128. https://doi.org/10.1177/1745691617708630
  • Yarkoni, T. (2020). The generalizability crisis. The Behavioral and Brain Sciences, 45, e1. https://doi.org/10.1017/S0140525X20001685
Incorrect reporting of test statistics Ignoring statistical reporting conventions in order to obscure exact results and imply that they fall above or below threshold values. None (1) Researcher “rounds off” a p-value in a paper (e.g., reporting that a p-value of .054 is less than or equal to .05). (2) Researcher reports p-values only and conceals the test statistics. (3) Researcher reports a correlation without disclosing the degrees of freedom, number of observations, or confidence interval, so the effect seems large (for example r=.70, n=15, CI=[.01; .90]). (4) Researcher does a model comparison and only reports fit statistics that favor the preferred model.
  • Inflated type I or type II error
  • Reduced reproducibility
  • Adhere to reporting conventions (e.g., APA)
  • Publish data
  • Publish processing and analysis code
  • Use literate programming (e.g., R Markdown, Quarto, Jupyter)
  • Work with software that supports you in producing and checking your write-up (e.g., papaja, statcheck); see the sketch after this entry
Yes
  • Absence of open code
  • Absence of open data
  • Anomalies in reported statistics, e.g., test statistics are incompatible with p-values
  • Fit statistics are missing without proper explanation
  • Statistics are not reported according to conventions (e.g., three digits for p-values, reporting of df)
  • Bakker, M., & Wicherts, J. M. (2011). The (mis)reporting of statistical results in psychology journals. Behavior Research Methods, 43(3), 666–678. https://doi.org/10.3758/s13428-011-0089-5
  • Jackson, D. L., Gillaspy, J. A., & Purc-Stephenson, R. (2009). Reporting practices in confirmatory factor analysis: an overview and some recommendations. Psychological Methods, 14(1), 6–23. https://doi.org/10.1037/a0014694
  • Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 48(4), 1205–1226. https://doi.org/10.3758/s13428-015-0664-2
  • Nuijten, M. B., van Assen, M. A. L. M., Hartgerink, C. H. J., Epskamp, S., & Wicherts, J. M. (2017). The validity of the tool “statcheck” in discovering statistical reporting inconsistencies. PsyArXiv. https://doi.org/10.31234/osf.io/tcxaj
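A minimal sketch (assuming Python with SciPy; the reported values are made up) of the consistency check that tools like statcheck automate: recomputing the p-value implied by a reported test statistic and its degrees of freedom.

```python
# Recompute the two-sided p-value implied by a reported t statistic and its
# degrees of freedom, and flag it if it clashes with the reported p-value.
from scipy import stats

t_reported, df, p_reported = 2.20, 28, 0.02  # made-up reported values

p_recomputed = 2 * stats.t.sf(abs(t_reported), df)
print(f"recomputed two-sided p = {p_recomputed:.3f}")  # ~.036, not .02

if abs(p_recomputed - p_reported) > 0.005:  # illustrative tolerance
    print("Reported p-value is inconsistent with the reported test statistic.")
```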
Omitting important details of the scientific process Incomplete methods or results section Not reporting important details of the methodology and statistical analysis.
  • Cherry-picking
(1) Researcher omits sample characteristics, such as the sample was recruited on MTurk or participants received compensation for participation. (2) Researcher reports correlations without specifying the type, e.g., Spearman. (3) Researcher does not share study materials on request or does not report exact questionnaire items. (4) Researcher fails to mention pilot studies that were conducted to arrive at the final design.
  • Compromised generalizability
  • Reduced replicability
  • Reduced reproducibility
  • Preregister the study or written research plan before conducting the study
  • Report every important detail of the scientific process
  • Use a lab log during data collection to keep track of changes in the scientific process
Maybe
  • Absence of open study materials
  • Details that are usually shared are missing
  • Replication is not possible from published methods
  • Gernsbacher, M. A. (2018). Writing Empirical Articles: Transparency, Reproducibility, Clarity, and Memorability. Advances in Methods and Practices in Psychological Science, 1(3), 403–414. https://doi.org/10.1177/2515245918754485
  • National Academies of Sciences, Engineering, and Medicine. (2019). Reproducibility and replicability in science. National Academies Press.
  • Wagenmakers, E.-J., Sarafoglou, A., Aarts, S., Albers, C., Algermissen, J., Bahník, Š., van Dongen, N., Hoekstra, R., Moreau, D., van Ravenzwaaij, D., Sluga, A., Stanke, F., Tendeiro, J., & Aczel, B. (2021). Seven steps toward more transparency in statistical practice. Nature Human Behaviour, 5(11), 1473–1480. https://doi.org/10.1038/s41562-021-01211-8
Selective reporting of hypotheses Cherry-picking hypotheses, Chrysalis effect, Fishing expedition Reporting hypothesis tests only if they fit the researcher's expectations.
  • Cherry-picking
(1) Researcher formulates five hypotheses, of which only three are supported by the data, and only these three are reported in the final research report (Chrysalis effect). (2) Fishing expedition: the researcher surveys college students about the outfit they are wearing and their scores on several tests, which allows for many possible analyses (examining different colors, types of clothing, tests, score cutoffs, etc.). They end up reporting only a subset of findings to claim that college students perform significantly better on tests when they are wearing green. See also Modifying measurements, Selective reporting of indicator variables, and Selective reporting of outcomes.
  • Inflated confidence in the research
  • Inflated or deflated effect size estimates
  • Inflated type I or type II error
  • If some hypotheses are left out due to the scope of the write-up, be transparent about it
  • Preregister the study
  • Report all hypotheses in the write-up regardless of whether they were confirmed or not
Maybe
  • Number of hypotheses in preregistration (or dissertation) exceeds the number in the publication
  • Andrade, C. (2021). HARKing, cherry-picking, P-hacking, fishing expeditions, and data dredging and mining as questionable research practices. The Journal of Clinical Psychiatry, 82(1), 20f13804. https://doi.org/10.4088/JCP.20f13804
  • O’Boyle, E. H., Jr, Banks, G. C., & Gonzalez-Mulé, E. (2017). The Chrysalis Effect: How ugly initial results metamorphosize into beautiful articles. Journal of Management, 43(2), 376–399. https://doi.org/10.1177/0149206314527133
  • Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. https://doi.org/10.1177/0956797611417632
Visualizing data in a misleading way Choosing suboptimal visualizations or altering figure properties in order to exaggerate or diminish effects. None (1) Researcher truncates the y-axis so that it does not start at zero and/or omits error bars, which makes differences look larger and more meaningful than they are in reality (see the sketch after this entry). (2) Researcher uses arbitrary categories to present interval data on a map. (3) Researcher displays a pie chart whose percentages sum to less than or more than 100. NA
  • Follow best practices on how to visualize data
Yes
  • Chartjunk (e.g., 3D elements, ornaments) is present on the plot
  • The y-axis of a plot starts at an arbitrary (non-zero) point
  • Only summary statistics are shown without individual data points
  • Scale or response options in text do not match how they are presented in a plot
  • Statistical uncertainty (e.g., error bars) is not shown on plots
  • Visualization does not match the reported results in the text
  • Nguyen, V. T., Jung, K., & Gupta, V. (2021). Examining data visualization pitfalls in scientific publications. Visual Computing for Industry, Biomedicine, and Art, 4(1), 27. https://doi.org/10.1186/s42492-021-00092-y
  • Weissgerber, T. L., Winham, S. J., Heinzen, E. P., Milin-Lazovic, J. S., Garcia-Valencia, O., Bukumiric, Z., Savic, M. D., Garovic, V. D., & Milic, N. M. (2019). Reveal, don’t conceal: Transforming data visualization to improve transparency. Circulation, 140(18), 1506–1518. https://doi.org/10.1161/circulationaha.118.037777
  • Wilke, C. O. (2019). Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures (1st ed.). O’Reilly Media.
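A minimal sketch (assuming Python with Matplotlib and NumPy; the group means are made up) contrasting a truncated y-axis without error bars with a zero-based y-axis that shows uncertainty.

```python
# The same two group means shown with a truncated axis (misleading) and
# with a zero-based axis plus error bars (more honest).
import numpy as np
import matplotlib.pyplot as plt

groups = ["Control", "Treatment"]
means = np.array([50.0, 52.0])  # made-up values for illustration
sems = np.array([1.5, 1.5])

fig, (ax_bad, ax_good) = plt.subplots(1, 2, figsize=(8, 3))

ax_bad.bar(groups, means)
ax_bad.set_ylim(49, 53)         # truncated y-axis exaggerates the difference
ax_bad.set_title("Misleading")

ax_good.bar(groups, means, yerr=sems, capsize=4)
ax_good.set_ylim(0, 60)         # zero-based axis, uncertainty shown
ax_good.set_title("More honest")

fig.tight_layout()
plt.show()
```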
Citing unreliable research Citing an unreliable publication to support the study's narrative.
  • Citation engineering
(1) Researcher cites a publication that presents low-level evidence to support a claim, with no reference to the study’s limitations or to other studies. (2) The researcher cites a retracted paper.
  • Increasing the credibility of low-evidence research
  • Inflated credibility of statements
  • Never cite a retracted study
  • Only cite publications that properly support their claims
Yes
  • The cited publication provides no or low-quality evidence to its claims
  • The cited publication is retracted or otherwise discredited or its claims are refuted
  • American Psychological Association. (2023, November). Journal Article Reporting Standards (JARS). https://apastyle.apa.org/jars
  • Balshem, H., Helfand, M., Schünemann, H. J., Oxman, A. D., Kunz, R., Brozek, J., Vist, G. E., Falck-Ytter, Y., Meerpohl, J., Norris, S., & Guyatt, G. H. (2011). GRADE guidelines: 3. Rating the quality of evidence. Journal of Clinical Epidemiology, 64(4), 401–406. https://doi.org/10.1016/j.jclinepi.2010.07.015
  • Letrud, K., & Hernes, S. (2019). Affirmative citation bias in scientific myth debunking: A three-in-one case study. PLOS ONE, 14(9), e0222213. https://doi.org/10.1371/journal.pone.0222213
Selective reporting of indicator variables Cherry-picking indicator variables, Cherry-picking conditions/groups Reporting only the indicator variables (or predictors, features, independent variables) that are used in analyses that produce expected results.
  • Cherry-picking
(1) Researcher reports indicators that are associated with the outcome rather than including all measured indicator variables in the results section. (2) Researcher drops one or more conditions or groups, merges two or more groups into one, or splits a group into more groups than were initially planned, depending on the statistical results.
  • Inflated or deflated effect size estimates
  • Inflated type I or type II error
  • Compromised generalizability
  • Reduced replicability
  • Perform blinded data analysis
  • Preregister the study
  • Report all indicators
Maybe
  • Indicators reported in Supplemental Material but not mentioned in main text
  • Measures get reported in the methods section but not in the results section
  • Number of preregistered indicators exceeds the number of indicators in publication
  • Reported mean time of participation does not match the number of reported measures
  • Gernsbacher, M. A. (2018). Writing empirical articles: Transparency, reproducibility, clarity, and memorability. Advances in Methods and Practices in Psychological Science, 1(3), 403–414. https://doi.org/10.1177/2515245918754485
  • Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. https://doi.org/10.1177/0956797611417632
Selective reporting of outcomes Cherry-picking outcomes Reporting only the outcomes (or dependent variables) that are used in analyses that produce expected results.
  • Cherry-picking
(1) Researcher uses several scales to measure the same construct but only reports the one that produces expected results. (2) Researcher tests effectiveness of a new intervention for depression by measuring its effects on anxiety, sleep quality, and stress and only reports the outcome that shows the desired effect.
  • Inflated or deflated effect size estimates
  • Inflated type I or type II error
  • Compromised generalizability
  • Reduced replicability
  • Perform blinded data analysis
  • Preregister the study
  • Report all outcomes
Maybe
  • Measures get reported in the methods section but not in the results section
  • Number of preregistered outcomes exceeds the number of outcomes in the publication
  • Outcomes reported in Supplemental Material but not mentioned in main text
  • Reported mean time of participation does not match number of reported measures
  • Gernsbacher, M. A. (2018). Writing Empirical Articles: Transparency, Reproducibility, Clarity, and Memorability. Advances in Methods and Practices in Psychological Science, 1(3), 403–414. https://doi.org/10.1177/2515245918754485
  • Pigott, T. D., Valentine, J. C., Polanin, J. R., Williams, R. T., & Canada, D. D. (2013). Outcome-reporting bias in education research. Educational Researcher, 42(8), 424–432. https://doi.org/10.3102/0013189X13507104
  • Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. https://doi.org/10.1177/0956797611417632
Using unjustified references Selectively citing works by specific researchers or journals to inflate citation metrics or boost a journal’s impact factor.
  • Citation engineering
(1) Researcher selectively cites their own publications for boosted citation metrics. (2) Citation networks: Researcher cites a colleague’s unrelated work in order to get cited in a similar way.
  • Inflated credibility of publications
  • Inflated credibility of journals
  • Cite only relevant studies
  • Provide comprehensive coverage of related scholarly literature
Yes
  • Publications are cited without relevance to the claims
  • Specific authors or journals are cited disproportionately frequently
Not disclosing deviations from preregistration Deviating from the preregistration without transparency and proper justification in the publication.
  • Misusing open science practices
(1) Researcher preregisters collecting data from a non-student sample, but ends up including students, and does not disclose this deviation. (2) Researcher preregisters a data analysis using linear regression but uses robust regression instead without reporting the discrepancy. (3) See also Example 1 in Selective reporting of hypotheses.
  • Inflated confidence in the research
  • Reduced replicability
  • Avoid vagueness in preregistration
  • Disclose and justify every divergence from the preregistration
  • Use methods that will provide robust results even when preregistration is not specific at points (e.g. blind data analysis, cross-validation)
Yes
  • Link to the preregistration in the manuscript does not work or leads to a page that cannot be accessed
  • Preregistration and published study differ on important aspects
  • Adam, D. (2019). A solution to psychology’s reproducibility problem just failed its first test. Science. https://doi.org/10.1126/science.aay1207
  • Claesen, A., Gomes, S., Tuerlinckx, F., & Vanpaemel, W. (2021). Comparing dream to reality: an assessment of adherence of the first generation of preregistered studies. Royal Society Open Science, 8(10), 211037. https://doi.org/10.1098/rsos.211037
  • Lakens, D. (2024). When and how to deviate from a preregistration. Collabra: Psychology, 10(1), 117094. https://doi.org/10.1525/collabra.117094
  • Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The pre-registration revolution. Proceedings of the National Academy of Sciences of the United States of America, 115(11), 2600–2606. https://doi.org/10.1073/pnas.1708274114
Selective citing Cherry-picking citations Avoiding mention of studies that do not support the hypothesis of the research, or even of those that do support it, in order to make the study appear more novel.
  • Citation engineering
  • Cherry-picking
(1) Researcher overly cites empirical work that supports their hypotheses and withholds citing work that did not find the effect at all or even the opposite. (2) Researcher omits other null findings to maximize the perceived value of a null finding.
  • Inflated credibility of statements
  • Inflated confidence in the research
  • Provide comprehensive coverage of related scholarly literature
Yes
  • Cited studies only point into one direction
  • Important studies and experts are missing from references
  • Systematic reviews and meta-analyses are not cited
  • Duyx, B., Urlings, M. J. E., Swaen, G. M. H., Bouter, L. M., & Zeegers, M. P. (2017). Scientific citations favor positive results: A systematic review and meta-analysis. Journal of Clinical Epidemiology, 88, 92–101. https://doi.org/10.1016/j.jclinepi.2017.06.002
  • Gøtzsche, P. C. (2022). Citation bias: Questionable research practice or scientific misconduct? Journal of the Royal Society of Medicine, 115(1), 31–35. https://doi.org/10.1177/01410768221075881
Using irrelevant references Using citations that are not connected to the claims to increase the credibility of a statement.
  • Citation engineering
(1) Researcher supports a statement with three citations and two of them are unrelated to the statement.
  • Inflated credibility of statements
  • Cite only relevant studies
Yes
  • Publications are cited without relevance to the claims
Publication
Not linking the preregistration to the published study Creating a preregistration but not associating it with the published study. None (1) Researcher preregisters a study, but after conducting the research the preregistration is not mentioned in the manuscript because of too many deviations. NA
  • Always link the preregistration to the manuscript and report discrepancies
Maybe
  • A preregistration that fits the study is findable
  • Wicherts, J. M., Veldkamp, C. L. S., Augusteijn, H. E. M., Bakker, M., van Aert, R. C. M., & van Assen, M. A. L. M. (2016). Degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology, 7, 1832. https://doi.org/10.3389/fpsyg.2016.01832
Creating multiple publications from the same study Publication overlap, Salami slicing Breaking up research findings from the same dataset into several publications without proper justification and without disclosing the related papers.
  • Citation engineering
(1) Researcher conducts a study measuring several outcomes (or predictors) and publishes results in several papers with each paper focusing on just one outcome (or predictor), while not disclosing the other papers. (2) A study on cross-cultural differences with 20 participating labs from 20 countries results in 10 publications where in each one two countries are compared.
  • Biased effect size estimates in meta-analyses (due to non-independence of results)
  • Inflated type I or type II error due to unknown family-wise error rate
  • Preregister the publication strategy
  • Publish study results in one single publication or disclose all related papers
Maybe
  • Absence of open data
  • Description of the sample is the same over several studies by the same researcher or lab
  • Several papers exist with similar outcomes or predictors based on the same dataset by the same researcher or lab
  • The methods suggest a large study but the scope of the paper is narrow
Declaring false authorship Attribution and arrangement of authorship that does not correspond to the authors’ contributions, in order to influence the publishing process, and increase the credibility of the study.
  • Citation engineering
(1) Honorary authorship: Researcher adds a co-author who did not contribute to the manuscript. (2) Ghost authorship: The researcher excludes a co-author who significantly contributed to the project. (3) Controversial researcher writes a paper and publishes it under a pseudonym, so it seems that more than one person shares the same view.
  • Lack of deserved credit for contributing authors
  • Inflated confidence in the research based on the reputation of authors who were included in (or excluded from) the author list
  • Explicitly declare contributions to the project (e.g., CRediT taxonomy)
  • Include everyone who made a significant contribution to the project
  • Only include authors who contributed to the project
No None
Not making data accessible The datasets and/or codebooks are not made accessible to the public and/or peer reviewers, even though there is no justifiable cause for withholding them.
  • Misusing open science practices
(1) Researcher doesn’t provide a publicly accessible repository link to the dataset. (2) The data repository link is accessible, but the data are not comprehensible (e.g., they lack cleaning and organization, a codebook, or instructions), making it difficult or impossible to reproduce the findings.
  • Restricted potential for secondary data analysis
  • Reduced reproducibility
  • Data should be shared based on the FAIR (findable, accessible, interoperable, and reusable) principles and legislative context of the researcher
  • If confidential and personal information makes participants identifiable, apply masking and anonymization, and then share data
  • Use synthetic datasets when original data can’t be shared
Yes
  • Data are not shared according to FAIR principles
  • No information in the publication on the availability of the data
  • Boué, S., Byrne, M., Hayes, A. W., Hoeng, J., & Peitsch, M. C. (2018). Embracing transparency through data sharing. International Journal of Toxicology, 37(6), 466–471. https://doi.org/10.1177/1091581818803880
  • Ellis, S. E., & Leek, J. T. (2018). How to share data for collaboration. The American Statistician, 72(1), 53–57. https://doi.org/10.1080/00031305.2017.1375987
  • Quintana, D. S. (2020). A synthetic dataset primer for the biobehavioural sciences to promote reproducibility and hypothesis generation. eLife, 9. https://doi.org/10.7554/eLife.53275
  • Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18
Publishing studies selectively File drawer problem Choosing which study to publish or share based on whether the findings fit expectations.
  • Cherry-picking
(1) Researcher runs a study and finds out the results do not support their hypothesis (e.g., no significant findings). Thus the researcher does not try to publish or share the study publicly. (2) Researcher runs several studies, and publishes only those that support the hypothesis in a multi-study paper.
  • Inflated confidence in a multi-study paper
  • Publication bias
  • Only preregister on platforms that will eventually publish all preregistrations
  • Publish all studies, even when the findings do not support hypotheses
No
  • Publication bias can be estimated in meta-analysis