
Join us in Nashville to celebrate 40 years of Stata, a milestone in trusted, reproducible statistical software. This year's conference is a special one, featuring invited speakers and a social hour celebration after day one to which all participants are invited. Don't miss your opportunity to learn new and exciting applications of Stata, engage with StataCorp's developers, and network with researchers from across disciplines.
Embrace the Music City spirit as you attend the 2025 Stata Conference. From the iconic Ryman Auditorium to the city’s famous hot chicken and live music on Broadway, Nashville’s captivating blend of music, cuisine, and Southern charm promises an unforgettable experience.
You will not want to miss this opportunity to attend our workshop, Causal inference and treatment effects using Stata, on 30 July, the day prior to the conference. This three-hour session will offer a look at Stata's treatment-effect estimators, including those available via the new cate command in Stata 19. Details and registration information are included below.
We present the development and usage of the generalized model specification system (GMSS) Mata library, which supports the specification of regression models. The library gives users access to all the options in Stata’s premier optimization routines. It consists of classes for distributions and link functions, and it defines optimization routines callable from moptimize(). The library includes approximately 30 distributions, and users may specify covariates for each parameter of a distribution. General models (zero-inflated, zero-altered, zero-marginalized, zero-truncated, and heaped) can be used with any count distribution in the library. Users are also free to develop and add their own distributions and link functions. All distributions have default link functions, but users may specify other links. An ado-file is available for users who prefer a Stata-language implementation. Support commands for estat and predict are also included in the library. Covariate lists may use factor-variable notation, and the library automatically generates the associated constraint matrices. Examples of do-files calling the Mata routines will be presented, along with examples using the ado-file. Although the library overlaps substantially with some existing models, it includes many new distributions for interested developers and applications.
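The claim that general models such as zero inflation work with any count distribution amounts to wrapping a base log-likelihood. Below is a minimal Python sketch (not Mata, and not the GMSS API; every function name here is hypothetical) assuming the standard zero-inflated mixture: P(Y = 0) = pi + (1 - pi) f(0) and P(Y = k) = (1 - pi) f(k) for k > 0. Parameters are held constant for brevity; in a regression, each would be tied to its own covariates through a link function.

```python
import math
from typing import Callable

# Hypothetical illustration only: these names are not part of the GMSS library.
LogPmf = Callable[[int], float]


def poisson_logpmf(mu: float) -> LogPmf:
    """Log-pmf of a Poisson(mu) count distribution."""
    def logpmf(y: int) -> float:
        return y * math.log(mu) - mu - math.lgamma(y + 1)
    return logpmf


def negbin2_logpmf(mu: float, alpha: float) -> LogPmf:
    """Log-pmf of an NB2 distribution with mean mu and variance mu + alpha*mu**2."""
    r = 1.0 / alpha
    def logpmf(y: int) -> float:
        return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return logpmf


def zero_inflated_logpmf(y: int, pi: float, base: LogPmf) -> float:
    """Zero-inflated log-pmf wrapped around any base count log-pmf.

    With probability pi the observation is a structural zero; otherwise it
    is drawn from the base distribution.
    """
    if y == 0:
        return math.log(pi + (1.0 - pi) * math.exp(base(0)))
    return math.log1p(-pi) + base(y)


# The same wrapper works unchanged for either base distribution.
data = [0, 0, 1, 3, 0, 2]
ll_zip = sum(zero_inflated_logpmf(y, 0.3, poisson_logpmf(2.0)) for y in data)
ll_zinb = sum(zero_inflated_logpmf(y, 0.3, negbin2_logpmf(2.0, 0.5)) for y in data)
```

Because the wrapper only needs the base log-pmf, adding a new count distribution requires no change to the zero-inflated code, which is the kind of composability the abstract describes.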
Pre- and posttest visualizations are powerful: they can tell a story of change, or lack of change, within a population and help researchers sharpen their insights into humanity and the surrounding environment. Stata is very useful for drawing insights from pre- and posttests, and combined with Excel and Tableau, it can be used to create pre- and posttest visualizations that are efficient and visually appealing.
To create these visualizations, I will present a multistep method. Stata commands such as asdoc and reshape are a critical component of the data analysis process. The Stata output is then processed in Excel, using data manipulation and functions such as if-then statements, to generate a dataset that can be imported into Tableau. In Tableau, visualizations can then show sample sizes, statistical significance, and change all in one view.
Linear panel models and the “event-study plots” that often accompany them are popular tools for learning about policy effects. We introduce the xtevent package, which enables the construction of event-study plots following the suggestions in Freyaldenhoven et al. (Forthcoming). The package implements various procedures to estimate the underlying policy effects and supports nonbinary policy variables and estimation that adjusts for preevent trends.
Researchers often use control-function methods when traditional instrumental-variables methods lack the flexibility they need. The new Stata commands cfregress and cfprobit allow for specification of control-function linear and probit models and accommodate continuous, binary, fractional, and count endogenous variables, all while returning appropriately adjusted standard errors. This presentation will give a practical introduction to control functions and show how these commands can be useful in empirical work.
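The basic control-function idea can be sketched in a few lines. This Python sketch is an illustration under simplifying assumptions, not how cfregress is implemented: one continuous endogenous regressor, one instrument, a linear first stage whose residual is added as an extra regressor in the outcome equation. In this linear case the point estimate coincides with 2SLS. The naive second-stage standard errors would be wrong because the residual is itself estimated; that adjustment is what the new commands provide and is not shown here. The helper names (`solve`, `ols`, `control_function`) are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """OLS coefficients from the normal equations; columns are regressor lists."""
    k = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)


def control_function(y, x, z):
    """Two-step control-function estimate for one endogenous x and one instrument z."""
    ones = [1.0] * len(y)
    # Step 1: first-stage regression of x on a constant and z; keep the residual.
    p0, p1 = ols([ones, z], x)
    v_hat = [xi - (p0 + p1 * zi) for xi, zi in zip(x, z)]
    # Step 2: add v_hat as a control in the outcome equation.
    # A nonzero rho indicates that x is correlated with the structural error.
    b0, b1, rho = ols([ones, x, v_hat], y)
    return b0, b1, rho


# Toy data: x = z + v, and the structural error 0.5*v makes x endogenous.
z = [1.0, -1.0, 1.0, -1.0]
v = [1.0, 1.0, -1.0, -1.0]
x = [zi + vi for zi, vi in zip(z, v)]
y = [1.0 + 2.0 * xi + 0.5 * vi for xi, vi in zip(x, v)]

naive_slope = ols([[1.0] * 4, x], y)[1]                  # about 2.25: biased
cf_const, cf_slope, cf_rho = control_function(y, x, z)  # about 1.0, 2.0, 0.5
```

On this toy data, regressing y on x alone gives a slope of 2.25, while adding the first-stage residual recovers the true slope of 2.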
If all you have in your toolkit are Stata's original regular expression functions, all text data can look like a nightmare. Even with the newer, better regular expression functions, the lack of high-quality documentation, guides, and examples makes it difficult to use regular expressions effectively. Have you ever wondered about the differences between possessive and greedy quantifiers? Have you ever wondered how to use positive and negative look-ahead and look-behind assertions? Do some of your commands that use regular expressions take much longer to execute than you would like? If so, this is the talk for you.
During the talk, I will explain how regular expressions work and what the various metacharacters, matching types, look-arounds, and character classes do. I will also describe how to improve the performance of your regular expressions and address common misconceptions along the way.
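Greedy versus lazy matching and look-arounds behave similarly in most regex engines, so they can be illustrated outside Stata. The sketch below uses Python's re module purely for illustration; Stata's regex functions use their own engine, so syntax details may differ. Possessive quantifiers are left out because Python's re supports them only from version 3.11.

```python
import re

text = "<b>bold</b> and <i>italic</i>"
# Greedy: .+ runs to the end of the string, then backtracks to the LAST ">".
greedy = re.search(r"<.+>", text).group()
# Lazy: .+? expands one character at a time and stops at the FIRST ">".
lazy = re.search(r"<.+?>", text).group()

prices = "10 dollars, 20 euros, $30"
# Positive look-ahead: digits followed by " dollars"; the suffix is checked, not consumed.
dollar_amounts = re.findall(r"\d+(?= dollars)", prices)
# Negative look-ahead: whole numbers NOT followed by " dollars".
# The \b anchors stop the engine from backtracking into a partial match such as "1".
other_amounts = re.findall(r"\b\d+\b(?! dollars)", prices)
# Positive look-behind: digits preceded by a literal "$".
after_dollar_sign = re.findall(r"(?<=\$)\d+", prices)

print(greedy)             # <b>bold</b> and <i>italic</i>
print(lazy)               # <b>
print(dollar_amounts)     # ['10']
print(other_amounts)      # ['20', '30']
print(after_dollar_sign)  # ['30']
```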
Reproducibility is essential for understanding how social scientists reach their conclusions. Following recent reproducibility and transparency standards adopted by social science journals, ensuring that code is reproducible has become a priority for researchers and institutions. Reproducibility issues may arise from the use of commands that introduce unnoticed randomness, depend on system-specific settings, or create unexpected inconsistencies across code runs. To address this challenge, the Development Impact Analytics team at the World Bank has developed repscan, a Stata command designed to enhance the reproducibility of research code. Part of the repkit Stata package, repscan scans a do-file, detecting and flagging commands known to compromise reproducibility. By alerting users to potential problems, such as commands affected by uncontrolled randomness, system-dependent sort orders, or unstable default behaviors, repscan allows researchers to refine their code and ensure that their results can be consistently reproduced. This presentation will provide an overview of repscan’s functionality, demonstrate its application in typical coding tasks, and show how it can enhance reproducibility in social science research.
This study applies Stata’s quaidsce command to estimate a censored demand system for food consumption in Mexican households using data from the 2022 National Survey of Household Income and Expenditures. The high prevalence of zero expenditures across food groups presents a challenge for traditional demand estimation methods. quaidsce implements a two-step procedure that corrects for selection bias, yielding more accurate estimates of price and income elasticities. Our findings demonstrate that failing to account for censoring leads to systematic biases in elasticity estimates, with distortions increasing as the proportion of censored observations grows. By efficiently handling censored dependent variables, quaidsce enhances the reliability of demand system estimation, making it a valuable tool for researchers working with consumption and expenditure data.
Giordano and Broderick (2024) introduced infinitesimal jackknife (IJ) standard errors for Bayesian estimators (posterior means). Like resampling standard errors, IJ standard errors are robust to model misspecification and can be adapted to account for clustering. Importantly, IJ standard errors do not require resampling but can be obtained from a single MCMC run. Standard Bayesian quantile regression, as implemented in bayes: qreg, is generally misspecified. This is because the only motivation for the asymmetric Laplace (AL) likelihood is that its maximum coincides with the classical quantile regression estimator of Koenker and Bassett (1978). There is no reason to believe that the AL distribution is a plausible data-generating mechanism; for example, the shape of the distribution depends on the quantile of interest. While point estimation is consistent, credible intervals often have poor frequentist coverage. We therefore propose using IJ standard errors for Bayesian quantile regression and show, via simulations, that they have good frequentist properties for both independent and clustered data. If made available as an option in bayes: and bayesmh, IJ standard errors may soon become as popular for Bayesian inference as the vce(robust) option is for frequentist inference.
Giordano, R., and Broderick, T. 2024. The Bayesian infinitesimal jackknife for variance. arXiv:2305.06466.
Ji, F., Lee, J., and Rabe-Hesketh, S. 2024. Valid standard errors for Bayesian quantile regression with clustered and independent data. arXiv:2407.09772v1.
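The single-MCMC-run property of the IJ standard errors described above rests on a standard Bayesian sensitivity result: the derivative of a posterior mean with respect to a weight on observation n's log-likelihood equals the posterior covariance between g(θ) and that log-likelihood. The minimal Python sketch below covers independent data only and assumes the IJ variance is the sum of squared, centered covariances; the centering, the function names, and the omission of any clustered or finite-sample adjustment are our assumptions, so consult the papers for the exact estimator. For bayes: qreg, the per-observation log-likelihood would be the AL log density evaluated at each draw.

```python
import math


def sample_cov(a, b):
    """Sample covariance (denominator S - 1) across S MCMC draws."""
    s = len(a)
    ma, mb = sum(a) / s, sum(b) / s
    return sum((u - ma) * (w - mb) for u, w in zip(a, b)) / (s - 1)


def ij_standard_error(g_draws, loglik_by_obs):
    """Sketch of an IJ standard error for the posterior mean of g (independent data).

    g_draws:       g(theta_s) for each of the S MCMC draws
    loglik_by_obs: N lists; list n holds the log-likelihood of observation n
                   evaluated at every draw theta_s
    """
    # Posterior covariance between g and each observation's log-likelihood:
    # the sensitivity of the posterior mean to reweighting observation n.
    c = [sample_cov(g_draws, ll_n) for ll_n in loglik_by_obs]
    c_bar = sum(c) / len(c)
    # Assumed form: square root of the sum of squared, centered sensitivities.
    return math.sqrt(sum((ci - c_bar) ** 2 for ci in c))
```

Everything needed comes from draws already stored by one MCMC run: no refitting or resampling is involved.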
We propose a per-cluster instrumental-variables approach (PCIV) for estimating correlated random coefficient models in the presence of contemporaneous endogeneity and two-way fixed effects using Stata. Our estimator uses variation across clusters to estimate coefficients with homogeneous slopes (such as time effects) and within-cluster variation to estimate the cluster-specific heterogeneity. We aggregate the cluster-specific estimates to population averages. We demonstrate consistency, show robustness relative to standard estimators, and provide analytic standard errors for robust inference. Our Stata package allows for straightforward implementation. In Monte Carlo simulations, PCIV performs relatively well compared with pooled 2SLS and fixed-effects IV (FEIV) with a finite number of clusters or a finite number of observations per cluster. We apply PCIV to estimate the price elasticity of gasoline demand, using state fuel taxes as instrumental variables. PCIV estimation makes the underlying data more transparent. It produces graphs showing how the implicit weights used by FEIV diverge from the natural weights used by PCIV, as well as evidence of correlation between heterogeneity in the first and second stages, which violates a key assumption underpinning the consistency of standard estimators. In our application, overlooking effect heterogeneity with standard estimators is consequential: our estimated distribution of elasticities reveals significant heterogeneity and meaningful differences in estimated averages.
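The core per-cluster step (estimate a slope within each cluster using the instrument, then aggregate to a population average) can be illustrated with a deliberately simplified Python sketch. It assumes a single endogenous regressor and a single instrument, uses the just-identified IV ratio Cov(z, y)/Cov(z, x) within each cluster, and aggregates with cluster-size weights. Those weights are our assumption, and the sketch omits the cross-cluster estimation of homogeneous slopes such as time effects, the two-way fixed effects, and the analytic standard errors, so it is not the authors' PCIV estimator. The function names are hypothetical.

```python
from collections import defaultdict


def per_cluster_iv(cluster, y, x, z):
    """Just-identified IV slope Cov(z, y) / Cov(z, x) within each cluster.

    Returns {cluster_id: (slope, n_obs)}. Demeaning within the cluster
    absorbs a cluster-specific intercept.
    """
    rows = defaultdict(list)
    for g, yi, xi, zi in zip(cluster, y, x, z):
        rows[g].append((yi, xi, zi))
    out = {}
    for g, obs in rows.items():
        n = len(obs)
        my = sum(o[0] for o in obs) / n
        mx = sum(o[1] for o in obs) / n
        mz = sum(o[2] for o in obs) / n
        czy = sum((o[2] - mz) * (o[0] - my) for o in obs)
        czx = sum((o[2] - mz) * (o[1] - mx) for o in obs)
        # A zero czx means the instrument has no first-stage power in this cluster.
        out[g] = (czy / czx, n)
    return out


def aggregate(estimates):
    """Population average of cluster slopes, weighted by cluster size (an assumption)."""
    total = sum(n for _, n in estimates.values())
    return sum(b * n for b, n in estimates.values()) / total
```

Keeping the cluster-level slopes, rather than only the pooled average, is what allows their distribution to be inspected, in the spirit of the heterogeneity results reported above.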
Cost-effectiveness analysis (CEA) is often conducted alongside a randomized clinical trial to establish whether a new therapy is likely to offer favorable value for its cost. One common approach is to estimate an incremental cost-effectiveness ratio (ICER), the incremental cost divided by the incremental health benefit, and compare the point estimate with a prespecified “willingness to pay”. Alternatively, net monetary benefit (NMB) may be used to keep benefits and costs linear and on the same scale. Costs and benefits may be modeled separately and are often assumed to follow different distributions, especially for the ICER, where benefits are generally constrained to the -1 to 1 interval. CEA also uses graphs to assess uncertainty about the decision being made. The first, the cost-effectiveness plane, plots bootstrapped replicates of incremental cost against incremental benefit, with confidence ellipses. The second, the cost-effectiveness acceptability curve (CEAC), shows the probability that a new treatment is cost-effective at different willingness-to-pay values. In this presentation, we will introduce a new suite of Stata CEA commands. They use standard Stata command syntax to fit models and obtain the ICER or NMB and then provide comprehensive postestimation support and graphing.
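The CEA quantities described above can be written down directly. With ΔC and ΔE the incremental cost and incremental health benefit and λ the willingness to pay per unit of benefit, ICER = ΔC/ΔE, NMB = λΔE - ΔC, and the CEAC at λ is the share of bootstrap replicates with positive NMB. The Python sketch below assumes the bootstrap replicates are already available as paired lists; the function names and the toy replicates are hypothetical and unrelated to the syntax of the new Stata commands.

```python
def icer(delta_cost, delta_effect):
    """Incremental cost-effectiveness ratio: extra cost per unit of extra health benefit."""
    return delta_cost / delta_effect


def nmb(delta_cost, delta_effect, wtp):
    """Incremental net monetary benefit at willingness to pay `wtp` per unit of benefit."""
    return wtp * delta_effect - delta_cost


def ceac(cost_reps, effect_reps, wtp_grid):
    """For each WTP value, the share of bootstrap replicates with positive NMB."""
    b = len(cost_reps)
    return [sum(nmb(c, e, w) > 0 for c, e in zip(cost_reps, effect_reps)) / b
            for w in wtp_grid]


# Four made-up bootstrap replicates of (incremental cost, incremental benefit).
costs = [100.0, 200.0, 300.0, 400.0]
effects = [0.01, 0.01, 0.01, 0.01]
curve = ceac(costs, effects, [0.0, 15000.0, 25000.0, 50000.0])
print(curve)  # [0.0, 0.25, 0.5, 1.0]
```

The ratio is undefined when the incremental benefit is zero and hard to interpret when it is negative, which is one reason the linear NMB scale is convenient for the acceptability curve.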
Join us for a happy hour celebrating 40 years of Stata! All registered participants are welcome to attend.
There is considerable policy interest in household access to consumer credit in low- and moderate-income (LMI) areas and so-called banking deserts. Under the Community Reinvestment Act, bank regulators also devote substantial resources to this issue. LMI areas are census tracts with median family income below 80% of the median family income of the relevant metropolitan area or district. Banking deserts are counties with no bank or credit union branches. We examine access to consumer credit in these areas using a representative 5% sample of credit records and both regression discontinuity design (RDD) and matching estimators of the average treatment effect on the treated (ATT). The RDD results are local; for example, they apply close to the 80% median family income boundary for LMI designation. The matching results apply more generally, albeit to areas with reasonable overlap in propensity scores, among other conditions. Using both approaches, we find little support for the claim that households in LMI and banking desert areas face reduced access to consumer credit.
geoplot is a new Stata command for drawing maps from shapefiles and other datasets. Multiple layers of elements such as regions, borders, lakes, roads, labels, and symbols can be freely combined, and the look of elements (for example, their color) can be varied depending on the values of variables. Compared with previous solutions in Stata, geoplot provides more user convenience, more functionality, and more flexibility. In this talk, I will give an overview of the command and illustrate its use with examples.
Background: Diabetes is a growing global health concern, with its burden varying significantly across regions and populations. Accurate estimation of this burden is crucial for informing policy and resource allocation. This study employs machine learning techniques and mathematical modeling in Stata to analyze the global disease burden of diabetes using large-scale epidemiological datasets.
Method: The analysis integrates data from international health surveys and demographic databases to develop predictive equations that account for key risk factors, including socioeconomic and behavioral determinants. Machine learning algorithms are used to enhance predictive accuracy and identify nonlinear relationships often overlooked by traditional methods. A key focus is the application of community-contributed Stata commands to implement and validate machine learning models, including decision trees, random forests, and gradient boosting. The study also explores the integration of these approaches with classical epidemiological models to improve robustness and interpretability.
Results: Preliminary findings demonstrate the feasibility of combining machine learning with Stata’s analytical tools to provide nuanced insights into diabetes trends across diverse populations. The study highlights how Stata’s flexibility supports the application of advanced methods, offering an accessible framework for epidemiological research.
Conclusion: This presentation aims to showcase the methodological innovations applied, share insights on computational challenges, and discuss implications for future research and global health policy.
I will show how an extended two-way fixed-effects estimator can be applied when an intervention variable has more than two levels. The intervention measure may have quantitative meaning (say, a continuous treatment), or it may be discrete and take on more than two levels. Estimation and inference, as well as aggregation of effects across a treatment cohort, can be done using standard regression commands in Stata. The regression framework allows testing for pretrends and modeling heterogeneous trends. An application to the effects of Walmart openings on county-level retail employment will be used for illustration.
In this presentation, I will talk about professional statistical software development in Stata and the challenges of producing and supporting a statistical software package. I will share some of my experience producing high-quality software, including verification, certification, and reproducibility of results, and writing efficient and stable Stata code. I will also discuss aspects of commercial software development such as clear and comprehensive documentation, consistent specifications, concise and transparent output, extensive error checks, and more.
Contribute to the Stata community by sharing your feedback with StataCorp's developers. From feature improvements to bug fixes and new ways to analyze data, we want to hear how Stata can be made better for our users.
As empirical research grows in scale and complexity, reproducibility has become critical. Many journals now mandate the submission of code, yet researchers often lack training in writing structured, readable, and reusable code. This leads to inefficiencies, verification delays, and costly revisions.
This presentation shares the authors' experience implementing regular peer code review in a large research institution. Participants exchange, run, and provide feedback on each other’s code in progress, using structured checklists to promote consistency. Standardized feedback helps identify common coding issues and develop targeted training and tools.
This presentation discusses the motivation behind peer code review, focusing on its impact on Stata code quality, error detection, reproducibility, and collaboration. We highlight how it fosters a culture of continuous improvement, helping Stata practitioners enhance their coding practices from the start, rather than retrofitting for reproducibility at the end.
In this presentation, I give intermediate Stata users a brief overview of five discrete skills that are crucial for producing fully customizable figures with relative ease and consistency. The tips are as follows:
1. Feel free to try out custom graph wrappers and commands, but when in doubt, collapse to a dataset with one case per data point you want to visualize and use twoway.
2. The margins command has an option called saving. Use it and tip #1 to customize your coefficient plots.
3. Use loops and macros to automate your axis labels.
4. Create separate variables in string format to fully customize marker labels.
5. When you must get fancy, do not forget to use loops and good old algebra.
This presentation does not introduce any new commands or tools but instead presents techniques for getting the most out of commonly used Stata commands. The presentation will include examples of figures as well as the code used to create them, and all code will be made available to conference participants.
Wednesday 30 July | 1:00–4:00 p.m.
Fisk meeting room (2nd Floor) | Renaissance Nashville Hotel
In this workshop, we discuss methods for drawing causal inferences when analyzing observational rather than experimental data. We present cross-sectional estimators for average treatment effects (ATEs) and average treatment effects on the treated (ATETs) and panel-data estimators for ATET parameters. We also introduce a new set of estimators that target parameters beyond population averages, namely individual or group treatment effects, using the new cate command. The workshop will briefly cover the conceptual and theoretical underpinnings of the estimators and then illustrate how to obtain the effects of interest using Stata.
Enrique Pinzón is Director, Econometrics, and part of the statistical development team at StataCorp LLC. He teaches a variety of Stata courses and is a frequent contributor to The Stata Blog. He holds a master's degree in economics from the Universidad de los Andes and a PhD from the University of Wisconsin–Madison.
Preconference workshop: $40
Users' dinner: $50
Stata Conference attendees are invited to join us for our annual users’ dinner at The Mockingbird on Thursday, 31 July at 6:30 p.m. Enjoy a globally inspired take on American comfort food while you network with other Stata users and Stata developers. Seating is limited, and you must register above to attend.
The Mockingbird
121 12th Ave N
Nashville, TN 37203
(615) 741-9900
Renaissance Nashville Hotel
611 Commerce Street
Nashville, TN 37203
The conference venue and hotel is the Renaissance Nashville Hotel in downtown Nashville. The hotel is offering a special group rate for Stata Conference attendees staying from 30 July to 2 August. Availability is limited, so book your room now to avoid missing out on the special rate.
The scientific committee is responsible for the Stata Conference program. With submissions encouraged from both new and longtime Stata users from all backgrounds, the committee will review all abstracts in developing an exciting, diverse, and informative program. We look forward to seeing you in Nashville!
Have questions about the Stata Conference? Our FAQs have you covered. Discover important details on registration, logistics, and more.
The 2025 Stata Conference will be held between 8:15 a.m. and 4:30 p.m. on Thursday, 31 July and Friday, 1 August in Nashville, Tennessee. Everyone is also invited to join an optional preconference workshop on Wednesday, 30 July and a users' dinner on Thursday night. The venue and accommodations will be at the Renaissance Nashville Hotel in the heart of downtown Nashville.
The Stata Conference is open to users of all disciplines and experience levels, bringing together a unique mix of experts and professionals. You will hear from Stata users at the top of their fields, as well as Stata's own researchers and developers. Presentation topics will include new community-contributed commands, methods and resources for teaching with Stata, new approaches to using Stata together with other software, and much more. Anyone interested in Stata is welcome to attend.
Look forward to meeting the following StataCorp employees:
Yes! The Stata community is full of users from all disciplines, including people you may have met online but would like to meet in person. There will be breaks between sessions where you can take a moment to talk to the people around you and an open panel discussion where you can ask questions and share feedback with Stata developers.
Everyone is also invited to join an optional users' dinner Thursday night.
Want to start socializing now? Follow @Stata on X. Throughout the conference, we will be live tweeting using the conference hashtag #Stata2025.
The conference presentations will not be recorded, but proceedings and slides will be made available on this page in the weeks following the conference.