
2025 Stata Conference

31 July–1 August 2025 | Nashville, TN

Join us in Nashville to celebrate 40 years of Stata—a milestone in trusted, reproducible statistical software. This year, the conference will be unique, with invited speakers and a special social hour celebration following day one that all participants are invited to attend. Don't miss your opportunity to learn new and exciting applications of Stata, engage with StataCorp's developers, and network with researchers from across all disciplines.

Celebrate 40 years of Stata in the Music City

Embrace the Music City spirit as you attend the 2025 Stata Conference. From the iconic Ryman Auditorium to the city’s famous hot chicken and live music on Broadway, Nashville’s captivating blend of music, cuisine, and Southern charm promises an unforgettable experience.

Preconference workshop

You will not want to miss this opportunity to attend our workshop, Causal inference and treatment effects using Stata, on 30 July, the day prior to the conference. This three-hour session will offer a look at Stata's treatment-effect estimators, including those available via the new cate command in Stata 19. Details and registration information are included below.

Invited speakers

Ben Jann

University of Bern

Drawing maps in Stata using geoplot

Sophia Rabe-Hesketh

University of California, Berkeley

Valid standard errors for misspecified Bayesian models

Jeffrey Wooldridge

Michigan State University

Difference in difference for nonbinary treatments in Stata

Program

Thursday, 31 July (all times Eastern Daylight Time)

8:15 a.m.
Registration and continental breakfast
8:55 a.m.
Welcome + introductions
9:00 a.m.
A generalized model specification system in Mata
James Hardin, University of South Carolina

We present the development and usage of the generalized model specification system (GMSS), a Mata library that supports the specification of regression models. The library gives users access to all the options in Stata’s premier optimization routines. It consists of classes for distributions and link functions, and it defines optimization routines callable from moptimize(). The library includes approximately 30 distributions, and users may specify covariates for each distribution parameter. General models (zero-inflated, zero-altered, zero-marginalized, zero-truncated, and heaped) can be used with any count distribution in the library. Users are also free to develop and add distributions and link functions. All distributions have default link functions, but users may specify other links. An ado-file is available for users who prefer a Stata-language implementation. The library also includes support for estat and predict. Covariate lists may use factor-variable notation, and the library automatically generates the associated constraint matrices. The presentation will show example do-files calling the Mata routines along with usage of the ado-file specification. While there is substantial overlap with existing commands for specific models, the GMSS library includes many new distributions for interested developers and applications.

9:30 a.m.
Creating pre- and posttest visualizations with Stata, Excel, and Tableau—a dynamic approach
Sergio Cervantes, WestEd

Pre- and posttest visualizations are powerful: they can tell a story of change, or lack of change, within a population and help researchers inform their understanding of people and their surrounding environment. Stata is well suited to analyzing pre- and posttest data, and used in conjunction with Excel and Tableau, it can produce pre- and posttest visualizations that are efficient and visually appealing.

To create these visualizations, I will discuss a multistep method. Stata commands such as asdoc (community-contributed) and reshape will be a critical component of the data analysis process. Excel data-manipulation functions, such as if-then statements, will then be applied to the Stata output to generate a dataset that can be imported into Tableau. In Tableau, the resulting visualizations can show sample sizes, statistical significance, and change, all in one view.

9:50 a.m.
xtevent: Estimation and visualization in the linear panel event-study design
Jorge Pérez Pérez, Banco de México
Coauthors: Jesse M. Shapiro, Harvard University and NBER; Christian B. Hansen, University of Chicago Booth School of Business; Simon Freyaldenhoven, Federal Reserve Bank of Philadelphia; Constantino Carreto, Banco de México

Linear panel models and the “event-study plots” that often accompany them are popular tools for learning about policy effects. We introduce the xtevent package, which enables the construction of event-study plots following the suggestions in Freyaldenhoven et al. (Forthcoming). The package implements various procedures to estimate the underlying policy effects and allows for nonbinary policy variables and estimation adjusting for preevent trends.

10:10 a.m.
Break
10:40 a.m.
Control-function linear and probit models in Stata
Tom Stringham, StataCorp

Researchers often use control-function methods when traditional instrumental-variables methods lack desired flexibility. The new Stata commands cfregress and cfprobit allow for specification of control-function linear and probit models and accommodate continuous, binary, fractional, and count endogenous variables, all while returning appropriately adjusted standard errors. This presentation will give a practical introduction to control functions and show how these commands can be useful in empirical work.
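As a rough illustration of the idea behind control-function estimation, the Python sketch below simulates a linear model with one endogenous regressor and runs the textbook two-step procedure with NumPy: regress the endogenous variable on the instrument, then add the first-stage residual as an extra regressor in the outcome equation. The data-generating process, the coefficient values, and the helper `ols()` are hypothetical and chosen only for illustration; this is not how cfregress is implemented.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients of y on the columns of X (hypothetical helper)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Hypothetical data-generating process: x is endogenous because u and v are correlated.
rng = np.random.default_rng(12345)
n = 5000
z = rng.normal(size=n)                # instrument
v = rng.normal(size=n)                # first-stage error
u = 0.8 * v + rng.normal(size=n)      # structural error, correlated with v
x = 1.0 + 0.7 * z + v                 # endogenous regressor
y = 2.0 + 1.5 * x + u                 # true slope on x is 1.5

const = np.ones(n)
Z = np.column_stack([const, z])

# Step 1: first-stage regression of x on the instrument; keep the residual.
x_hat = Z @ ols(Z, x)
v_hat = x - x_hat

# Step 2: include the first-stage residual as a control in the outcome equation.
beta_cf = ols(np.column_stack([const, x, v_hat]), y)

# For comparison: naive OLS (biased upward here) and 2SLS.
beta_ols = ols(np.column_stack([const, x]), y)
beta_2sls = ols(np.column_stack([const, x_hat]), y)

print(f"OLS slope:              {beta_ols[1]:.3f}")
print(f"Control-function slope: {beta_cf[1]:.3f}")
print(f"2SLS slope:             {beta_2sls[1]:.3f}")
print(f"Coefficient on v_hat:   {beta_cf[2]:.3f}")
```

In this linear case, the control-function slope matches the 2SLS slope (up to floating-point error), and a nonzero coefficient on `v_hat` signals endogeneity. The approach pays off mainly in nonlinear models such as probit. Note that the standard errors from a naive second-step regression ignore that `v_hat` is itself estimated, which is presumably why the abstract emphasizes appropriately adjusted standard errors.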

11:40 a.m.
Boosting performance of regular expressions
Billy Buchanan, SAG Corporation

If all you have in your toolkit are Stata's original regular-expression functions, all text data can look like a nightmare. Even with better regular-expression functions, the lack of high-quality documentation, guides, and examples makes it difficult to use regular expressions effectively. Have you ever wondered about the differences between possessive and greedy quantifiers? Have you ever wondered how to use positive and negative look-ahead and look-behind assertions? Do some of your commands that use regular expressions take much longer to execute than you would like? If so, this is the talk for you.

During the talk, I will explain how regular expressions work and what the various metacharacters, matching types, look-arounds, and character classes do. I will also describe how to improve the performance of your regular expressions and address common misconceptions along the way.
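The difference between greedy and lazy matching, and the idea of look-arounds, can be seen in a small sketch using Python's `re` module. This is only an illustration in another regex engine: Stata's Unicode regular-expression functions, such as `ustrregexm()`, accept broadly similar syntax, but details should be checked against Stata's documentation. Possessive quantifiers are left out here, and the sample strings are made up.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* consumes as much as possible, then backtracks to the LAST '>'.
greedy = re.findall(r"<.*>", html)

# Lazy: .*? expands one character at a time and stops at the FIRST '>'.
lazy = re.findall(r"<.*?>", html)

# Negated character class: [^>]* cannot run past '>', so no backtracking is
# needed; this is often faster than the lazy version on long strings.
negated = re.findall(r"<[^>]*>", html)

prices = "USD100 EUR200 USD350"

# Positive look-behind: digits preceded by 'USD'; 'USD' itself is not returned.
usd_amounts = re.findall(r"(?<=USD)\d+", prices)

# Negative look-ahead: code-plus-amount tokens that do not start with 'USD'.
non_usd = re.findall(r"\b(?!USD)[A-Z]{3}\d+", prices)

print(greedy)       # ['<b>bold</b> and <i>italic</i>']
print(lazy)         # ['<b>', '</b>', '<i>', '</i>']
print(negated)      # same as lazy
print(usd_amounts)  # ['100', '350']
print(non_usd)      # ['EUR200']
```

The negated-class pattern expresses the same intent as the lazy one without relying on backtracking, which is one common way to speed up slow expressions.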

12:10 p.m.
Lunch (included with registration)
1:10 p.m.
Introducing repscan: Automated detection of Stata commands linked to common reproducibility failures
Luis Eduardo San Martin, The World Bank

Reproducibility is essential for understanding how social scientists reach their conclusions. With social science journals adopting new reproducibility and transparency standards, ensuring that code is reproducible has become a priority for researchers and institutions. Reproducibility issues may arise from the use of commands that introduce unnoticed randomness, depend on system-specific settings, or create unexpected inconsistencies across code runs. To address this challenge, the Development Impact Analytics team at the World Bank has developed repscan, a Stata command designed to enhance the reproducibility of research code. Part of the repkit Stata package, repscan scans a do-file and flags commands known to compromise reproducibility. By alerting users to potential problems, such as commands affected by uncontrolled randomness, system-dependent sort orders, or unstable default behaviors, repscan allows researchers to refine their code and ensure their results can be consistently reproduced. This presentation will provide an overview of repscan’s functionality, demonstrate its application in typical coding tasks, and show how it can enhance reproducibility in social science research.

1:30 p.m.
Estimating censored food demand in Mexico with quaidsce
Miguel Perez, MultiON Consulting

This study applies Stata’s quaidsce command to estimate a censored demand system for food consumption in Mexican households using data from the 2022 National Survey of Household Income and Expenditures. The high prevalence of zero expenditures across food groups presents a challenge for traditional demand estimation methods. quaidsce implements a two-step procedure that corrects selection bias, ensuring more accurate estimates of price and income elasticities. Our findings demonstrate that failing to account for censoring leads to systematic biases in elasticity estimates, with distortions increasing as the proportion of censored observations grows. By efficiently handling censored dependent variables, quaidsce enhances the reliability of demand system estimation, making it a valuable tool for researchers working with consumption and expenditure data.

1:50 p.m.
Valid standard errors for misspecified Bayesian models
Sophia Rabe-Hesketh, University of California, Berkeley
Coauthors: Feng Ji, University of California, Berkeley; JoonHo Lee, University of Alabama

Giordano and Broderick (2024) introduced infinitesimal jackknife (IJ) standard errors for Bayesian estimators (posterior means). Just like resampling standard errors, IJ standard errors are robust to model misspecification and can be adapted to account for clustering. Importantly, IJ standard errors do not require resampling and can be obtained from a single MCMC run. Standard Bayesian quantile regression, as implemented in bayes: qreg, is generally misspecified. This is because the sole motivation for the asymmetric Laplace (AL) likelihood is that its maximum coincides with the classical quantile regression estimator of Koenker and Bassett (1978). There is no reason to believe that the AL distribution is a plausible data-generating mechanism; for example, the shape of the distribution depends on the quantile you are interested in. While point estimation is consistent, credible intervals often have poor frequentist coverage. We therefore propose using IJ standard errors for Bayesian quantile regression and show, via simulations, that they have good frequentist properties for both independent and clustered data. If made available as an option in bayes: and bayesmh, IJ standard errors may soon become as popular for Bayesian inference as the vce(robust) option is for frequentist inference.

Giordano, R., and Broderick, T. 2024. The Bayesian infinitesimal jackknife for variance. arXiv:2305.06466.
Ji, F., Lee, J., and Rabe-Hesketh, S. 2024. Valid standard errors for Bayesian quantile regression with clustered and independent data. arXiv:2407.09772v1.

2:40 p.m.
Break
3:10 p.m.
Estimating correlated random coefficient models with endogeneity
Seolah Kim, California State University, Los Angeles
Coauthor: Michael Bates, University of California, Riverside

We propose a per-cluster instrumental-variables approach (PCIV) for estimating correlated random coefficient models in the presence of contemporaneous endogeneity and two-way fixed effects using Stata. Our estimator uses variation across clusters to estimate coefficients with homogeneous slopes (such as time effects) and within-cluster variation to estimate the cluster-specific heterogeneity. We aggregate cluster-specific estimates to population averages. We establish consistency, show robustness relative to standard estimators, and provide analytic standard errors for robust inference. Our Stata package allows for straightforward implementation. In Monte Carlo simulations, PCIV performs relatively well against pooled 2SLS and fixed-effects IV (FEIV) with a finite number of clusters or finite observations per cluster. We apply PCIV to estimate the price elasticity of gasoline demand, using state fuel taxes as instrumental variables. PCIV estimation also makes the underlying data more transparent. It produces graphs showing how the implicit weights applied by FEIV diverge from the natural weights applied by PCIV, as well as evidence of correlation between heterogeneity in the first and second stages, which violates a key assumption underpinning the consistency of standard estimators. In our application, overlooking effect heterogeneity with standard estimators is consequential: our estimated distribution of elasticities reveals significant heterogeneity and meaningful differences in estimated averages.

3:40 p.m.
cea: A suite of commands for trial-based cost-effectiveness analysis
Rebecca Raciborski, US Department of Veterans Affairs, Center for Mental Healthcare and Outcomes Research
Coauthors: Rafal Raciborski, Michelin North America; Jacob T. Painter, J. Silas Williams, Chenghui Li, Jeffrey Pyne, US Department of Veterans Affairs, Center for Mental Healthcare and Outcomes Research

Cost-effectiveness analysis (CEA) is often conducted alongside a randomized clinical trial to establish whether a new therapy is likely to offer favorable value for its cost. One common approach is to estimate an incremental cost-effectiveness ratio (ICER), the incremental cost divided by the incremental health benefit, and compare the point estimate with a prespecified “willingness to pay”. Alternatively, net monetary benefit (NMB) may be used to keep benefits and costs linear and on the same scale. Costs and benefits may be modeled separately and often follow different distributions, especially for the ICER, where benefits are generally constrained to the -1 to 1 interval. CEA also uses two graphs to assess uncertainty about the decision being made. The first, the cost-effectiveness plane, plots bootstrapped replicates of incremental cost against incremental benefit, with confidence ellipses. The second, the cost-effectiveness acceptability curve (CEAC), shows the probability that a new treatment is cost-effective at different willingness-to-pay values. In this presentation, we will introduce a new suite of Stata CEA commands. They use standard Stata command syntax to fit models and obtain the ICER or NMB, and they provide comprehensive postestimation support and graphing.
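The quantities in this abstract can be made concrete with a small Python sketch: the ICER as incremental cost divided by incremental benefit, NMB at a willingness to pay `wtp` as `wtp * delta_effect - delta_cost`, and a CEAC point as the share of bootstrap replicates with positive NMB at a given `wtp`. All numbers below are hypothetical, and the functions are illustrative helpers, not the syntax of the new CEA commands.

```python
def icer(delta_cost, delta_effect):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of health benefit."""
    return delta_cost / delta_effect


def nmb(delta_cost, delta_effect, wtp):
    """Net monetary benefit at a willingness to pay of `wtp` per unit of benefit."""
    return wtp * delta_effect - delta_cost


def ceac(replicates, wtp_grid):
    """Share of (delta_cost, delta_effect) replicates with positive NMB at each WTP value."""
    n = len(replicates)
    return [sum(nmb(dc, de, w) > 0 for dc, de in replicates) / n for w in wtp_grid]


# Hypothetical point estimates: $10,000 extra cost for 0.25 extra QALYs.
point_icer = icer(10_000, 0.25)          # dollars per QALY
point_nmb = nmb(10_000, 0.25, 50_000)    # dollars, at $50,000 per QALY

# Hypothetical bootstrap replicates of (incremental cost, incremental QALYs).
reps = [(10_000, 0.25), (12_000, 0.20), (8_000, 0.30), (15_000, 0.10), (9_000, -0.05)]
curve = ceac(reps, [20_000, 50_000, 100_000])

print(point_icer, point_nmb, curve)  # 40000.0 2500.0 [0.0, 0.4, 0.6]
```

Because NMB is linear, it stays well defined when the incremental benefit is near zero or negative (as in the last replicate), whereas the ICER ratio becomes unstable or ambiguous in sign.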

4:15 p.m.
Social networking/mixer

Join us for a happy hour celebrating 40 years of Stata! All registered participants are welcome to attend.

5:30 p.m.
Adjourn
6:30 p.m.
Optional users' dinner at The Mockingbird

Friday, 1 August

8:30 a.m.
Registration and continental breakfast
9:00 a.m.
Household access to consumer credit in low- and moderate-income areas and banking deserts
Anthony Murphy, Federal Reserve Bank of Dallas
Coauthor: Dylan Ryfe, Federal Reserve Bank of Dallas

There is considerable policy interest in household access to consumer credit in low- and moderate-income (LMI) areas and so-called banking deserts. Under the Community Reinvestment Act, bank regulators also devote substantial resources to this issue. LMI areas are census tracts whose median family income is less than 80% of that of the relevant metropolitan area or district. Banking deserts are counties with no bank or credit union branches. We examine access to consumer credit in these areas using a representative 5% sample of credit records and both regression discontinuity design (RDD) and matching estimators of the average treatment effect on the treated (ATT). The RDD results are local; for example, they apply close to the 80% median family income threshold for LMI designation. The matching results apply more generally, albeit only to areas with reasonable overlap in propensity scores. Using both approaches, we find little support for the claim that households in LMI and banking desert areas face reduced access to consumer credit.

9:30 a.m.
Drawing maps in Stata using geoplot
Ben Jann, University of Bern

geoplot is a new Stata command for drawing maps from shapefiles and other datasets. Multiple layers of elements such as regions, borders, lakes, roads, labels, and symbols can be freely combined, and the look of elements (for example, their color) can be varied depending on the values of variables. Compared with previous solutions in Stata, geoplot provides more user convenience, more functionality, and more flexibility. In this talk, I will give an overview of the command and illustrate its use with examples.

10:30 a.m.
Break
11:00 a.m.
Leveraging machine learning and advanced modeling to estimate the global disease burden of diabetes using Stata
Ali Alfalki, University of South Carolina

Background: Diabetes is a growing global health concern, with its burden varying significantly across regions and populations. Accurate estimation of this burden is crucial for informing policy and resource allocation. This study employs machine learning techniques and mathematical modeling in Stata to analyze the global disease burden of diabetes using large-scale epidemiological datasets.

Method: The analysis integrates data from international health surveys and demographic databases to develop predictive equations that account for key risk factors, including socioeconomic and behavioral determinants. Machine learning algorithms are utilized to enhance predictive accuracy and identify nonlinear relationships often overlooked in traditional methods. A key focus is the application of community-contributed Stata commands to implement and validate machine learning models, including decision trees, random forests, and gradient boosting. The study also explores the integration of these approaches with classical epidemiological models to improve robustness and interpretability.

Results: Preliminary findings demonstrate the feasibility of combining machine learning with Stata’s analytical tools to provide nuanced insights into diabetes trends across diverse populations. The study highlights how Stata’s flexibility supports the application of advanced methods, offering an accessible framework for epidemiological research.

Conclusion: This presentation aims to showcase the methodological innovations applied, share insights on computational challenges, and discuss implications for future research and global health policy.

11:20 a.m.
Difference in difference for nonbinary treatments in Stata
Jeffrey Wooldridge, Michigan State University

I will show how an extended two-way fixed-effects estimator can be applied when an intervention variable has more than two levels. The intervention measure may have a quantitative meaning (say, a continuous treatment), or it may be discrete and take on more than two levels. Estimation and inference, as well as aggregating effects across a treatment cohort, can be done using standard regression commands in Stata. The regression framework allows testing for pretrends and modeling heterogeneous trends. An application to the effects of county-level Walmart openings on retail employment will be used for illustration.

12:20 p.m.
Lunch (included with registration)
1:20 p.m.
Professional statistical software development: What, why, and how
Yulia Marchenko, StataCorp

In this presentation, I will talk about professional statistical software development in Stata and the challenges of producing and supporting a statistical software package. I will share some of my experience on how to produce high-quality software, including verification, certification, and reproducibility of the results, and on how to write efficient and stable Stata code. I will also discuss some of the aspects of commercial software development such as clear and comprehensive documentation, consistent specifications, concise and transparent output, extensive error checks, and more.

2:20 p.m.
Open panel discussion with Stata developers

Contribute to the Stata community by sharing your feedback with StataCorp's developers. From feature improvements to bug fixes and new ways to analyze data, we want to hear how Stata can be made better for our users.

3:20 p.m.
Break
3:50 p.m.
Peer code review: Streamlining, standardizing, and improving Stata code
Ankriti Singh, The World Bank
Coauthor: Maria Ruth Jones, The World Bank

As empirical research grows in scale and complexity, reproducibility has become critical. Many journals now mandate the submission of code, yet researchers often lack training in writing structured, readable, and reusable code. This leads to inefficiencies, verification delays, and costly revisions.

This presentation shares the authors' experience implementing regular peer code review in a large research institution. Participants exchange, run, and provide feedback on each other’s code in progress, using structured checklists to promote consistency. Standardized feedback helps identify common coding issues and develop targeted training and tools.

This presentation discusses the motivation behind peer code review, focusing on its impact on Stata code quality, error detection, reproducibility, and collaboration. We highlight how it fosters a culture of continuous improvement, helping Stata practitioners enhance their coding practices from the start, rather than retrofitting for reproducibility at the end.

4:10 p.m.
Five ways to get better at using twoway: Tips and tricks for fully customizing Stata figures
Brian Fitzpatrick, Gibson Consulting

In this presentation, I provide intermediate Stata users with a brief overview of five distinct skills that are crucial for producing fully customized figures with relative ease and consistency. The tips are as follows:

1. Feel free to try out custom graph wrappers and commands; but when in doubt, collapse to a dataset with one case per data point you want to visualize and use twoway.
2. The margins command has an option called saving. Use it and tip #1 to customize your coefficient plots.
3. Use loops and macros to automate your axis labels.
4. Create separate variables in string format to fully customize marker labels.
5. When you must get fancy, do not forget to use loops and good old algebra.

This presentation does not introduce any new commands or tools but rather introduces techniques users can use to get the most out of commonly used Stata commands. The presentation will include examples of figures as well as the code used to create them, and all code will be made available to conference participants.

4:30 p.m.
Adjourn

Preconference workshop

Causal inference and treatment effects using Stata

Wednesday, 30 July | 1:00–4:00 p.m.

Fisk meeting room (2nd Floor) | Renaissance Nashville Hotel

In this workshop, we discuss methods for drawing causal inferences when analyzing observational rather than experimental data. We present cross-sectional estimators for average treatment effects (ATEs) and average treatment effects on the treated (ATETs) and panel-data estimators for ATET parameters. We also introduce a new set of estimators that allow us to target parameters that go beyond population averages and instead go after individual or group treatment effects using the new cate command. The workshop will briefly cover the conceptual and theoretical underpinnings of the estimators and then illustrate how to obtain the effects of interest using Stata.


Enrique Pinzón is Director of Econometrics and a member of the statistical development team at StataCorp LLC. He teaches a variety of Stata courses and is a frequent contributor to The Stata Blog. He holds a master's degree in economics from the Universidad de los Andes and a PhD from the University of Wisconsin–Madison.

Registration

Professional

All-access pass to event sessions

$195


Student

Discounted student pricing

$95

Additional events

Preconference workshop

$40

Users' dinner

$50

Add options during registration

Users' dinner

Stata Conference attendees are invited to join us for our annual users’ dinner at The Mockingbird on Thursday, 31 July at 6:30 p.m. Enjoy a globally inspired take on American comfort food, while you network with other Stata users and Stata developers. Limited seating is available, and you must register above to attend.

The Mockingbird
121 12th Ave N
Nashville, TN 37203
(615) 741-9900

Venue + accommodations

Renaissance Nashville Hotel

611 Commerce Street

Nashville, TN 37203

The conference will take place at the Renaissance Nashville Hotel in downtown Nashville, which is also the conference hotel. The hotel is offering a special group rate for Stata Conference attendees staying from 30 July to 2 August. Availability is limited, so book your room now to secure the special rate.

Scientific committee

The scientific committee is responsible for the Stata Conference program. With submissions encouraged from both new and longtime Stata users from all backgrounds, the committee will review all abstracts in developing an exciting, diverse, and informative program. We look forward to seeing you in Nashville!

Shasha Bai

Emory University

William D. Dupont

Vanderbilt University Medical Center

Austin Nichols

Amazon

Tim Sahr

The Ohio State University

Yuya Sasaki

Vanderbilt University

Phil Schumm

University of Chicago

Margaret Stedman

Stanford University

FAQs

Have questions about the Stata Conference? Our FAQs have you covered. Discover important details on registration, logistics, and more.


When and where will the conference be held?

The 2025 Stata Conference will be held from 8:15 a.m. to 5:30 p.m. on Thursday, 31 July, and from 8:30 a.m. to 4:30 p.m. on Friday, 1 August, in Nashville, Tennessee. Everyone is also invited to join an optional preconference workshop on Wednesday, 30 July, and a users' dinner on Thursday night. The venue and accommodations will be at the Renaissance Nashville Hotel in the heart of downtown Nashville.

Who should attend the conference?

The Stata Conference is open to users of all disciplines and experience levels, bringing together a unique mix of experts and professionals. You will hear from Stata users at the top of their fields, as well as Stata's own researchers and developers. Presentation topics will include new community-contributed commands, methods and resources for teaching with Stata, new approaches to using Stata together with other software, and much more. Anyone interested in Stata is welcome to attend.

Who will be attending the conference from StataCorp?

Look forward to meeting the following StataCorp employees:

  • Alan Riley, President
  • Chinh Nguyen, Vice President of Software Design
  • Yulia Marchenko, Vice President of Statistics and Data Science
  • Karen Strope, Vice President of Marketing
  • Hua Peng, Executive Director of Software Engineering and Data Science
  • Kristin MacDonald, Executive Director of Statistical Services
  • Enrique Pinzón, Director of Econometrics
  • Tom Stringham, Senior Econometrician and Software Developer

Will there be networking opportunities at the conference?

Yes! The Stata community is full of users from all disciplines, including people you may have met online but would like to meet in person. There will be breaks between sessions where you can take a moment to talk to the people around you and an open panel discussion where you can ask questions and share feedback with Stata developers.

Everyone is also invited to join an optional users' dinner Thursday night.

Want to start socializing now? Follow @Stata on X. Throughout the conference, we will be live tweeting using the conference hashtag #Stata2025.

Will the conference be recorded or available online?

The conference presentations will not be recorded, but proceedings and slides will be made available on this page in the weeks following the conference.