
StatsChats: Discussion Topic of the week

2025 Spring-Fall

Sep 2: Welcome to the New Semester. This week we'll be introducing StatsChats to new members & planning the rest of the semester.

Sep 9: Open discussion day. Lili Antonelli asked about conducting an analysis on a small dataset. The group suggested different techniques and complementing her data with additional studies (meta-analysis).

Sep 16: Open discussion day. Nicole Wagoner took the opportunity to ask for ideas on how to perform a nested analysis of geothermal features at her study site.

Sep 23: Open discussion day. Ari Partore showed us some of her recent work.

Sep 30: Open discussion day.

Oct 7: Steve, a graduate student from EECB, received feedback on her experimental design and regression analysis.

Oct 14: Code club. Today we have special guests: the CodeClub, a self-organized group of graduate students in Environmental Sciences aiming to review and provide feedback on each other’s research code. They will introduce their group to us, and we hope to find common ground between StatsChats and their code club.

Oct 21: Connor from the Department of Anthropology will lead the session. He will present his work on integrating paleoclimate and archaeology and is seeking feedback on methodological strategies for combining these fields in his research.

Oct 28: Open discussion day. Lilli took the opportunity to show a group project that explores differences in tooth anatomy between regions across the world.

Nov 4: TBD

Nov 11: Veterans Day - No discussion

Nov 18: Sean ONeil

Nov 25: Thanksgiving week - No discussion

Dec 2: Rachel Kozloski is requesting feedback on modeling spatial relationships between microplastic concentrations and waste management practices in the Mekong River.

 

-----------------Upcoming meetings------------------

Jan 29: Welcome to the New Semester

Proposed discussion topics for the semester:

  • Research Design: Key insights to know before data collection, common sampling flaws, and mistakes.
  • Data Entry Tips: Ensuring data is ready for standardization.
  • Techniques for Cleaning Datasets.
  • Inputting Data: Best practices and common pitfalls. (Or avoiding it!)
  • Exploring AI Tools: Focus on Large Language Models like ChatGPT.
  • Tips to wrangle BIG data

Feb 5: Open Discussion

  • No specific topic. Feel free to drop in with any questions or discussion topics for the group.

Feb 12: Data collection and entry tips: How can we ensure data is optimized for later use during analysis? 

Feb 19: No StatsChats. See you at the Data Science Conference.

Feb 26: Uriel Cholula Rivera: "Non-invasive evaluation of alfalfa root traits under water stress using ground penetrating radar". The main objective is to examine the use of ground penetrating radar (GPR) as a non-invasive, time-efficient technique for quantifying alfalfa root traits (such as diameter and biomass) under varying levels of water stress. I fitted a simple linear regression model for each root parameter as a function of amplitude, separated by irrigation treatment. I would like to receive some feedback on the approach that I used for the analyses.

Mar 5: Open Discussion

  • No specific topic. Feel free to drop in with any questions or discussion topics for the group.

Mar 12: TBD

Mar 19: TBD

Mar 26: Spring Break - No Discussion this week

Apr 2: Open Discussion

  • No specific topic. Feel free to drop in with any questions or discussion topics for the group.

Apr 9: Lillian Antonelli

Apr 16: Jordan Zabrecky: "I have some algal and cyanobacteria microbiome community relative abundance data that was collected weekly to biweekly for three months across three rivers. I am interested in how to best analyze this data across time and rivers, and with regard to other parameters such as nutrient and cyanotoxin concentrations. I have started out with NMDS but not sure if there is a better option to consider." 

Apr 23: Ari Pastore 

Apr 30: The last meeting of the semester will possibly be on Apr 23. A meeting on Apr 30 could happen if there are takers.

Spring & Fall 2024

-----------------Upcoming meetings------------------

Sep 3: Welcome! This week we'll be introducing StatsChats to new members & planning the rest of the semester. 

Sep 10: Miscellaneous discussions on experimental design, discrepancies in ranked data analysis and climbing survey data.

Sep 17: Mark Kolwyck: This week, Mark would like to discuss this: "I have a dataset from a fisheries dietary project conducted in Lake Washington, WA. These data include net depth, soak time, species caught, species length, lat. and long.,  etc; I do not have the diet data. As I do not have a hypothesis for these data, I was postulating what utility I could glean from them, maybe some 3D visualization."  

Sep 24: Brian Folt seeks input on a pending project:

I am toward the tail-end of an analysis and I am trying to get all the details ‘just right’. My analysis involves estimating population growth rates for wild horse populations across the western US. This information will be used by Bureau of Land Management (BLM) staff to plan where and when horse population management occurs. To do this, I used information from the BLM describing horse population size and uncertainty through time (population estimates from their own internal analyses). I am doing ‘statistics on statistics’ and have built a Bayesian state-space model in JAGS that (1) estimates realized population growth rates over the last 8 years, (2) estimates significant effects on population growth, (3) makes predictions for what population growth might be in the absence of management, and (4) compares realized population growth and predicted population growth with no management to infer what change in population growth was due to management in recent years. Things I could use help understanding: 

- ‘Goodness of fit’ testing / posterior predictive checks: does anyone have suggestions on this?
- How to calculate percent changes of population growth rate: seems like a simple process but it makes my brain hurt a little bit
- Changing the model from normal to lognormal to better accommodate the information I am using as data

I have a few slides to illustrate where I am at and then hope we can have a discussion. Thank you for your consideration.
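The percent-change question in the list above can be illustrated with a minimal sketch. It is not Brian's model: the posterior draws below are made-up numbers, and the names `percent_change`, `lambda_realized`, and `lambda_no_mgmt` are hypothetical. The sketch assumes growth is expressed as an annual multiplier (lambda), and it shows why the calculation can be confusing: the percent change in lambda itself differs from the percent change in the growth increment (lambda - 1).

```python
import statistics


def percent_change(new, baseline):
    """Percent change of `new` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (new - baseline) / baseline


# Hypothetical posterior draws of annual growth rate (lambda),
# paired by MCMC iteration. These values are invented for illustration.
lambda_no_mgmt = [1.20, 1.18, 1.22, 1.19, 1.21]   # predicted without management
lambda_realized = [1.08, 1.10, 1.05, 1.07, 1.09]  # realized with management

# Option A: percent change in lambda itself, computed per draw so that
# uncertainty carries through, then summarized.
pct_lambda = [percent_change(r, n)
              for r, n in zip(lambda_realized, lambda_no_mgmt)]

# Option B: percent change in the growth increment (lambda - 1),
# i.e. "20% growth per year dropped to 8% growth per year".
pct_increment = [percent_change(r - 1.0, n - 1.0)
                 for r, n in zip(lambda_realized, lambda_no_mgmt)]

print("median % change in lambda:    ", statistics.median(pct_lambda))
print("median % change in increment: ", statistics.median(pct_increment))
```

Computing the percent change draw by draw and summarizing afterwards keeps the uncertainty of both rates in the result, whereas taking the percent change of two posterior means would discard it. Which of the two options is appropriate depends on how the growth rate is reported to managers.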

Oct 1: This week we'll have two discussion topics.

Nicole Wagoner: "I am working on modeling geothermal favorability over the Great Basin region. I have done some work at the regional scale but now I am investigating potential sub-regions using PCA and k-means clustering. I am then using loading values from that analysis to assess which datasets are key to differentiating the clusters. The goal is to try to use any identified differences to evaluate if different models are needed to model geothermal favorability for the different sub-regions. I am curious what you all think about this approach, if you have any other ideas/suggestions, or suggestions for visualizations." 

Marco Donoso, Claire Williams: "Our research investigates how pre-wildfire fuel reduction treatments influence the resilience of plant communities following wildfire events. While modeling our data to address this question, we encountered convergence issues using the lme function, which prevent us from including data prior to 2008. We are seeking guidance on whether to switch to a different modeling approach (such as lm or lmer) or if adjusting the convergence settings within lme is advisable to effectively utilize our full dataset."

Oct 8: Tracy Shane

Oct 15: Deandre L Presswood. Dre is working on a project exploring how stream network expansion and contraction affect solute fluxes in a watershed (Dog Valley near Verdi). With 20 monitoring sites, Dre aims to quantify the influence of different subwatersheds on solute fluxes (tributary leverage).

The main question is: What controls the timing, magnitude, and composition of solute fluxes as streams wet up and dry down?

Dre is seeking advice on statistical models that account for spatial autocorrelation, nested sites, and the challenges of data loss as streams dry.

Oct 22: Zinat Ara

Oct 29: Tracy Shane

Nov 5: Julian Cardona

Nov 12: Jeff Falke Session for NRES 715 Design of Environmental Research Projects 

Nov 19: Ashley Thompson, working with categorical and survey data 

Nov 26: TBD

Dec 3: TBD

 

------------------------Past meetings---------------------------

April 18. Tracy Shane's question concerns short time-series field data: "I have two treatments and four years of data. I could do a repeated measures ANOVA, but there might be a linear mixed model or an alternative Bayesian approach that might work better. The data I have collected include categorical cover data, plant height, density, biomass, and diameter x2. There is also likely spatial autocorrelation for some of the datasets. One first attempt at categorizing biomass spatial autocorrelation showed that my stratified sampling protocol within plot, and the spatially-balanced random sampling method between plots, resulted in no spatial autocorrelation in my biomass sampling for one site. I have all these data collected for four sites."

April 11. No formal topic. 

April 4. No formal topic. Tracy Shane requested feedback on fitting models to fill gaps in satellite imagery. 

Mar 28: No formal topic.

Mar 21: No formal topic.

Mar 14: Hold- Halina K North. Halina’s research looks at the effects of tree mortality on understory plant phenology across different microsite types in treeline environments in the Central Sierra Nevada. She has two years of field plant phenological observations as well as a variety of topographical and climatic variables. She is looking for input and feedback on variable selection and model construction to create a strong model to predict plant phenology.

Mar 7: UNR Data Science Conference (No discussion this day, see you all there)

Feb 29: Kierstin Acuna & Ellen McMullen. Have you ever worried that the code you wrote is not really doing what you think it is? Do you want to increase transparency and replicability in science? You should join our new code club for student-to-student peer review of code in EECB and related life sciences! We were inspired by the paper in this link, and wanted to create a group to facilitate code-checks among students. At StatsChats this week, we'll be discussing our new club, as well as data reproducibility!

Feb 22: Ally Fitts. Ally is interested in comparing snow surface reflectance across scales of observation. Ally has three types of spectral data: ground (field spectrometer), airborne (AVIRIS-NG), and satellite (Landsat 8/9 OLI Surface Reflectance). She would like to discuss the best statistical method to compare the spectral data across scales of observation. She would also like to gain insight into others’ experiences with comparing remote sensing data to ground observations.

Feb 15: No topic this week

Feb 8: Cassandra Hui's research focuses on how social interactions can be mediated by the effects of artificial light at night (ALAN) on behavior, physiology, and gene expression. Cassandra has three types of data with four treatment conditions at four time points throughout the day. She would like to discuss the best statistics for her data and some ideas for how to look at the interactions between different data types.

Background:
4 treatment groups: isolated ALAN, isolated dark, social ALAN, social dark
6 birds per time point per treatment group (24 per treatment group)
4 times of day 
Bird_Id (birds in isolation can be looked at, at the bird level; birds in social condition can be looked at, at the cage level)

Current model is nesting everything within Bird_Id, which is accurate for the isolated birds but not for the social birds. (scrap current analysis)

Some ideas:

•    Pull out chunks of time and do a basic group comparison to see if the isolated/social groups differ in terms of activity
•    Think about which subgroups should have big differences depending on time, condition, etc.
•    Low vs high (e.g., BMAL levels) can serve as its own control, rather than using the control birds
•    Separate by time point to look at patterns (e.g., melatonin/cry1 graphs) to reduce the noise in the data
•    Be more lenient with p-values since you’re working with small sample sizes per group and some of the analyses will be more exploratory in nature
•    First use subsets of data that can feasibly answer a question, before lumping everything together
•    For the 4 plots, try coding the dots by something else, like time of day or sex (instead of one color for everyone in the same treatment group). This might show you some visual cues of different groupings or trends in the data, separated by treatment group. 
•    Keep in mind rule of thumb: one analysis per question
•    Great experimental design!
 

 

Spring & Fall 2023

Here are our weekly discussion topics:

Oct 31

Tory Taylor (TBD)

Oct 24

Wade Lieurance is working with qualitative body condition scores from a large camera trap dataset of feral horses. He is looking for advice on developing correction factors for the multinomial detection probabilities of the scores, and tying those correction factors back to a random variable modeling underlying health.

Oct 3rd

Anson Call is trying to understand what factors affect the distribution of endosymbiotic bacteria in aphids from across the western US. The list of potentially important factors is long and varied, and spatial structure is likely. He would like to explore some potential options for analyses, and get some advice on how to choose among many potentially important variables.

Oct 10th

Rachel Kozloski is working with microplastic count/concentration data and land use patterns. She is looking for advice on analysis methods comparing seasonal variation at individual locations, analysis of spatial variation between locations, and evaluating the relative contributions of certain land use parameters to microplastic concentrations. 

Oct 17th

Ally Coconis is modeling the factors associated with the abundance of two woodrat (Neotoma) species across a single mountain range in eastern Nevada. The survey sites are variable in size and survey effort and are spread across an elevational gradient. She would like to get advice on modeling methods, given the dataset, and the results of the models already run.

Sep 5th

StatsChats meets again this Fall '23. We'll discuss the activities for the semester and introduce our group to new members. We'll start populating our discussion calendar.

Sep 12th

Today’s discussion will be led by Maddie Lohman, who is using wildlife demographic models that estimate both apparent survival and harvest mortality. However, in particular instances when sample sizes are small, parameter estimates add up to more than one (when in reality survival + natural mortality + harvest mortality = 1). They have some solutions to work around this problem, but would like input from others to see if there are additional techniques that could avoid this issue.

RQ: How do demographic rates vary over time and space in the prairie pothole region?

Harvest mortality + natural mortality + survival = 1, so harvest mortality + survival should be <1

Problem: models are adding up to >1

Current solution: tighten priors

Other options/considerations:

  • Does the model exclude the possibility of two events happening at the same time (dies naturally and is harvested)?
  • Model notation might need to be fixed
  • Kappa not currently being constrained in the multinomial pr
    • How to constrain kappa with respect to everything else in the pr
  • f term doesn’t come into play in the pr until the year of being recovered, how does it make sense to only have that accounted for, for the one year that a duck was recovered (not accounting for years that it survived)?
  • Be careful about how things are being added up in the model.
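The sum-to-one constraint discussed above can be illustrated with a minimal sketch. This is not Maddie's model and not the "tighten priors" solution from the notes: it shows a different, commonly used device, a multinomial-logit (softmax) transform, in which two unconstrained values are mapped to three fate probabilities that always sum to one. The function name `fate_probabilities` and the input values are hypothetical.

```python
import math


def fate_probabilities(eta_harvest, eta_natural):
    """Map two unconstrained real numbers to three probabilities.

    Survival is the reference category (its linear predictor is fixed
    at 0), so the three outputs are positive and sum to 1 by construction.
    Returns (survival, harvest_mortality, natural_mortality).
    """
    exps = [1.0, math.exp(eta_harvest), math.exp(eta_natural)]
    total = sum(exps)
    survival, harvest, natural = (e / total for e in exps)
    return survival, harvest, natural


# Hypothetical values on the unconstrained (logit-like) scale.
survival, harvest, natural = fate_probabilities(-1.0, -2.0)

print("survival:         ", round(survival, 3))
print("harvest mortality:", round(harvest, 3))
print("natural mortality:", round(natural, 3))
print("sum:              ", survival + harvest + natural)
```

With this parameterization, priors are placed on the unconstrained values rather than on the probabilities, so survival + harvest mortality can never exceed one regardless of sample size. Whether it fits the structure of a particular band-recovery model is a separate question.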

Sep 19th

Lauren Sankovitch is working on dating ancient/paleo record materials using Uranium-Thorium (U-Th) dating methods. She has done preliminary U-Th analysis that estimated ages plus 2-sigma uncertainty. She needs to compute a weighted average of several sets of these ages, as she dated several samples multiple times. She would like input on her analysis and to double-check it.

Issue 1: USGS data has 1-sigma uncertainty and needs to be 2-sigma

Issue 2: Rutgers data are for corals and need ratios converted, as well as uncertainties, to match the USGS data. Ratios need to be inverted.

How to generate a weighted average of ages with 2-sigma uncertainty, preserving the 2-sigma?

Are activity ratios and uncertainties properly converted?

Questions/Suggestions

Using another database for proxy values?

            -but there is very little historical data associated with what Lauren is studying

Look “under the hood” at means/variances of the variables before transforming anything.

Delta method to get the weighted means

Can go back and forth doubling/halving whatever sigma is to figure out the weighted means correctly
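The halving and doubling suggestion above can be sketched as follows. This is a minimal illustration, not Lauren's analysis: it uses a standard inverse-variance weighted mean (not the delta method mentioned in the notes) and assumes the repeated ages are independent with roughly Gaussian errors. The function name `weighted_mean_2sigma` and the example ages are hypothetical.

```python
import math


def weighted_mean_2sigma(ages, two_sigmas):
    """Inverse-variance weighted mean of ages reported with 2-sigma errors.

    Steps: halve each 2-sigma to get 1-sigma, weight each age by
    1 / sigma^2, then double the 1-sigma of the mean to report 2-sigma.
    Assumes independent measurements with Gaussian uncertainties.
    """
    if len(ages) != len(two_sigmas) or not ages:
        raise ValueError("ages and two_sigmas must be non-empty and equal length")
    sigmas = [s / 2.0 for s in two_sigmas]      # 2-sigma -> 1-sigma
    weights = [1.0 / s ** 2 for s in sigmas]    # inverse-variance weights
    weight_sum = sum(weights)
    mean = sum(w * a for w, a in zip(weights, ages)) / weight_sum
    sigma_mean = math.sqrt(1.0 / weight_sum)    # 1-sigma of the weighted mean
    return mean, 2.0 * sigma_mean               # report as 2-sigma again


# Hypothetical repeated ages for one sample (arbitrary units), with 2-sigma.
mean_age, mean_2sigma = weighted_mean_2sigma([100.0, 110.0], [4.0, 4.0])
print(mean_age, mean_2sigma)
```

Data reported at 1-sigma (as in Issue 1) would need to be doubled before being passed in, so that every input is on the same 2-sigma footing.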

Sep 26th

Nina Miller is using high-precision GPS data to analyze surface deformation profiles across the northern and central Walker Lane, a transition zone separating the Basin and Range province from the Sierra Nevada. She has two competing non-nested and non-linear models that appear to explain the data equally well, based on root mean square error. She is seeking guidance in navigating the model selection process to help her identify the preferred model.

*********

 

January 30th

Welcome! This week we'll be celebrating the return of StatsChats and planning the rest of the semester.

February 6

Martin Genova will discuss his plans to publish his undergraduate research project. His work involves regression analysis of environmental factors influencing the establishment of native and invasive species in tussocks and floating islands. Martin has done preliminary analysis and visualizations in R, and would like feedback on them.

February 13

Please mind the change of location for this time only!!!!!!: DeLaMare Library basement (Data & GIS Depot)

Uriel Cholula-Rivera will talk about the results of two years of a deficit irrigation experiment in alfalfa. The main objective was to assess the effects of deficit irrigation on alfalfa dry yield, crop water productivity, and quality. The statistical analysis was done following two approaches: considering the seasonal dry yield and crop water productivity and considering the cuts as repeated measures.

February 27

(Postponed, snow day)

March 6

Mia Kirk will present her research examining the relationship between gambling and substance use in Nevada. She is using data from the Nevada Behavioral Risk Factor Surveillance System, an annual state-wide public health survey. She has done preliminary analysis in R and would like feedback on next steps. 

March 13

Brian Morra has data that were collected to measure the impact of postfire seeding on soil properties. The data have 5 levels of nesting and are replicated through time. He has some guesses about how to handle these data, but it would be helpful to get some input from others about defining models, and interpreting interactions.

March 20 Spring Break (No discussion)

March 27

Tory Taylor will present two streams of her research. One looks at how the quality of online surveys (e.g., whether items contain ambiguity, scales are validated, formatting issues exist, etc.) affects responses. The second tests whether a follower's satisfaction with the organization (perceived organizational support, perceived opportunity to learn and grow, and perceived fit with the organization) mediates the effect of supervisor undermining on quiet quitting (normative commitment and organizational citizenship behavior). She would like feedback on experimental design and statistical analysis.

April 3

Describing statistical models is important for facilitating journal reviews, collaborator discussions, and reproducibility, and it increases the audience that will be able to provide feedback on statistical methods. However, communicating models is often omitted or replaced with descriptions of computer code. Perry Williams will be discussing a progression of regression models, beginning with simple linear regression, and continuing through multiple linear regression, mixed effects linear regression, and if time permits, generalized linear regression models. Perry will focus on discussing the models first, and then using computer code to obtain the estimates (fill in the gaps) of the statistical models.

April 10

Bryant Nagelson is analyzing the factors that contributed to old-growth giant sequoia mortality within the Mountain Home grove during the Castle Fire in 2020. He is considering using a random forest approach to rank the importance of tree-level characteristics, topographic variables, and management history classifications in determining survival of trees during the fire. He'd like feedback on his analyses.

April 10

Manuel Rodriguez will lead our discussion. This is his prompt: "The dataset I'm working on is the result of my previous research, and I'm trying to reanalyze it. The goal was to understand how vegetation composition and structure changed under contrasting levels of reindeer grazing pressure, and permafrost presence/absence. Back in 2020 I used generalized linear mixed models for my data, but when I brought this into the Research Design class, they suggested a hierarchical Bayesian model. As I have honestly not understood a single thing about those, I wanted to come in to see if I could get some clarification on why one would be better than the other, and how to set the model up in R, eventually."

April 24

Kelly Loria would like feedback on her time series analysis. Kelly is excited about using any sort of state space model for this type of time series analysis, but she’s new to this state-space-model concept. So there could be a more appropriate tool to use here. She plans to show slides on question rationale and data structure in case a different model framework suits this sort of exploratory analysis, and she would really love feedback on model statements, either for a MARSS model or any other time or type of Bayesian autoregressive model.