Statistics for Clinicians (On-Demand)
March 16 Webinar Recording
Video Transcription
[Pre-webinar chatter while attendees join.]

All right, welcome, everyone, to today's webinar. I'm Dr. Lauren Caldwell, and I'll be moderating today's session. Today's webinar is Statistics for Clinicians with Dr. Jennifer Wu. Our presentation will be interactive, so you'll have opportunities to practice your new skills, and we'll try to leave about five to ten minutes at the end for additional questions.

A brief introduction for Dr. Wu, and thank you for being here today. Dr. Wu is originally from Frederick, Maryland. She earned her BA in biology from Harvard University and her MD from the University of California, San Francisco. She trained in obstetrics and gynecology at Brigham and Women's Hospital and Massachusetts General Hospital in Boston, Massachusetts, and then completed a fellowship in female pelvic medicine and reconstructive surgery at the University of North Carolina at Chapel Hill. Concurrently, she received her master of public health in epidemiology at the UNC School of Public Health. From 2007 to 2012, Dr. Wu served on the faculty at Duke University, and she rejoined the faculty at UNC in 2013. Since then, she has served as division director for urogynecology, interim chair from 2019 to 2020, and senior vice chair from 2020 to 2021 in the Department of OBGYN, and she is currently vice dean for academic affairs, a role that oversees the educational enterprise, faculty affairs, and leadership development for the UNC School of Medicine.

Before we begin, a few housekeeping items. This webinar is being recorded and live streamed. Please use the Q&A feature of the Zoom webinar to ask questions, and use the chat feature for any technical issues; AUGS staff will be monitoring the chat and can assist you. So thank you, Dr. Wu, and I'll turn it over to you.

Thank you so much for that kind introduction, and welcome, everybody. Today we have the exciting topic of statistics for clinicians, so thanks for being here, and we're going to get started. [Brief pause to resolve screen sharing.] I have no disclosures.
The objectives for today: by the end of this talk, you'll be able to understand the different types of research data, describe commonly used statistical tests, and determine which statistical test you should use to analyze your data. Two items will be helpful to have ready. First, your phone, because I want you to take a photo of the next slide, which will be really helpful as we go through this talk. Second, a piece of paper and a pen: we're going to have a variety of questions, and it helps to write down what you think the answer is and really commit to it before we go through the correct answer.

This next slide is the key slide. When I give this talk in person I hand out this table so everyone can refer to it, but in a webinar format it's easiest to take a photo of it. We're going to come back to it multiple times and use it to think through the right test for the data we'll discuss. If there's one slide to remember, it's this one. I'll give everybody a moment to take a picture.

First, some statistical concepts, starting with the different kinds of data. There's nominal, or what I usually call categorical, data: male or female, race categories, insurance status, menopausal status, obesity yes or no — different categories. Then there's ordinal data, which is ordered or ranked categories. An example would be prolapse stages or cancer stages: you know that stage two is worse than stage one but better than stage three. Or functional status coded as low, moderate, or high (1, 2, 3, for example). Then there's discrete data: integers, or numbers you can't divide. The number of children can be 1, 2, 3, or 4, but you can't have 1.2 children; parity is similar, and so are Apgar scores — there's no Apgar score of 7.6. And then there's interval or continuous data, where there are equal distances between the values: age, BMI, and cost would be examples. Understanding the type of data is important when you're deciding which test to use.

Next are measures of central tendency. There's the mean, which we use for interval or continuous data such as age. There's the median, the value where half the subjects have lower values and half have higher values, which we often use for ordinal data. And there's the mode, the most commonly occurring category, which we usually use for categorical data.

Now let's talk about the distribution of your data. There's a symmetrical distribution, where the mode, median, and mean are all the same.
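As a quick illustration of these measures of central tendency, here is a minimal sketch in Python with made-up numbers; the variable names and values are invented purely for illustration.

```python
# Minimal sketch (made-up data): mean, median, and mode for different data types.
import statistics

ages = [34, 41, 45, 47, 52, 55, 58, 61, 63, 66]        # continuous -> mean
parity = [0, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 12]          # discrete, right-skewed -> median
race = ["White", "Black", "White", "Other", "White"]    # categorical -> mode

print("Mean age:", statistics.mean(ages))
print("Median parity:", statistics.median(parity))
print("Mode of race:", statistics.mode(race))

# In skewed data the mean gets pulled toward the tail, so mean and median differ:
print("Mean parity:", round(statistics.mean(parity), 2), "vs median parity:", statistics.median(parity))
```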
Then there's asymmetrical or skewed data, where the mode, mean, and median are different; the example shown is right-skewed (positively skewed) data. When I'm doing a data analysis, I'll go through all my variables and plot a histogram of each one so I can see its distribution. That also helps me spot outliers I might want to check on. Mostly I'm looking at whether the distribution is symmetric or skewed.

Then there's the normal distribution, which you see all the time. Many of the statistical tests we're going to talk about assume a normal distribution, and the values are described in terms of the number of standard deviations from the mean: values within plus or minus 1.96 (about two) standard deviations of the mean encompass about 95 percent of the values for that variable. The normal distribution is a symmetric distribution.

Here are some general guidelines. We need to decide whether to use a parametric test or a nonparametric test, and this is an important concept, so there are two slides on it. Parametric statistics are used for continuous data that are normally distributed, and in general for larger sample sizes; you can use about 30 as a rough cutoff. Nonparametric statistics are used for discrete or ordinal data or skewed distributions. My common example, which you'll hear me repeat, is parity: having one, two, or three babies is much more common than having 12, so the data are really skewed. The Charlson Comorbidity Index (CCI), which we commonly use to describe our patients' comorbidities, is similar — lower numbers of comorbidities (0, 1, 2, 3) are much more common than higher ones. Apgar scores are also skewed, because they're usually high (7, 8, 9) rather than 0, 1, 2, or 3. You would also consider nonparametric statistics when the sample size is small, say an N of less than 30.

For those of you who think more in table format: parametric versus nonparametric is a decision based on the data you want to analyze. Parametric statistics are for continuous data, normal distributions, and larger sample sizes; nonparametric statistics are for discrete or ordinal data, skewed distributions, or smaller sample sizes. And note the comment at the bottom of the slide: when in doubt, it's more conservative to use a nonparametric test. If you use a nonparametric test and your result is significant, it will also be significant with a parametric test. So when you're not sure, just do the nonparametric test and see whether your result is significant.

Now let's talk about some statistical tests. This is a very practical guideline for figuring out which test to use for your data analysis. One, think through what kind of data you are looking at.
Is it continuous, ordinal, or categorical? Are you going to use a parametric or a nonparametric test? How many groups are you comparing? And are the data paired or related, or not? We'll go through examples of each of these.

So again, this is the key table. (I see a question in the chat asking whether the lecture recording will be available after the session; I think it should be, and I'll let the AUGS staff respond to that.) If you haven't taken a photo because you joined a little late, please take a photo of this table, because it will really help later on as we go through the questions. And if you have a piece of paper and a pen, you can commit to your answers.

Let's go through it. When you have one group — say a single cohort you're describing — you might just describe a variable. If it's continuous data like age, you might report the mean. If it's nonparametric, say skewed data (my classic example is parity), you might use the median. And for nominal or categorical data, you might use a proportion: what percent are white, Black, or other race, for example. In a typical table one, for variables with symmetric distributions you report the mean plus or minus the standard deviation. If you're using the median — again, for ordinal or skewed data — you usually report the median (the 50th percentile) with the interquartile range (IQR), the 25th to the 75th percentile; some people report the full range instead.

The most common situation is comparing two groups, and these are probably the most commonly used tests. For continuous data, you use a Student's t-test. For a nonparametric situation — skewed or ordinal data — you use the Mann-Whitney U; I usually say Mann-Whitney U, while statisticians often call it the Wilcoxon rank sum test. And for categorical data, you use chi-square. Fisher's exact test is used instead of chi-square when the expected number in a cell of the two-by-two table is really low, so if there's a rare characteristic, think about Fisher's exact; typically the statistical software program you're using will tell you when you should use it. So: Student's t-test, Mann-Whitney U, chi-square.

Now let's start going through some examples. In a randomized controlled trial of treatment A (100 people) versus treatment B (100 people), what test would you use to compare the following between the two groups: age, the percent who have had a hysterectomy in the past, BMI, and parity? (As an illustration of how these tests are actually run, see the sketch below.)
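Before working through the answers, here is a minimal sketch of how the tests in the two-group and three-group rows of the table are typically run, assuming scipy is available; the data and group sizes are made up for illustration.

```python
# Minimal sketch with made-up data; variable and group names are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
age_a, age_b = rng.normal(55, 10, 100), rng.normal(57, 10, 100)    # continuous, roughly normal
parity_a, parity_b = rng.poisson(2, 100), rng.poisson(2, 100)      # discrete, right-skewed

# Two groups, continuous data -> Student's t-test
print(stats.ttest_ind(age_a, age_b))

# Two groups, skewed or ordinal data -> Mann-Whitney U (Wilcoxon rank sum)
print(stats.mannwhitneyu(parity_a, parity_b))

# Two groups, categorical data -> chi-square on the 2x2 table of counts
table = np.array([[30, 70],    # treatment A: 30 with prior hysterectomy, 70 without
                  [40, 60]])   # treatment B: 40 with prior hysterectomy, 60 without
print(stats.chi2_contingency(table))

# Rare characteristic (small expected cell counts) -> Fisher's exact test
print(stats.fisher_exact([[1, 99], [4, 96]]))

# Three groups, continuous -> one-way ANOVA; skewed or ordinal -> Kruskal-Wallis
age_c, parity_c = rng.normal(56, 10, 100), rng.poisson(2, 100)
print(stats.f_oneway(age_a, age_b, age_c))
print(stats.kruskal(parity_a, parity_b, parity_c))
```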
Take a moment, if you have that piece of paper and pen, to write down what you would use to compare each of these variables between the two groups. You can refer back to the table you took a picture of: what kind of data is age, and what test is in the two-group row?

Let's go through the answers. For age, these groups are fairly large — 200 people, 100 in each group — and age is continuous data, so you're in the continuous, parametric column with two groups: a Student's t-test. For hysterectomy, the percent who have had a hysterectomy is, say, 30 percent in treatment A and 40 percent in treatment B; how do I compare that? That's a category — you either have had a hysterectomy or you have not, a yes/no variable — so you use chi-square. Hysterectomy is common, so chi-square is fine; remember that Fisher's exact is for rare situations, and your statistical software program will let you know if that's the right test. What about BMI? Like age, BMI is continuous, we have a large group, and we're assuming a roughly normal, symmetric distribution, so again a Student's t-test. And the last one is parity — my classic example of when to use a nonparametric test: two groups, nonparametric, so Mann-Whitney U. Parity is skewed data, and it's also discrete: you can have one, two, or three children, but not 1.4 or 1.6 children. Again, some people call this test the Wilcoxon rank sum.

I've interspersed a lot of questions throughout, so keep that image handy on your phone. Here's the second question. In a randomized trial of three treatments — A, B, and C, with about 100 participants in each group — what tests would you use to compare the following between the three groups? Last time we had two groups; now we have three, so go to the three-group row of your table and think through what kind of data you have: age, the proportion who have had a hysterectomy in the past (yes or no), and parity. It's a lot easier with the table in front of you, so hopefully that's helping.

From the table, in the three-group row: for age, we have a lot of people (definitely more than 30) and continuous data, so we use a one-way ANOVA — it's in the continuous column of the three-group row. For hysterectomy, we know that's categorical: you either had a hysterectomy or you did not, and it's a pretty common thing.
So we're going to use chi-square again. And for parity — my classic example for a nonparametric test — with three groups it's Kruskal-Wallis. Hopefully you're finding this table helpful. It is hard in a webinar format when you can't see any feedback, so I'm just going to keep talking.

Let's talk a little bit about regression, which can be a confusing topic. Say we want to predict one variable from another — that's my simple way of putting it. The most common regressions we see are linear regression and logistic regression, and it can be confusing to figure out which one to do. The goal is to assess the impact of an independent variable on an outcome, the dependent variable. The type of regression you use is based on the outcome you're trying to assess — the dependent variable. I highlighted "outcome" in blue on the slide: outcome, outcome, outcome. That is the key. You use linear regression if the outcome (dependent) variable is continuous, and logistic regression if the outcome variable is categorical or dichotomous — typically a yes/no variable: success yes/no, cure yes/no, failure yes/no.

Let's go through an example. What would you use to assess objective cure (yes or no) after surgery B compared with surgery A, while adjusting for age, race, and BMI? That should be on your table: linear regression, logistic regression, or Cox proportional hazards regression (which we haven't talked about yet, but will in a bit)? Remember, the key is the dependent variable, the outcome. Write down or commit to an answer — and the answer is logistic regression, because the dependent variable, cure yes or no, determines the kind of regression analysis. Even though age and BMI are continuous, they are not the outcome variable. Because the outcome (cure, yes or no) is categorical, you use logistic regression, and you can put other variables or potential confounders into the model whether they're categorical or continuous. The kind of regression is determined by the outcome, so here it's logistic regression, not linear regression.

Another question: how would you compare improvement in OAB, based on an OAB questionnaire scored from 0 to 100, between two different neurostimulation devices, adjusting for age, race, and BMI? Would you use linear regression, logistic regression, or Cox proportional hazards regression? Look at the table and think about the outcome — the dependent variable you're trying to model. Is the first neurostimulation device better than the second, for example? You're going to use linear regression, because the outcome is continuous. (A short sketch of both kinds of regression follows below.)
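Here is a rough sketch of the two kinds of regression just described — logistic for a yes/no outcome, linear for a continuous score — on made-up data. It assumes statsmodels and pandas are installed, and the column names are invented for illustration.

```python
# Rough sketch on made-up data (column names invented); assumes statsmodels and pandas.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "surgery_b": rng.integers(0, 2, n),     # 0 = surgery A, 1 = surgery B
    "age": rng.normal(60, 10, n),
    "bmi": rng.normal(30, 5, n),
    "cure": rng.integers(0, 2, n),          # yes/no outcome -> logistic regression
    "oab_score": rng.normal(60, 15, n),     # 0-100 score outcome -> linear regression
})

# Categorical (yes/no) outcome: logistic regression; exponentiated coefficients are odds ratios
logit = smf.logit("cure ~ surgery_b + age + bmi", data=df).fit()
print(np.exp(logit.params))          # adjusted odds ratios
print(np.exp(logit.conf_int()))      # 95% confidence intervals for the odds ratios

# Continuous outcome: linear regression (ordinary least squares)
ols = smf.ols("oab_score ~ surgery_b + age + bmi", data=df).fit()
print(ols.params)                    # adjusted mean differences
```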
Again, the outcome here is improvement in OAB based on a questionnaire scored from 0 to 100 — a continuous outcome. We're trying to figure out whether neurostimulation device one or device two is better at improving OAB symptoms, measured on that 0-to-100 scale. You can adjust for anything you want — age, race, BMI, baseline OAB severity, comorbidities — but the key is that the outcome is continuous, so you use linear regression, not logistic regression. (We'll get to Cox proportional hazards models in a little bit.)

Now let's go back to the table and talk about two paired groups. That's when the cohort is the same but you're comparing, most commonly, before and after. For example: what test should be used to compare PFIQ-7 scores (which range from 0 to 300) before and after surgery in 200 women? We're thinking: before and after surgery, a large group of women, and scores from 0 to 300, which sounds continuous. So it's continuous, paired data, and we're going to use a paired t-test. The second question: you want to compare the percent of women with fecal incontinence — you either do or do not have fecal incontinence — before and after using a new drug you want to study. Before and after means paired, and the proportion of women with fecal incontinence (yes or no) is categorical data, so based on the table you use McNemar's test. I hope that as we go through these examples, you keep going back to the table: first, what kind of data am I analyzing — continuous, skewed or ordinal (nonparametric), or categorical? Then, how many groups, and what kind of groups, am I comparing? From there the table tells you the test.

I do want to get to survival analysis briefly; I won't spend a lot of time on it, but I want to mention it. Survival analysis comes up whenever time is involved — often in cancer studies, where you're asking what survival looks like over time. If you have one group, you can plot what happens to that group over time. If you have two groups, you can plot their Kaplan-Meier survival curves and then use a log-rank test to compare the two curves and see whether there's a statistically significant difference. And if you have good data over time and want to do a regression — predict one variable from another — you can use a Cox proportional hazards regression model. (A brief sketch of all three pieces follows below.)
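Here is a brief sketch of those three pieces — a Kaplan-Meier curve, a log-rank test, and a Cox model — on made-up data. It assumes a recent version of the third-party lifelines package is installed; the column names are invented for illustration.

```python
# Brief sketch on made-up data; assumes the lifelines package (pip install lifelines).
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(2)
n = 100
df = pd.DataFrame({
    "months": rng.exponential(36, 2 * n),   # time to event (e.g., time to recurrence)
    "event": rng.integers(0, 2, 2 * n),     # 1 = event observed, 0 = censored
    "group": np.repeat([0, 1], n),          # two treatment groups
    "age": rng.normal(60, 10, 2 * n),
})

# One group over time: Kaplan-Meier curve (each step down is an event; censored subjects are ticks)
km = KaplanMeierFitter()
km.fit(df["months"], event_observed=df["event"])

# Two groups: log-rank test comparing the two survival curves
a, b = df[df.group == 0], df[df.group == 1]
result = logrank_test(a["months"], b["months"],
                      event_observed_A=a["event"], event_observed_B=b["event"])
print(result.p_value)

# Regression with time: Cox proportional hazards; exponentiated coefficients are hazard ratios
cph = CoxPHFitter()
cph.fit(df, duration_col="months", event_col="event", formula="group + age")
cph.print_summary()
```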
I think of a Cox proportional hazards regression as similar to a logistic regression analysis, but with time: it's a regression analysis that includes time as a variable. Its output is a hazard ratio (HR), versus the odds ratio that is the output of logistic regression; we'll come back to that in a bit. People sometimes describe these as time-to-event analyses — time to death, time to recurrent prolapse, time to recurrent stress incontinence — all analyses that include a measure of time.

When you look at a Kaplan-Meier survival curve — say the blue line on this graph — each step down is an event. If we're looking at survival, each death is a step down; you can see steps down in both the orange and blue lines. The little hash marks are censored subjects, for example those lost to follow-up: you don't know what happened to them, so you mark it. If you see a lot of hash marks on a survival curve, there's a lot of loss to follow-up and you don't really know what happened to those individuals, so ideally you don't want to see many of them. Here we have two lines — two groups we're comparing over time — and you can apply a log-rank test to ask whether the two curves are different; if the p-value is greater than 0.05, there's no statistically significant difference between the curves. And remember, if you want a regression analysis — assessing outcomes over time while adjusting for other variables — that's when you use a Cox proportional hazards model.

I'm keeping track of time because I want to leave about 15 or 20 minutes for Q&A. Let's talk about statistical test results. Say we have treatment A and treatment B, about 100 people per group, and I give you the mean ages in each group with their standard deviations — that might be in table one: treatment A, treatment B, mean age plus or minus standard deviation. What test would you use to compare the two groups? Going back to our table: large groups, continuous data, two groups — a Student's t-test. Say the output is a p-value of 0.04. Is that significant? Yes, it's less than 0.05, so there's a statistically significant difference in age between the two groups. But what does a p-value of 0.04 really mean?

A p-value really just measures chance — I highlighted that in blue and underlined it — and there are a lot of different ways to say it. It's the probability that the difference seen in the study does not exist in the broader population; the probability that the difference between the values could have occurred by chance when there really is no true difference.
You could also say that's the type one error, the chance of a false-positive conclusion: you state that there is a difference, but that's false. So interpreting a p-value of 0.04: there's a 4% probability that the age difference between treatment A and treatment B occurred by chance. The lower the p-value, the more likely you are to believe it's a true difference — that it didn't occur by chance. And we typically use a p-value of 0.05 as the threshold for statistical significance.

Now a little bit about relative risks and odds ratios. When you're doing a cohort study or a randomized trial, the output is typically a relative risk, and those are easier to interpret: you have 2.0 times the risk of prolapse recurrence if you are obese versus not obese, as an example. An odds ratio is the output when you're doing a case-control study — and any time you do logistic regression. Remember, logistic regression is what you use when the outcome is categorical, typically yes/no, and the output you get is an odds ratio. When the event is rare — which is typically when we do a case-control study — the odds ratio will be very similar to the relative risk. But if your outcome is really common, the odds ratio will overestimate and exaggerate the relative risk. So be careful: if you're doing logistic regression because your outcome is a yes/no categorical variable, but that outcome is pretty common, the odds ratio may really overestimate the relative risk. What we're ultimately trying to do is identify the risk of the outcome between the two groups. I sometimes give a talk with worked examples of how this plays out, but we don't have time today because I want you to be able to apply what we're learning; just remember that odds ratios (the output of logistic regression) behave better for rare events, and case-control studies are what you usually use for rare events.

The reason I bring up relative risks and odds ratios is that they're what we see constantly in the studies we read and conduct. For relative risks, odds ratios, and the hazard ratios that come from Cox proportional hazards models: if the value is 1, there's no effect; if it's greater than 1, there's increased risk (or odds) of the outcome; and if it's less than 1, there's decreased risk (or odds). When we report a relative risk or odds ratio, we usually give a 95% confidence interval, and if that 95% confidence interval does not include 1, we know the result is statistically significant at a p-value of less than 0.05. And again, odds ratios are similar to relative risks for rare outcomes, which is when we would use a case-control study.
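Here is a small sketch of how a relative risk and an odds ratio, with their 95% confidence intervals, are computed from a 2x2 table using the standard log-scale formulas; the counts are made up for illustration.

```python
# Small sketch with made-up counts; standard large-sample (log-scale) 95% confidence intervals.
import math

# 2x2 table:                    outcome yes, outcome no
exposed_yes, exposed_no     = 40, 60      # e.g., obese
unexposed_yes, unexposed_no = 20, 80      # e.g., not obese

rr = (exposed_yes / (exposed_yes + exposed_no)) / (unexposed_yes / (unexposed_yes + unexposed_no))
or_ = (exposed_yes * unexposed_no) / (exposed_no * unexposed_yes)

# Standard errors on the log scale, then exponentiate back for the 95% CIs
se_log_rr = math.sqrt(1/exposed_yes - 1/(exposed_yes + exposed_no)
                      + 1/unexposed_yes - 1/(unexposed_yes + unexposed_no))
se_log_or = math.sqrt(1/exposed_yes + 1/exposed_no + 1/unexposed_yes + 1/unexposed_no)

rr_ci = (math.exp(math.log(rr) - 1.96 * se_log_rr), math.exp(math.log(rr) + 1.96 * se_log_rr))
or_ci = (math.exp(math.log(or_) - 1.96 * se_log_or), math.exp(math.log(or_) + 1.96 * se_log_or))

print(f"RR = {rr:.2f}, 95% CI {rr_ci[0]:.2f}-{rr_ci[1]:.2f}")
print(f"OR = {or_:.2f}, 95% CI {or_ci[0]:.2f}-{or_ci[1]:.2f}")
# If the 95% CI does not include 1, the association is significant at p < 0.05.
# Note: with a common outcome (40% vs 20%), the OR (2.67) is noticeably larger than the RR (2.0).
```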
The other thing a confidence interval tells you is the reliability of the estimate. A 95% confidence interval means you're 95% certain that the true value of the odds ratio or relative risk — whatever the parameter is — lies within that range. The confidence interval also tells you about the precision of the estimate: the narrower the confidence interval, the more precise the estimate. I give you two examples below. Both have an odds ratio of 4.5, but in one the 95% confidence interval is 1.1 to 20 — so you're 95% certain the true odds ratio is anywhere between 1.1 and 20, which is not very precise. In the other, the odds ratio is 4.5 with a 95% confidence interval of 3.5 to 5.6, so you're fairly certain the true odds ratio lies within that much narrower range; that's much more precise. When I'm reading a study and see a really wide confidence interval around an odds ratio, I think: that's just not very precise — you don't really know where the true odds ratio is.

Next question. You conduct a logistic regression analysis to evaluate the association of constipation with prolapse recurrence after surgery, adjusting for age and BMI. Which of the following measures of association would be appropriate for this type of analysis and is statistically significant at a p-value of less than 0.05? Take a moment to go through the options. (One thing that's hard in this format is knowing how much time people need, since I can't see anyone look up.)

If you said B, you are correct. Let's think it through. The first clue is "logistic regression analysis": the output of logistic regression is an odds ratio, not a relative risk, so we can eliminate options C and D. We're doing this logistic regression to evaluate whether constipation is associated with prolapse recurrence, yes or no — a categorical yes/no outcome, which is why it's logistic regression — adjusting for age and BMI. Then, which odds ratio is statistically significant? In A, the 95% confidence interval includes 1, so that's not statistically significant. In B, the odds ratio is 2.1 and the confidence interval, 1.2 to 3.0, does not include 1 — that is statistically significant at a p-value of less than 0.05.

Now a little bit about sample size, and then we'll go into the Q&A; my goal is to have about 20 minutes for it. Sample size can be challenging because you have to talk about type one and type two error, and I'm not great with this either, so I always try to give people different ways to think it through. Type one error is your alpha: the probability of incorrectly rejecting the null hypothesis. Other ways to say it: the probability of detecting a difference when there really isn't one, when none exists. Some people like to think of it as the false positive — you state there's a difference, but that is false. We often set the alpha at 0.05.
You want the alpha — the type one error — to be low, because you don't want to claim there's a difference when there really isn't one; that's a bad mistake to make. Then there's type two error, or beta: the probability of not detecting a difference when one exists. That's often called the false negative — you state there's no difference, but that is false. We often set the beta at 10% or 20%, which is higher than the alpha because this isn't as bad a mistake: you just miss saying there's a difference, rather than claiming one that isn't real. So you can accept a somewhat higher percentage for beta than for alpha.

How does that relate to sample size? Ideally you want a study large enough to have a high probability — the power — of detecting a statistically significant difference that is also clinically important. You need some data inputs to figure out the sample size as you're conceptualizing a study and deciding whether it's feasible. You need the expected outcome in each of the two groups you're comparing, which you usually base on the existing literature and what you know is clinically relevant. You set the alpha, often 0.05. Power is one minus beta — the probability of detecting a difference when one really exists — and you want a lot of it, at least 80%, sometimes even 90%. And if you have a continuous outcome, you also need the standard deviation; if not, you don't.

Here's an example that will come back in the Q&A. Say I'm comparing anterior repair cure for anterior vaginal prolapse — say the cure rate is 60% — against a new surgery that I think will have an 80% cure rate. Those are the expected outcomes in my two groups. I set the alpha at 0.05, I want 80% power to detect a difference if it's there, and I'll have equal numbers in the anterior repair group and the new surgery group. Then you use a software program to figure out the sample size; a common one is Epi Info, which you can download from the CDC. The output here would be 91 per group. I also put in this talk a way to write that out, because sometimes I read a paper and still can't tell how they came up with their sample size. In this example you could phrase it as: in order to detect a difference between a cure rate of 60% for anterior repair and 80% for the new surgery, we would need 91 women per group at an alpha of 0.05 and power of 80%. Again, Epi Info can help with figuring out sample sizes. (A quick sketch of this calculation follows below.)
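As a sanity check on that example, here is a sketch of the standard two-proportion sample-size formula with a continuity correction. It is an illustration rather than Epi Info's documented method, but it reproduces the per-group numbers quoted in this talk; it assumes scipy for the normal quantiles.

```python
# Sketch of a two-proportion sample-size calculation (normal approximation with
# continuity correction). Reproduces the 91-per-group figure quoted above; in practice
# you would use Epi Info or similar software.
import math
from scipy.stats import norm

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    p_bar = (p1 + p2) / 2
    n = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p1 - p2) ** 2
    n_cc = n / 4 * (1 + math.sqrt(1 + 4 / (n * abs(p1 - p2)))) ** 2   # continuity correction
    return math.ceil(n_cc)

print(n_per_group(0.60, 0.80))              # 91 per group (alpha 0.05, power 80%)
print(n_per_group(0.60, 0.80, power=0.90))  # 119: more power, more subjects
print(n_per_group(0.60, 0.80, alpha=0.01))  # 131: stricter alpha, more subjects
print(n_per_group(0.70, 0.80))              # 313: smaller difference, many more subjects
```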
I've put some references here, and I just want to highlight the last two. They're in a journal you may not know, from the Southern Medical Association — and I'll credit Matt Barber for sending them to me — really good statistical papers: manuscripts for the non-statistician, part one and part two. Those are really helpful, as are the other references on the slide.

So remember, this is the key table, which I hope you took a photo of, because it will help us with the rest of the Q&A. We'll stay in webinar format, and I'm going to have Lauren be my main person helping with Q&A as we go. You can also put things in the chat, and if I can't get to everybody — we have a whole bunch of questions to go through — I'm happy to follow up afterward. Someone asked to go back to the table for a second: yes, for those of you who came in later and didn't have a chance, take an image of this table, because the questions coming up should really help solidify these concepts, and hopefully we'll also have some time for more Q&A. I see a bunch of questions in the chat; Lauren, you can help me with those, and some of them may be answered as we go.

Question one. There are 300 patients with OAB who are randomized to one of three groups — drug A, drug B, or placebo — for 12 weeks. At 12 weeks, we look at the number of urge urinary incontinence (UUI) episodes per day in each group, which I've listed here. Which of the following tests should be used to compare the average number of UUI episodes daily at 12 weeks between the three groups? Look at your table and think through how many groups — that gives you the row — and then think about what kind of data the number of UUI episodes per day is; that gives you the column and the test.

Answer number one: we go to the three-group row, and the number of UUI episodes daily is a continuous variable — you can see the values have decimals; these aren't categories — so with three groups it's a one-way ANOVA.

(And if I'm going too fast, put that in the chat and Lauren can flag it for me; it's hard to gauge how much time people need without seeing any reactions.)

Question two. The same 300 patients with OAB are randomized to drug A, drug B, or placebo. At 12 weeks, the proportion of patients with dry mouth was 34%, 28%, and 15% — dry mouth, yes or no. Which of the following tests should be used to compare the proportion of patients with dry mouth in each group? Again, we have three groups, so we're in the three-group row, and you're thinking: dry mouth, yes or no.
What kind of data is that? It's not continuous; it's categorical, a yes/no variable — you either have dry mouth or you don't. So the answer is chi-square, because we're comparing a categorical variable among three groups.

Question three. In a prospective cohort study comparing a new stress incontinence surgery to the midurethral sling, the investigators want to assess whether the two cohorts are similar. What tests should be used to compare age, parity, smoking status (measured as yes/no smoker), and the proportion on prednisone? My clue: think about which of these is a rare characteristic. So think through how many groups you have — that's your row — and then consider each of the four variables. I'll give you some time for this one.

I'm highlighting the row again — this table comes up five million times — so, two groups: it's going to be one of the tests in the two-group row. Let's think through each variable. For age — our classic example from the beginning — it's continuous data, so a Student's t-test. Parity is skewed: lots of people have one, two, or three babies, but not many have 12. Skewed data, so Mann-Whitney U (again, sometimes called the Wilcoxon rank sum). Next is smoking status, yes or no — you either are a smoker or you're not. Even if you coded it as current, former, or never smoker — three categories — you would still use chi-square. And my clue for prednisone is that it's a rare characteristic. Anything rare should make you think: maybe it's not chi-square, maybe it's Fisher's exact. Your statistical software program should tell you; the formal rule is that if you wrote out the two-by-two table and the expected value in a cell is less than five, you use Fisher's exact. But nobody is going to sit and write out the two-by-two table and calculate the expected counts, so I just think of it this way: if something is really rare, that should draw your attention to Fisher's exact, and your program will flag it (a little asterisk telling you to use Fisher's exact in that situation).

Okay, then here's what happens next: we've gone through all these tests, but we haven't talked about how you actually put these data into table one of your paper.
So I've got group one and group two — two columns for my two groups — and my rows are age, parity, smoking status, and prednisone use. How would you actually present these data in table one? My clue is that you might use a mean, you might use a median, and you might use a proportion or percent. Which will you use for each of these variables? I'll give you a moment; the table will help you think through the right measure of central tendency.

Let's go through it. For age, we know it's continuous, so you report the mean plus or minus the standard deviation — and as a reminder, you'd use a Student's t-test to compare whether the mean ages differ between the two groups. Parity, my classic example, is discrete data: you can't have 1.4 kids, and a mean number of kids doesn't really make sense, but you can report the median. Say the median is 1 with an interquartile range of 0 to 3 (or 0 to 2, whatever it is): the median is the 50th percentile and the IQR is the 25th to the 75th. Sometimes people report the full range instead, say 0 to 12. And you'd use the Mann-Whitney U to compare parity between the two groups. Smoking status, yes or no, is a proportion — say 10% compared with 15%. Prednisone use is rare, so it might be something like 0.5% versus 1%, and you'd use Fisher's exact to compare the proportion on prednisone between the two groups. So you want to think not only about the test you're going to use, but also about the right way to present the data. It drives me a little crazy when I see a parity of 2.2 — there can't be 2.2 kids; that's one of my pet peeves.

Question four. Let's say you're interested in studying whether route of hysterectomy — laparoscopic or vaginal — is associated with vaginal cuff dehiscence. What would be the best study design to address this question? I didn't give my study-design talk here, but I've given some clues to the right answer: think about whether the outcome is common or not.

A case-control study. Whenever you have a really rare outcome, you want to think about a case-control study, and vaginal cuff dehiscence is rare. Think about it: in a cohort study — which could be retrospective or prospective — you would take a bunch of people who had a laparoscopic hysterectomy and a bunch who had a vaginal hysterectomy, and then follow them over time to see who develops a cuff dehiscence. Because dehiscence is so rare, doing this prospectively might mean following hundreds and hundreds of hysterectomies over many, many years to capture enough cuff dehiscences. Similarly, to do it retrospectively, you would go back in time, find all the laparoscopic and vaginal hysterectomies, and then see who had a vaginal cuff dehiscence.
But you're going to have to go through a lot of charts. In a case-control study, by contrast, you would say: let me find all the cuff dehiscences — those are my cases. Then you figure out the right comparison group, your controls. Your exposure of interest is the route of hysterectomy: are there more cuff dehiscences among those who had laparoscopic versus vaginal hysterectomies? And you can also adjust for potential confounders. The key point is that with a really rare outcome, you want to start with the cases, because they're easier to identify; following people over time until they develop the outcome could take a very long time and might not be a feasible study. The classic example of a case-control study is mesothelioma and asbestos.

How this relates to the statistical tests we've been talking about: for this case-control study comparing the risk of vaginal cuff dehiscence after laparoscopic versus vaginal hysterectomy, which statement or statements are true regarding the analysis? I'll read them, and there could be more than one that's true. A: relative risk should be used as the measure of association. B: odds ratio should be used as the measure of association. C: when controlling for potential confounders such as age, smoking, and diabetes, linear regression analysis should be used. D: when controlling for potential confounders such as age, smoking, and diabetes, logistic regression analysis should be used. So: should you use a relative risk or an odds ratio in this case-control study? And what kind of regression analysis could you use?

Let's talk through it. Going back to the relative risk/odds ratio table: if you're doing a case-control study, you should use an odds ratio. A case-control study is for a rare outcome, and the odds ratio is the measure of association — and the nice thing with a rare outcome is that the odds ratio is similar to, and estimates, the relative risk, which is what you're really trying to get at. And because the outcome is yes/no — you either have a cuff dehiscence or you don't — you're going to think logistic regression. It's not a continuous outcome; if the outcome were something continuous, say pain scores from 0 to 100 after hysterectomy, then you would do linear regression. But because the outcome here is cuff dehiscence, yes or no, you think logistic regression. (Often with case-control studies it will be a special kind, conditional logistic regression.) Again, the key is whether the outcome is a categorical yes/no — logistic regression — or continuous — linear regression. And I'll admit this gets confusing, because you think: I'm controlling for all these things, and age is continuous.
But again, the key is the outcome variable — that's how you figure out which regression to use. Okay, I'm watching the time.

Question six: which measures of association are significant at a p-value of less than 0.05? I've thrown in an HR, a hazard ratio from a Cox proportional hazards model — think of it just like an odds ratio or relative risk when you're deciding what's significant. The key question: does the confidence interval include 1? I'll give you some time to look through each of these; I'm doing it with you.

Let's go through them. In A, the 95% confidence interval includes 1 — it goes from 0.6 to 7 — so that is not statistically significant. In B, we have an odds ratio of 0.5, and the confidence interval is entirely below 1; it doesn't include 1, so that is significant. In C, a relative risk of 1.2 with a confidence interval of 1.18 to 1.22 — it does not include 1, so that's significant. In D, the confidence interval does include 1, so that is not significant at a p-value of less than 0.05. And in E, the hazard ratio is 6 and the confidence interval does not include 1, so that is statistically significant. The key to interpreting these is that if the 95% confidence interval does not cross or include 1, the result is statistically significant at a p-value of less than 0.05. In fact, when reporting results, some people give a p-value along with the odds ratio and 95% confidence interval; you don't really need to, because the confidence interval already tells you whether it's significant at p less than 0.05.

And which of these 95% confidence intervals is the most precise? Remember, the 95% confidence interval says: I'm 95% certain the true odds ratio or relative risk lies within this range, and the tighter the range, the more precise the estimate. Option C is really precise — a relative risk of 1.2 with a confidence interval of 1.18 to 1.22 is really narrow, so you know with 95% confidence that the true relative risk lies within that very tight range. Compare that with option E, a hazard ratio of 6 that could be as low as 1.3 or as high as 13 — a really wide range, so not nearly as precise. You're often looking for a more precise confidence interval, not just the absolute value of the relative risk, odds ratio, or hazard ratio.

Question seven — I'm going to start speaking faster. This is about sample size. Women with greater than stage two anterior vaginal prolapse are randomized to either anterior repair or an amazing new surgery. We think the anterior repair cure rate will be 60%, and we think the new surgery will be better, with a cure rate of 80%. What additional data are needed to calculate a sample size for this study? Alpha, power, standard deviation, both alpha and power, or all of the above? The answer is A and B.
You need alpha and you need power; you do not need the standard deviation — that's only relevant for a continuous outcome. Commonly alpha is 0.05 and power is 80%; sometimes you can push power to 90%, which would be great, but you also want to think about what's feasible for your study. I've put in some wording here again for how you would write that up. Remember also that you can change the ratio: the original example was one-to-one, the same number of people getting the anterior repair as the new surgery, but if I really want more people to get the new surgery I can make it two-to-one, and the program you're using will account for that in figuring out how many people you need in each group.

Question eight — sorry, just two more questions. ("Sorry, Dr. Wu, I think the slide didn't advance; it's still on question seven." Okay, let me stop sharing and share again. Do you see question eight now? "Yes." That was odd.) Same idea, anterior repair versus the new surgery, but now the primary outcome is quality of life at three months, measured on a scale from 0 to 100 where 100 is the highest quality of life. The minimally important difference is 10, and prior data show that quality of life at three months after an anterior repair is 70. What additional data are needed to calculate a sample size to detect the minimally important difference? I'll go through this quickly because of time. In this case, the key is that you need all of them: alpha, power, and — because it's a continuous outcome — the standard deviation for this quality-of-life scale. I've also written out how you could phrase that, in case you want an example for your paper.

Question nine. In the prior example, the sample size was 91 per group to detect a difference between 60% and 80% cure for anterior repair versus the new surgery, at an alpha of 0.05 and power of 80%. What happens to the sample size if I want 90% power instead of 80% — does it increase or decrease? I want even more power to detect a difference if one truly exists, so yes, you need more people: it increases, but only to 119 from 91, which could still be pretty feasible. What if you want the alpha to go from 5% down to 1%? It increases again, to 131, which might also still be feasible. And the last one: let's say you think the anterior repair is actually better than expected, with a 70% cure rate versus 80% for the new surgery — a smaller difference to detect than 60% versus 80%, at an alpha of 0.05 and power of 80%. The sample size increases again — you need more people to detect a smaller difference — but by how much? A lot: to 313.
Recap. I hope these objectives have been met: you understand different types of data, you know more about commonly used statistics, and you can figure out which of these statistical tests you should use to analyze your data. We talked about how to pick a test, and the table is the key. This table here, if you want to take a picture of it; I do think this should be available in webinar format, and I don't know if there's a way to share slides for those of you who attended, but I'm happy to share these slides. This is the key table. And then, again, going through examples really forces you to choose, and when you're actually analyzing your study, for those of you doing research, use this table to help think through, oh, okay, that's when I'm going to use this test. It really walks you through the groups, then thinking about the data, and then you can figure out the test you need. Okay, and that is it, and I'm only one minute late, but I know I haven't gotten to any Q&A. So Lauren, tell me what I should do, if there are some key questions or themes that have come up or things that people need to see. Absolutely. I think we can hit a lot of questions by pulling up the odds ratio versus relative risk table for folks to look at. There are a couple of questions asking whether a linear regression would be used for the relative risk, since we talked about logistic regression for the odds ratio. So would linear regression give a relative risk? No. Linear regression has a different kind of output. In order to get a relative risk out of a regression model, you have to use special kinds of models. Sometimes you might have a cohort study where ideally you want the relative risk as the output, but we commonly see a logistic regression analysis done for an outcome that's a yes/no. There are generalized linear models that will give you a relative risk as an output, but that's not what we commonly do. So one of the caveats is that we tend to see a lot of logistic regression, because we have an outcome that's a yes/no, cure for example, even in a cohort study. And the issue is that if your outcome is really common, the odds ratio from that logistic regression might be really high, when, if you actually calculate the relative risk without adjusting for anything, it's much lower. So it's one of these slightly odd things, but the linear regression output is not a relative risk; you get beta coefficients, which are a little more complicated, and that's one issue. Ideally you want a relative risk in your cohort studies, and you can simply calculate the relative risk of, say, surgery A versus surgery B for cure, yes or no. You just can't adjust for other variables at that point. So you can see what the crude relative risk is, and then commonly we end up doing logistic regression anyway. So it's just good to know some of the caveats of logistic regression giving odds ratios for common outcomes. That's my quick answer.
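To make that caveat concrete, here is a small sketch with invented counts showing how, when the outcome is common, the crude odds ratio from a 2x2 table can look much more extreme than the corresponding relative risk.

```python
# Hypothetical cohort with a common outcome (counts are made up for illustration):
#               cured   not cured   total
# surgery A       80        20        100
# surgery B       60        40        100
a, b = 80, 20
c, d = 60, 40

risk_a = a / (a + b)                     # 0.80
risk_b = c / (c + d)                     # 0.60

relative_risk = risk_a / risk_b          # 0.80 / 0.60 ~ 1.33
odds_ratio = (a / b) / (c / d)           # (80/20) / (60/40) ~ 2.67

print(f"Relative risk: {relative_risk:.2f}")   # ~1.33
print(f"Odds ratio:    {odds_ratio:.2f}")      # ~2.67, noticeably larger
# When the outcome is rare, the OR approximates the RR;
# when the outcome is common (as here), the OR drifts away from the RR.
```

The "special kinds of models" mentioned above for getting adjusted relative risks directly are typically log-binomial or modified Poisson regression, but, as noted, logistic regression remains the more common choice in practice.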
Perfect. I think one of the other clarifying points was distinguishing continuous variables from discrete ones, looking at, for example, POPQ measurements, even though you might measure those in 0.5-centimeter increments, or a visual or numerical rating scale, those two examples. Okay. So I think of a numerical rating scale as zero to 10. If you can only have values from zero through 10, that really isn't as continuous a thing, whereas age goes from zero up to a lot; the range is much wider. And if you have a scale, you also want to look at what the distribution is like. If it's really skewed, because there are a lot of people with low pain scores from zero to 10, for example on a visual analog scale, then you probably still want to think about a non-parametric test, because your data are really skewed and the scale only goes up to 10. I still think of parity as an example, or the Charlson comorbidity index: again, they're really skewed and they don't go all the way up to, say, a hundred. Those are different because they are truly discrete; you can't have a fraction of a child. But even when a scale only goes to 10, versus, say, a hundred, I think of 10 as a case where you might still want to consider non-parametric tests. And again, if your non-parametric test is significant, then the parametric test will usually also be significant. So if you're not sure, just go with the non-parametric test; it's more conservative. And what was the other question about POPQ? Whether measuring the POPQ in 0.5-centimeter increments would change the type of variable. Yeah, that's a good question, because if you think about the POPQ, you might have a fairly wide range, but it's not really as continuous as something like age. Even if you're looking at, say, point Aa, it only goes from minus three to plus three, so you really are restricted to a smaller range. So I would not think of it as fully continuous, and depending on what cohort you're studying, you might have skewed data there too. So I think you can never go wrong with using a non-parametric test, and again, if it's significant, then your parametric test will usually also be significant. Okay.
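As a small illustration of that advice, here is a sketch using simulated, made-up 0-10 pain scores, comparing the parametric t-test with its non-parametric counterpart, the Mann-Whitney U test, on skewed scale data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated 0-10 pain scores for two hypothetical groups;
# most people report low scores, so the data are right-skewed and bounded.
group_a = np.clip(rng.poisson(2.0, size=60), 0, 10)
group_b = np.clip(rng.poisson(3.5, size=60), 0, 10)

# Parametric comparison (assumes roughly normal data)
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Non-parametric comparison (rank-based, no normality assumption)
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"t-test p-value:        {t_p:.4f}")
print(f"Mann-Whitney U p-value: {u_p:.4f}")
# For skewed, bounded scales like this, the rank-based test is the
# safer default, as discussed above.
```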
I know we're a little bit over, so I don't know if we can get to all the other questions that are there. I'm happy to see, oh gosh, I'm looking at the chat, I can't even get through all of these. Okay, I just want to say, oh yeah, go ahead. If people have more questions, I'll type in my email and you can email me afterwards. I don't know if that chat goes out to everybody, can everyone see that? It's jennifer underscore w u at med.unc.edu. Feel free to email me with questions, because I know I haven't had a chance to get to everybody. I do think this webinar should be available afterwards, and I'm also happy to share the slides. And I don't know what else I can say. I'm glad you're here to learn about statistics, because it's really hard to make statistics interesting, but I hope it was practical and that you feel more confident in your ability to analyze your data. And I'm happy to have you email me and ask questions, or hopefully I'll see you at AUGS, where you can come up and ask me questions there too. All right. Yes, that was really interesting and so helpful. So thank you so much. On behalf of AUGS, I just want to thank Dr. Wu again, and everyone for joining us today. For a full list of upcoming webinars, you can visit the AUGS website to sign up. So thank you all and have a great evening. Yeah, thanks. Again, thanks for coming to statistics, which I was lamenting is not the most exciting topic. All right, thanks everybody for being here. Appreciate it. And I don't know if I can.
Video Summary
In the video, Dr. Jennifer Wu discusses the statistical tests and regression analyses used for different types of data. She emphasizes the importance of understanding the type of data being analyzed (continuous, ordinal, or categorical) and the number of groups being compared. For example, a Student's t-test is used for comparing continuous data between two groups, while a Mann-Whitney U test is used for comparing skewed or ordinal data between two groups. For more than two groups, a one-way ANOVA or Kruskal-Wallis test can be used, depending on the distribution of the data. Dr. Wu also explains p-values and their interpretation: a p-value is the probability of seeing a difference at least as large as the one observed if there were truly no difference between groups, so a lower p-value makes chance a less plausible explanation and supports a conclusion of statistical significance. The commonly used threshold for statistical significance is a p-value of 0.05 or less. In addition, Dr. Wu discusses regression analyses, including linear regression for continuous outcomes and logistic regression for categorical outcomes, and explains the interpretation of odds ratios and relative risks in logistic regression and cohort studies. She also introduces survival analysis, including Kaplan-Meier survival curves and the log-rank test for comparing survival between groups.

The webinar further covered how to choose the appropriate test for analyzing data, how confidence intervals indicate whether a result is statistically significant, and sample size calculations, highlighting the roles of alpha and power in determining sample size. Overall, the webinar aimed to improve participants' understanding of statistical concepts and provide practical tips for analyzing data.
Keywords
statistical tests
regression analyses
continuous data
ordinal data
categorical data
p-values
statistical significance
linear regression
logistic regression
sample size calculations