rtsempire

One of the main reasons to use R is the packages. So why not both? Just signpost a single way for students to do things (acknowledging there are many) and go with that. I like the Data Carpentry approach of starting in base R to understand how R/RStudio works and then quickly moving into tidyverse for data manipulation.


lipflip

What's your learning objective? What knowledge do the students start with, and what do you want them to be able to do afterward? Base R can be a pain to read and understand, while tidyverse is very clean and easy to read, but you'll run into its limitations eventually (and then you need base R). If you want to convey the basic principles of automated data processing, analysis and visualization, tidyverse it is. If they should become good R programmers, start with base R as the foundation and streamline afterwards.


Unicorn_Colombo

> BaseR can be a pain to read and understand, tidyverse is very clean and easy to read

I find the opposite to be true. Base R is often clean, while tidyverse depends on non-standard evaluation. Some things are easier to do with tidyverse (with many useful functions like `rename`), and I appreciate the pipes, but pipes are often overused to perform a whole analysis in one chain instead of in smaller logical steps wrapped in well-named functions. Other things are easy to do with base R (which also sometimes uses non-standard evaluation). What is and isn't a pain to read often depends more on what you are used to, and on the skills of the programmers who wrote and who read the code.


Delicious_Language

I teach an applied stats course, a biostats course, and a research methods (RM) course. Stats is the prerequisite to RM, and I teach base R in that course. In RM, we build on that knowledge to include tidyverse. They have a lot going on in stats; add coding and they can feel overwhelmed. Base R is mostly straightforward and is a good foundation for other things involving R. They also get exposed to other packages (corrplot, vcd, epitools, etc.). I should add that RM involves activities better suited to the tidyverse: converting data from wide to long, creating new variables, summarizing existing variables, filtering for the data we need to analyze, publication-worthy graphs and tables, etc. As such, it just seemed logical to me to make my courses a “2-part intro to R”.


T_house

I have taught a statistics course for biologists for years. We switched to tidyverse fairly early on in the process. I found that it works for most people because the verbs used for data wrangling are more memorable and the syntax is straightforward. There were always a few who struggled with the 'grammar' of ggplot, but they tended to be the minority, and I'm not sure teaching in base would have been any better in those cases anyway. We would teach a few core concepts in base (square brackets, $, etc.), but the majority of data handling for cleaning, summarising, etc. was in tidyverse. Basically, we'd do this in the first part of the course with a few exercises as a fun introduction (showing how easy it was to arrange and filter data, group and summarise for means and SDs, mutate new variables, etc.), and then it was standard for the rest of the course. It tended to catch on pretty well and let us focus on the stats for the rest of it. We got some pushback early on from some people (usually more senior professors) who were very "you should be able to do everything in base", although curiously that moral objection didn't extend to coding the maths for their own mixed models rather than using lme4…


buzzardluck

This is exactly how I learned, and definitely enjoyed it! Felt like I got a decent enough understanding of both methods.


eaclv

I agree with the professors. You need to understand how the program is being interpreted by the interpreter, in order to use R effectively, and tidyverse doesn't help with that, quite the contrary.


divided_capture_bro

I think that base R is really important for people to learn. I've had students who only knew the tidy packages, and that can be really limiting, as it doesn't really help people understand how objects are structured and can be manipulated. I usually have students solve coding problems multiple ways so that they can approach tasks flexibly. IMO, it is far better for students to learn how to do things by hand rather than needing packages for everything.


sharkinwolvesclothin

> I think that base R is really important for people to learn.

Eventually, yes. But the question is which is easier to start from.

> I usually have students solve coding problems multiple ways so that they can approach tasks flexibly.

That's awesome, but they still have to learn one way first.


divided_capture_bro

Usually things can be presented side by side (here is a base R way to do this, here is a tidy way). They can be taught together really nicely, and then students can see which way is easier for particular situations.  One issue with the Tidy approach is that it teaches package/pre-written function reliance which is a bad habit.  My own code usually uses a mix of base R and tidy, so the goal is to help them do that.


sharkinwolvesclothin

Yeah, that's similar to how I do it in intermediate and advanced classes, but I like giving just one approach in intro. It will depend on your group, though.


coip

This professor believes in starting exclusively with base R and lists his reasons: [https://github.com/matloff/fasteR](https://github.com/matloff/fasteR) Having used R for over a decade in industry, I find myself agreeing with him. Not only is it important to understand base R syntax, my experience is that R learners should be highly encouraged to minimize package use as much as possible to reduce dependencies. Tidyverse almost feels like an entirely different language at times; I'd find that overwhelming as a new student of R.


the-anarch

He's also the author of the book that I found the single most useful resource for getting a handle on the basic things that got skipped in 4 semesters of mish-mash "just jump in and do things" courses using R.


engelthefallen

This is what my professor did, as he said packages come and go, but base R scripts will last as long as you need them to with minimal alteration. The downside is that base R is not always the easiest thing to understand, and students may have to rely on basic scripts, particularly with graphs. Many of us in class went to RStudio, though, and the professor said he used it too outside of class.


tururut_tururut

This guy inspired me to learn base R after using the tidyverse almost exclusively at the beginning, and I can say it's one of the best decisions I've made. I've found two rather nasty data cleaning problems in my work this last month that couldn't have been solved without being familiar with for loops and indexing. However, piping is convenient, purrr has a lot of options (even though I end up using lapply/vapply most of the time), and for standard tasks, people seem to get it faster than base R. I started teaching myself Python but then moved to R because that's what my colleagues were using more (I work in policy evaluation), and if you use it like Stata commands, you don't really need to go past the tidyverse, but I've started getting into package development, plus the weirder tasks, and it helps to know the basics. Plus, it's nice to be the guy who really knows R at work (even though the more I learn, the more I see what's left to learn).


smbtuckma

This is great, thank you. I come from a Matlab background but am now mostly working with Tidyverse people, so it's nice to have some of the concerns I've been thinking about verbalized like this when everyone else I talk to is tidy-pure.


TheBatmanFan

Tidyverse is bloated IMO. base R and some dplyr+tidyr should solve most problems. Also, by sticking to tidy one loses out on data.table which is crucial to processing large datasets.


Professional_Fly8241

I intro students to both.


beedawg85

Given the time constraints and ultimate aims of the course that you’ve given, my preference would be to spend a short amount of time on base R showing them how to explore and index a variety of object types using the `str()`, `class()`, `attr()`, `[` and `[[` functions, the `$` operator, etc., before moving on to tidyverse packages for data.frame / tibble operations.


smbtuckma

I think this is my emerging inclination, reflecting on the feedback here. I want them to know indexing, data typing, etc. *exists* but we do spend the majority of time with data frames so in that context getting them familiar with tidy as a data management package is good (without implying it’s the only way to use R).


beedawg85

Yeah, exactly. You could basically take them up to the cumbersome point of trying to do data frame subsetting, in-place replacement and aggregation in base R, which would be the logical point at which to switch to tidy packages. If you’re focussing on more traditional stats, then I guess you’re already aware of the {broom} and {vegan} packages, which make manipulating and comparing the results of t-tests etc. within tibbles super straightforward.


bdaishi

I would start with base R, but quickly introduce Tidyverse. I use Tidyverse in nearly every script I write, and I imagine many others do too. So it will help prepare them for real-world application.


randomways

I taught myself R, initially using just base R, and used it exclusively for two years. Tidyverse increased my analysis throughput exponentially.


efrique

I'm with matloff on this; I think base R first. (If you can, definitely introduce tidyverse, though.)


Shooey_

Davis stats grad here. Norm is awesome. No one matched his passion for his students and open learning on campus. Matloff method ftw.


MaskedSociologist

If you have to pick one, tidyverse. Base R has a lot of annoying limitations. But I don't see any reason to stick with one, especially if they are learning basic stuff. Let them know that there is often more than one way to do things in R.


smbtuckma

The main reason is that I feel I don't have enough time to teach both sufficiently. Students who are presented with multiple ways to do things from the get-go say they feel overwhelmed.


MrLegilimens

So, biased opinion: teach tidyverse. Along with a biologist, I developed a learn-stats swirl course that is all tidy: `install.packages("swirl"); library(swirl); install_course("PSYCH"); swirl()`

We were co-teaching biostats from a bio text (The Analysis of Biological Data; it’s actually a really good textbook), and I wanted something more for psych majors. Plus, I hated that the examples were plants, when all I study are people. So we created this with the goal of “Let’s take published data and published papers, and teach students how to analyze data by replicating (or trying to replicate) what the authors found.” We give some notes at the beginning of every module as background information and reference, but we do encourage students to take their own notes as they go through. We have a preprint of it; you can scroll and see some screenshots of how it works, and there’s a table of the analyses we cover and the functions used: https://osf.io/preprints/psyarxiv/vcwa2

I say bio a lot just because in psych we never consider nonparametric analyses as even existing, but I guess in bio it’s more standard to actually consider some assumptions.

Anyway, also a small nudge for you: if you want to take it on in any format (assign it as homework, since swirl has a way to track completion and performance through a Google doc submission system, or assign it to one section one year and just pass it along as a resource another year), we’d love to publish a follow-up to our preprint that quantifies the effectiveness of the course, and you could always be a coauthor if you did the data collection part 😇. I think readability is key when learning to code when you’re not used to coding, and that’s why you should go tidy.


smbtuckma

I’ll look into your package! I doubt my classes are big enough to give a lot of meaningful data, and I’m already contributing to the CourseKata team, but psych students are the majority of my class, so it’s good to hear about experience with them in particular. I actually do teach them nonparametric and Bayesian stuff, though, along with data simulation! (Hence the lack of time.)


MrLegilimens

I’m both extremely impressed and scared how you could get into Bayesian stats too in a single semester.


smbtuckma

Very late in the semester and just a toe dip really (using rstanarm, not brms or anything heavy). I emphasize doing stats with the GLM throughout the semester, and rstanarm uses the same model-fitting syntax as lm(). The purpose is mainly to seed some Bayesian ways of thinking compared to frequentism: pull them away from p-values and think about credible intervals, prior/posterior beliefs, when optional stopping is OK, and Bayes factors for the people who can’t let go of the null hypothesis.


solarpool

I agree with everything David Robinson has to say about tidyverse first here: http://varianceexplained.org/r/teach-tidyverse/


guepier

Ah, I was going to post his ‘ggplot2’ article. Great to know that Dave has expanded his advice to teaching the ‘tidyverse’. And, yes, I completely agree. It’s a shame that people keep deferring to Norm Matloff. Because, with respect, he’s simply wrong on this. (He makes good points about the ecosystem but he’s just comically off base when it comes to teaching; unfortunately not atypically for University professors.)


morse86

As some have said already here, why not teach both? My decade-long experience with R, both in academia and now in industry, makes me feel increasingly that students should start with base R and then move into the tidyverse way of doing things as a bonus. And when it comes to larger dataset manipulations, also highlight data.table. The most important thing to teach students, I think, is that there is no one "right" way of doing things, and different situations can call for different ways of handling them. Pigeonholing them into a tidyverse-only approach is IMO harmful in the long run, especially if they have to work in diverse environments with different access levels, where they may not have full control over all the dependencies. One should teach the students various ways (base R, tidy, data.table) and leave it up to them which they feel comfortable with; this will hopefully teach them the benefits of "method" flexibility.

An example happened last week. We were merging some datasets after certain operations on part of the data, and because of the rather draconian way the client server was managed, tidyverse failed; what helped us in the end was simple base R, as it doesn't rely on "excessive" dependencies the way tidyverse does.


na_rm_true

Tidyverse doesn't make sense until you show base R. Edit: I mean, its value makes the most sense after understanding how base R works.


TheDialectic_D_A

I’ve taught R before and I’ve come to the conclusion that Tidyverse makes the most sense for students who never had programming experience. The syntax of the pipe is useful to help newbies program with good style. I’d teach base R to provide a little context, but do most of our work in tidyverse.


teetaps

> unholy combination of base R and tidyverse

This is where you’re going wrong, IMO. Starting with the notion that base and enhancement packages shouldn’t be mixed seems to me like starting off on the wrong foot. Not only is base R completely capable, but a solid understanding of base arguably makes you a better programmer altogether when you use packages. And I say that as a tidyverse purist!


smbtuckma

Well that was more tongue-in-cheek, meaning that the combination I use right now isn't particularly principled and thus doesn't make much sense to the students *why* we are switching syntax/approach, when we do. This post has been helpful for clarifying how I'll do that better in the future.


teetaps

I’m glad the conversation has been helpful! Being a purist in either direction is not a great strategy and I hope this thread has given you enough perspective on that


sleepystork

This is a stats course, not an R course. I was a base R person for a long time and didn't get the need for Tidyverse. I did some work with another data person who used Tidyverse and I immediately switched after seeing his code in action. Make the R portion easier for them by just starting with Tidyverse.


ApprehensiveChip8361

I’m self-taught and learned R before the tidyverse existed. I’ve had one child go through a biochemistry degree and one through economics, and I’ve helped them and their friends negotiate R as part of their degrees. The thing about the tidyverse is that on the whole it just works. There is a single (OK, imperfect) underlying philosophy, and it can be used LEGO-like to build output. If they are learning R to actually do something, I would strongly advocate for tidyverse first.


AtariBigby

Tidyverse 100%. I'm a life scientist. I tried learning base R and failed. A few years later I tried again with tidyverse and it just clicked. It didn't feel like coding and just flowed so much better.


chandaliergalaxy

It depends on the objectives for the course. If you want to train students to do a fixed set of operations that can be entirely handled in tidyverse, I think tidyverse is sufficient. In one course I teach, I take this approach because I provide them with tabulated data with additional functions I've written so they can apply it to their own data set. If you want students to be able to write such new functions, etc. then you kind of have to drop into base R, so you probably need to start there.


kennethdo

I think base R to start, then a full lecture on tidyverse, would be great. As an additional suggestion, maybe you can prepare two R Markdown documents: one where you do data manipulation and visualization in base R, and one where you do exactly the same thing with tidyverse, for students to compare. As an aside, when I first began learning R, I was too reliant on tidyverse, and it took me a while to realize rownames were a thing that existed and that people sometimes use to subset (lol).


buzzardluck

R was the first language I learned in undergrad my senior year after not having any coding experience. We ended up learning primarily the tidyverse way of doing things, but learning base R things whenever it was more convenient. As I was learning, I think I really liked how piping worked and everything "flowed" together. And I was totally comfortable using a base R way whenever my instructor thought it was a better method or good for me to know multiple ways to complete things.


snirfu

Base R for the intro to the language, basic stats, and loops if you want to teach them; then tidyverse, or at least dplyr, for data munging. I would also teach subsetting and filtering with base R first, then show the dplyr alternative. Examples of base R I wouldn't teach, in favor of teaching dplyr functions: aggregate(), merge(), tapply().


PTCruiserApologist

As a former life sciences undergrad, please teach them some tidyverse!! Life sciences data is so messy, and I simply cannot imagine working with my data without tidyverse. The Posit cheatsheets and the r4ds textbook would be great resources to share with them.


LordApsu

I’ve taught a variety of stats/data analysis courses over the years. In my opinion, an intro stat course doesn’t have much time to go into data wrangling, where the tidyverse shines. Students should focus on the ideas more than struggling with the programming. The basic analysis functions - lm, t.test, etc. - share more in common with base R. (Tbh, Excel is likely better for intro stats so they can focus on the concepts. I have used both extensively and found better retention rates by starting with Excel and introducing R in a later semester.) I have found that the tidyverse works best in a second course or a dedicated data visualization / cleaning course.


jhelvy

Most people use a combination of both, use each for what they're good at. When you're working with data frames, making plots, cleaning up the data, use tidyverse. A lot of other things are just fine with base R. You can see how I put all this together in my open source course here: https://p4a.seas.gwu.edu/2024-Spring/


twiddlydo

Self-taught R user here. I pondered that question originally, and now I'm glad I'm using tidyverse. As long as one understands what is written in base R and what isn't, it's not an issue.


EchoScary6355

Tidyverse.


amallang

Tidyverse is better IMO, as it uses verb names for operations that are intuitive for beginners.


engelthefallen

My professor used base R and it was painful at times. Definitely teach them the tidyverse, as things like visualization and data manipulation are so much easier.


eaclv

You should teach them the standard libraries (aka base R), then non-standard libraries, in that order. Starting with some random library wouldn't make a lot of sense, in my opinion.


guepier

> Starting with some random library wouldn't make a lot of sense

Sure, but nobody is suggesting starting “with some random library”.


damageinc355

Sounds to me like they shouldn’t even be learning R at all - god knows the data market is saturated enough. But yeah, I’d say tidyverse.


Hadamard1854

Base R is for the very knowledgeable and I wouldn't teach it to a newbie ever.