Let’s talk about version control and collaboration today, and about one of their most powerful tools: Git ✨
Using Git can be a lifesaver (and it has often been one for me in the past). It’s basically like a mini time machine: it gives you version control over your work progress. But unlike Dropbox or similar tools, it does not automatically save the current state of your work; you have to do that actively with commits and pushes. A typical workflow looks like this:
Image showing a git workflow from the working directory to the remote repo: working directory → staging area → local repo → remote repo, together with common git commands (git add code.R, git commit -m “Update”, git push, git pull, git checkout, git merge)
RStudio has a nice GUI that lets you do all of this without writing code. But if you want to remember a few commands, they are most likely git add, git commit, git push, git pull, and git status (to check whether you have uncommitted changes).
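To make the “time machine” idea concrete, here’s a minimal shell sketch (assuming Git is installed; the repository, the file analysis.R and the commit messages are hypothetical). It saves two versions of a file as commits and then brings back the earlier one with git checkout, one of the commands in the workflow image:

```shell
# Minimal sketch: every commit is a snapshot you can travel back to.
# Assumes git is installed; the repository, file and messages are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"       # identity for this demo repository only
git config user.email "user@example.com"

# Snapshot 1
echo 'x <- 1' > analysis.R
git add analysis.R
git commit -q -m "First version"

# Snapshot 2 overwrites the file
echo 'x <- 2' > analysis.R
git add analysis.R
git commit -q -m "Second version"

# List the snapshots, newest first
git log --oneline

# Travel back: restore analysis.R as it was one commit before HEAD
git checkout HEAD~1 -- analysis.R
cat analysis.R                            # prints: x <- 1
```

The restored version is staged, so you can either commit it as a new snapshot or return to the latest version with git checkout HEAD -- analysis.R.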
Here’s what the typical workflow can look like in action:
GIF showing the commands git add, git commit, git push, git pull in sequential order
You start with your local repository on your own machine, work on your code, and make some changes. Now the #git workflow starts 💫
git add: Once you have made some changes, this command adds them to the staging area. This is an essential step before committing: it tells Git which files you want to include in your next commit.
git commit: Once your changes are staged, this command lets you “commit” them, that is, put them under version control in Git. I talked to many people and couldn’t find a best practice on how often you should commit. I like to think of commits as a status report or a (small) milestone that you may want to return to, so I try to commit whenever I reach a (thematic) step.
git push: This command pushes one (or more) local commits to the remote repository.
git pull: This is usually one of the first commands I run: it pulls changes from others and makes sure that you’re working on the most current version.
git status: This command lets you check whether you still have uncommitted changes in your files 🕵🏼
But there are many more commands out there! When I get lost, I usually end up looking things up at Atlassian 👩🏼‍💻
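The commands above can be tried out end to end in a throwaway folder. The sketch below assumes Git is installed and, so that it doesn’t need a GitHub account, uses a local “bare” repository as a stand-in for the remote repository on GitHub. A second clone plays a hypothetical colleague; all file names and messages are made up.

```shell
# Minimal sketch of the add/commit/push/pull/status cycle.
# Assumes git is installed; a local bare repository stands in for GitHub.
set -e
base=$(mktemp -d)
cd "$base"

# Stand-in for the remote repository on GitHub, with "main" as default branch
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/main

# Your local repository
git clone -q remote.git me
cd me
git config user.name "Me"
git config user.email "me@example.com"

echo 'print("hello")' > code.R
git status --short            # ?? code.R  (untracked)
git add code.R                # working directory -> staging area
git status --short            # A  code.R  (staged)
git commit -q -m "Update"     # staging area -> local repository
git branch -M main            # make sure the branch is called "main"
git push -q -u origin main    # local repository -> remote repository

# A (hypothetical) colleague clones the remote and pushes a change
cd "$base"
git clone -q remote.git colleague
cd colleague
git config user.name "Colleague"
git config user.email "colleague@example.com"
echo 'print("hello again")' >> code.R
git add code.R
git commit -q -m "Extend code.R"
git push -q

# Back in your repository: get the colleague's work
cd "$base/me"
git pull -q
git status --short            # prints nothing: no uncommitted changes
```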
You have probably also heard of branches and merges in Git: they are an excellent way to collaborate with others. The GIF shows how you start working from the main branch (this is where all changes should eventually end up and where your final product lives). Each dot shows a new commit that is pushed:
GIF showing how a feature branch evolves from a main branch and is then guided back (merged) into the main branch
Once you want to make changes (like integrating a new function into your package), you start a new feature branch. The feature branch eventually goes back into the main branch (this is what we call “merging”). The cool thing is that you can work largely independently from your colleagues or collaborators on individual tasks, because they can start their own feature branches. Ideally, merging a feature branch back requires a code review. You can do this on GitHub with pull requests, and I’m a big fan of it because it makes you a better programmer step by step and helps share knowledge.
I learned it the hard way: it’s best to keep feature branches short and simple, because long and complicated ones quickly become hard to review.
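To see branching and merging in action, here’s a minimal local sketch (assuming Git is installed; the branch name feature/new-function and the file add_one.R are hypothetical). It starts a feature branch from main, commits a new function there, and merges it back:

```shell
# Minimal sketch of a feature branch that is merged back into main.
# Assumes git is installed; branch and file names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# A first commit on the main branch
echo "# My package" > README.md
git add README.md
git commit -q -m "Initial commit"
git branch -M main

# Start a feature branch for a new function
git checkout -q -b feature/new-function
echo 'add_one <- function(x) x + 1' > add_one.R
git add add_one.R
git commit -q -m "Add add_one()"

# Meanwhile, main is untouched: the new file only exists on the feature branch
git checkout -q main
ls                                # only README.md

# Merge the feature branch back into main and delete it
git merge -q feature/new-function
git branch -d feature/new-function
ls                                # add_one.R is now on main as well
```

Because main didn’t change in the meantime, Git can simply fast-forward it here. On GitHub, this merge would typically happen through a pull request, which is where the code review takes place.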
If you want to visualize it yourself, here’s a slide deck ๐ฉ๐ผโ๐ซ that explains the workflows and more.
If you connect your local repository with a remote repository (for instance on GitHub), you’ll also be able to store it in the cloud and access it from everywhere. Setting up this connection is extremely easy, and it works with different IDEs. The steps are similar, but I’ll go into more detail for VS Code and RStudio (the company behind RStudio will be called Posit as of October 2022; the IDE itself keeps the name RStudio).
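On the command line, connecting an existing local repository to a remote one takes two commands: git remote add and git push -u. Here’s a sketch assuming Git is installed; to keep it self-contained, a local bare repository stands in for GitHub. With a real GitHub repository you would pass its HTTPS URL (of the form https://github.com/USER/REPO.git, where USER and REPO are placeholders) instead of the local path.

```shell
# Minimal sketch: connect an existing local repository to a remote repository.
# Assumes git is installed; a local bare repository stands in for GitHub.
set -e
base=$(mktemp -d)
cd "$base"

# Stand-in for the (empty) repository you created on GitHub
git init -q --bare remote.git

# An existing local project with one commit
mkdir project
cd project
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
echo "# My project" > README.md
git add README.md
git commit -q -m "Initial commit"
git branch -M main

# Register the remote under the conventional name "origin" ...
git remote add origin "$base/remote.git"
# ... and push; -u makes origin/main the default for later pushes and pulls
git push -q -u origin main
git remote -v                     # shows the fetch and push URL of origin
```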
Visualization showing a typical workflow when using GitHub in RStudio with a new project: 1) Create a new repository on GitHub, 2) Open .Rproj in RStudio, 3) Connect with GitHub - and now it’s time to pull, commit and push :)
I have worked a lot with RStudio, both in industry and academia. It’s a fantastic and smooth IDE that lets you write in multiple languages. The GIF below shows how I typically set up a project in RStudio with GitHub when working in academia.
GIF showing how I typically set up a project:
I create a GitHub repository first. Depending on data privacy and other considerations, I make it either public or private, but I always add a README (READMEs are great because they let you write a short description of your repository in Markdown).
Then I go back to my RStudio desktop version and select “File” > “New project”. To enable version control, select “Version control” here and then copy-paste the link from your GitHub repository.
Screenshot showing a green “Code” button on GitHub that reveals an HTTPS-based URL that you can copy
A new project opens and your version control is up and running.
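Under the hood, RStudio’s “Version control” option clones the GitHub repository into a new project folder (and adds its own .Rproj file on top). A rough command-line equivalent is sketched below, assuming Git is installed; a local bare repository seeded with a README stands in for the repository created on GitHub, and all names are hypothetical. The same clone step is what VS Code’s “Clone repository” does.

```shell
# Rough command-line equivalent of creating a project from version control.
# Assumes git is installed; a local bare repository stands in for GitHub,
# and all names are hypothetical.
set -e
base=$(mktemp -d)
cd "$base"

# Stand-in for a new GitHub repository that was created with a README
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/main   # default branch "main"
git init -q seed
cd seed
git config user.name "Example User"
git config user.email "user@example.com"
echo "# my-project" > README.md
git add README.md
git commit -q -m "Initial commit"
git branch -M main
git push -q "$base/remote.git" main

# The step the IDE performs for you: clone the repository into a project folder
cd "$base"
git clone -q remote.git my-project
cd my-project
ls                  # README.md
git remote -v       # "origin" already points to the repository you cloned
git status -sb      # ## main...origin/main
```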
I realized that I usually start with a similar setup when working on an academic project, so I wrote a few lines of code that populate my R project (.Rproj) with files and folders. It’s described in more detail in this blog post that I wrote in 2020:
```r
# Set up the folder structure
folder_names <- c(
  # Main folders
  "data", "code", "figures",
  # Sub-folders
  "data/raw", "data/processed"
)

for (j in seq_along(folder_names)) {
  dir.create(folder_names[j])
}

# Add files to the folders
file_names <- c(
  # For preparing your data
  "1_data_preparation",
  # The merging file might also be combined with the first file
  "2_merging",
  # For your descriptives
  "3_descriptives",
  # For your analysis
  "4_analysis",
  # For your visualization
  "5_visualization"
)

for (j in seq_along(file_names)) {
  file.create(paste0("code/", file_names[j], ".Rmd"))
}

# Create a helper function file
file.create("code/helper.R")
```
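One thing to keep in mind when this project is under version control: Git tracks files, not folders, so folders that are still empty (like data/raw or figures right after running the code) won’t show up in git status or on GitHub until they contain a file. The shell sketch below recreates the same structure with mkdir and touch instead of R, assuming Git is installed, to show this; the .gitkeep placeholder name is only a common convention, not a Git feature.

```shell
# Minimal sketch: Git tracks files, not folders.
# Assumes git is installed; recreates the folder structure from the R code
# with shell commands, in a throwaway repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir -p data/raw data/processed code figures
for f in 1_data_preparation 2_merging 3_descriptives 4_analysis 5_visualization; do
  touch "code/$f.Rmd"
done
touch code/helper.R

# Only code/ is listed: data/ and figures/ don't contain any files yet
git status --porcelain

# An empty placeholder file makes a folder visible to Git
touch data/raw/.gitkeep data/processed/.gitkeep figures/.gitkeep
git status --porcelain            # now also lists data/ and figures/
```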
You can either copy-paste this code every time or turn it into a code snippet that runs it automatically when you type “academic” (or whatever word you prefer) and hit “Tab”.
This example shows how it works with a header, but it can also be transferred to other setups:
I love it! That's beautiful. I also love code snippets https://t.co/VWjzm5rdum pic.twitter.com/4nr2EBNR4D
— We are R-Ladies (@WeAreRLadies) September 14, 2022
If you go for the snippets, here’s how it works:
```
snippet academic
	`r folder_names <- (c("data", "code", "figures", "data/raw", "data/processed"))
	for (j in seq_along(folder_names)) {
	dir.create(folder_names[j])
	}
	file_names <- (c("1_data_preparation", "2_merging", "3_descriptives", "4_analysis", "5_visualization"))
	for (j in seq_along(file_names)) {file.create(paste0("code/", file_names[j], ".Rmd"))}
	file.create("code/helper.R")`
```
Now just type “academic” in the console, hit “Tab”, and let the magic happen 💫 The GIF below shows how this works in a local R project environment: as soon as you hit “Tab”, the folders and files are created automatically.
GIF showing how using the code snippet “academic” automatically fills the folder structure in your R project
📺 If you are looking for a tutorial on code snippets, have a look at Sharon Machlis' tutorial.
Visualization showing a typical workflow when using GitHub in VS Code with a new project: 1) Create a new repository on GitHub, 2) Clone repository in your VS Code, 3) Connect with GitHub - and now it’s time to pull, commit and push :)
Similar to RStudio, VS Code also lets you easily connect to GitHub to set up working version control. It follows similar steps to the RStudio process described above.
I again create a GitHub repository first and copy the link.
Screenshot showing a green “Code” button on GitHub that reveals an HTTPS-based URL that you can copy
Then I go back to VS Code and select “File” > “New Window”. To enable version control, select “Clone repository”, paste the link from the GitHub repository, and choose a folder on your local machine where the cloned repository will be saved. The GIF shows this workflow with visuals:
GIF showing how I typically set up a project in VS Code:
And again, a new project opens, your version control is up and running, and you can start working on your projects.
Another cool thing about VS Code is its extensions. Some, for instance, show you the color behind a hex code, and there are many more helpful GitHub extensions. I typically use these two (but the options are endless):
While most of you use Git(Hub) for version control, there are also other use cases.
Today @WeAreRLadies it's about one of the most powerful tools in your workflow: #git (and also GitHub as a hosting platform)
— Cosima Meyer 🦣 @cosima_meyer@mas.to (@cosima_meyer) September 15, 2022
What do you use Git(Hub) mainly for?
If you have interesting repositories to share, other use cases, or use all of them, put them in the comments.
GitHub is a code hosting platform (like GitLab or Bitbucket) that allows you to share your code with others but also to use it as version control for yourself. GitHub is also becoming more and more social: for instance, it now lets you create a personalized README that works like a mini website, where you can post things about yourself and your work 👩‍💻
Screenshot showing an example of a GitHub README (a personalized landing page of a GitHub account)
The README is a great landing page to collect information about yourself and your work 👩‍💻 (it also works for organizations now!)
And it’s extremely easy to set up: create a new public (!) repository named exactly like your GitHub handle and add a README file. Now you can start editing it and making it beautiful.
Screenshot showing an example of a raw GitHub README (a personalized landing page of a GitHub account)
Here’s a short step-by-step guide on how to set up and change your README.
And here are a million ways to further customize your README: it’s fantastic, and you can easily spend hours looking for the best design 🥰
I personally also use GitHub to star repositories that I find helpful ⭐ and to work together on projects (issues are fantastic tools, and once you get the hang of pull requests and issues, it’s so satisfying to close them and check them off your list). It also helps you organize things with agile tools like Kanban boards (we have one for our package development).
If you’re up for some automated pipelines, there are “GitHub Actions”, which, for instance, let you run checks on your package whenever you push changes to GitHub (there’s more on this in the blog post on package development). You can also use them to host a website (with GitHub Pages), implement FTP deployments, and much more!
As the last gem, you can see a skyline of your activity on GitHub! Here’s mine from 2021:
Screenshot of a GitHub skyline showing the number of commits per day/week as skyscrapers
If you want to see what yours looks like, you can use this link ๐
While I only scratched the surface of what Git can do, it’s an extremely powerful tool that has so much more to offer 🤩 Here are some more resources if you want to learn more about it:
Here’s the summary of today’s input (also available as a PDF for you to download here):
Visual summary of how to GitHub with RStudio and VS Code. Left side: image showing a git workflow from the working directory to the remote repo (working directory → staging area → local repo → remote repo), together with common git commands (git add code.R, git commit -m “Update”, git push, git pull, git checkout, git merge). Right side: visualization showing a typical workflow when using GitHub in RStudio and VS Code with a new project.
This blog post was last updated on September 25, 2022.