Code and Document Management

An introduction to code and document management by Dr. Yi-Xin Liu at Fudan University (lyx@fudan.edu.cn).

This is a part of the course: Road to Scientific Research: Powerful Computer Applications (XDSY118019.01).

Lecture date: 2024.10.31

Code Management

Tracking the history of modifications to a code project is extremely important. You can do it manually, like:

$ ls
mycode_v1.py
mycode_v2.py
mycode_v2_with_some_new_implementations.py
mycode_v3_along_original.py
mycode_v3_with_some_new_implementations.py
mycode_v3_with_other_improvements.py
mycode_GPU.py

Or like this:

mycode_20220606.py  mycode_20220707.py  mycode_20220808.py  ...

Failure of Manual Management

Soon you will discover that the manual way to track the version of your code is:

  • tedious
  • painful

It is also not scalable: when you have multiple files to track, the history of your project quickly becomes a mess!

Version Control System

  • Today, a version control system (VCS) is the standard tool for code and document management.
  • Git is the most popular version control system.

Git and GitHub

Getting Started with Git

Demo and Exercises for Git

  • Installation
  • Configuration
  • Initializing a repo
  • Staging and committing
  • Status, log and checkout
  • Create a branch

Learning material: Git & GitHub Tutorial for Scientists

Git: Installation

  • macOS: go to the download page of the official Git website.
  • Windows: go to Git for Windows and download the latest version.
  • After installation, check the version of Git:
$ git --version

Git: Configuration

  • Configure via command line:
$ git config --global user.name "Yixin Liu"
$ git config --global user.email "lyx@fudan.edu.cn"
$ git config --global core.editor "code --wait"  # VS Code; --wait makes Git wait until the editor is closed
$ git config --global core.autocrlf input  # Mac/Linux only
$ git config --global core.autocrlf true  # Windows only
  • Edit config file manually:
$ git config --global -e
# or
$ code ~/.gitconfig
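
To check what Git has actually stored, the settings can be read back from the command line. The following is a minimal sketch, not part of the original demo: it works in a throwaway repo and sets the values without --global, so your real configuration stays untouched. The name and email are placeholders.

```shell
# Work in a throwaway repository so the real settings stay untouched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Without --global, a setting is stored in this repo's .git/config only.
git config user.name "Example User"         # placeholder name
git config user.email "user@example.com"    # placeholder address

# Read single values back; a repo-level value takes precedence over --global.
name=$(git config --get user.name)
email=$(git config --get user.email)
echo "$name <$email>"

# Show every setting together with the file it comes from.
git config --list --show-origin
```

The same `git config --get` and `git config --list --show-origin` commands work outside the throwaway repo and then report your global settings.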

Git: Initialization

  • Create a new Git repository (repo)
$ mkdir -p ~/projects/gittest
$ cd ~/projects/gittest
$ git init

Git: Status and Diff

  • Check the current status of git
$ git status
  • Check what has been modified in a file:
$ git diff README.md
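
A point that often confuses beginners is that `git diff` only shows changes that are not yet staged. The sketch below illustrates this in a throwaway repo; the user name, email, and file contents are placeholders chosen for the example.

```shell
# Throwaway repository with one committed file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

echo "first line" > README.md
git add README.md
git commit -q -m "Add README"

# Modify the tracked file, then inspect the change.
echo "second line" >> README.md
git status --short                  # " M README.md": modified, not yet staged
unstaged=$(git diff README.md)

# After staging, plain `git diff` is empty; --staged shows what will be committed.
git add README.md
after_add=$(git diff README.md)
staged=$(git diff --staged README.md)
echo "$staged"
```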

Git: Staging and Committing

  • Staging
$ touch README.md
$ git add README.md
  • Committing
$ git commit -m "First commit."
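
Only files that have been staged go into a commit. As a small illustration (a sketch in a throwaway repo, with a placeholder identity and a hypothetical second file notes.txt), one of two new files is staged and committed while the other stays untracked:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"

touch README.md notes.txt
git add README.md                   # stage only README.md
git commit -q -m "First commit."

# Files recorded in the commit versus files Git does not track yet.
committed=$(git ls-tree --name-only HEAD)
untracked=$(git status --short)
echo "committed: $committed"
echo "status:    $untracked"
```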

Git: Log and Checkout

  • List commit logs
$ git log
$ git log --oneline
  • Check out a specific commit. This puts the repo in a "detached HEAD" state, so use it carefully.
$ git checkout 8eb8716
  • Check out a commit using a tag
$ git tag
$ git checkout v0.3.0
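
The effect of checking out an old commit, and the way back, can be tried safely in a throwaway repo. This is a sketch under assumptions: the tag name v0.1.0, the file data.txt, and the identity are made up for the example, and the branch name is read from Git because it may be master or main depending on the setup.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"
branch=$(git symbolic-ref --short HEAD)     # "master" or "main"

echo "v1" > data.txt
git add data.txt
git commit -q -m "First version"
git tag v0.1.0                              # hypothetical tag name

echo "v2" > data.txt
git commit -q -am "Second version"
git log --oneline

# Checking out the tag detaches HEAD; the files go back to the tagged state.
git checkout -q v0.1.0
old_content=$(cat data.txt)

# Checking out the branch again leaves the detached state.
git checkout -q "$branch"
new_content=$(cat data.txt)
echo "at tag: $old_content, on branch: $new_content"
```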

Git: Branching

  • Create and check out a new branch
$ git checkout -b a_new_feature
  • List all branches
$ git branch
  • Switch between branches
$ git checkout master
$ git branch
$ git checkout a_new_feature
$ git switch master  # same effect as checkout; requires Git 2.23 or newer
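
Branches keep their work separate. The following sketch (throwaway repo, placeholder identity, hypothetical file names) commits a file on a_new_feature and shows that it is absent after switching back to the original branch:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"
branch=$(git symbolic-ref --short HEAD)     # "master" or "main"

echo "base" > file.txt
git add file.txt
git commit -q -m "Base commit"

# Commit a new file on the feature branch.
git checkout -q -b a_new_feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Add feature file"
git branch                                  # the current branch is marked with *

# Back on the original branch, feature.txt is not in the working directory.
git checkout -q "$branch"
if [ -e feature.txt ]; then on_main="present"; else on_main="absent"; fi
echo "feature.txt on $branch: $on_main"
```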

Git: Merging

  • Merge a branch into master
$ git checkout master
$ git merge a_new_feature
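
A complete merge can be rehearsed in a throwaway repo. In this sketch (placeholder identity and file names) the original branch has no new commits of its own, so Git performs a fast-forward merge; a merge of diverged branches would create a merge commit instead.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"
branch=$(git symbolic-ref --short HEAD)     # "master" or "main"

echo "base" > file.txt
git add file.txt
git commit -q -m "Base commit"

git checkout -q -b a_new_feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Add feature file"

# Switch back and merge; this is a fast-forward because $branch has not moved.
git checkout -q "$branch"
git merge -q a_new_feature

# The feature file is now available on the original branch.
ls
git log --oneline --graph
```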

GitHub.com

The main purpose of GitHub.com is to facilitate the version control and issue tracking aspects of software development.

  • Labels, milestones, responsibility assignment, and a search engine are available for issue tracking.
  • For version control, GitHub.com builds on Git and adds pull requests (PRs) for proposing changes to the source code.
  • Users with the ability to review the proposed changes can see a diff of the requested changes and approve them.

GitHub.com Service

  • GitHub.com hosts Git repositories (repos).
  • Projects on GitHub.com can be accessed and managed using the standard Git command-line interface.
  • All standard Git commands work with it.
  • GitHub.com also allows users to browse public repositories on the site.
  • Multiple desktop clients and Git plugins are also available.
  • The site provides social-networking functions such as feeds, followers, wikis, and a network graph that shows how developers work on their versions ("forks") of a repository and which fork (and branch within that fork) is newest.

Demo and Exercises for GitHub

Go to GitHub.com and do the following:

  • Create an account if you do not have one.
  • Create a new public repo.
  • Push your local Git repo to GitHub.com.
  • Create a new branch in your local Git repo.
  • Push your local Git repo to GitHub.com again.
  • Create a PR for that branch on GitHub.com.
  • Merge the PR on GitHub.com.
  • Pull the merged repo from GitHub.com to update your local repo.
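
The command-line side of this exercise can be rehearsed offline. The sketch below is an approximation, not the GitHub.com workflow itself: a local bare repo stands in for the repo created on GitHub.com, and the PR merge, which really happens in the web interface, is imitated by a fast-forward push. The identity and file contents are placeholders.

```shell
work=$(mktemp -d)
git init -q --bare "$work/remote.git"       # stand-in for the empty repo on GitHub.com
git init -q "$work/local"
cd "$work/local"
git config user.name "Example User"         # placeholder identity
git config user.email "user@example.com"
branch=$(git symbolic-ref --short HEAD)     # "master" or "main"

echo "# gittest" > README.md
git add README.md
git commit -q -m "First commit."

# With GitHub.com, the URL would be the one shown on the repo page instead.
git remote add origin "$work/remote.git"
git push -q -u origin "$branch"

# Create a branch locally and push it as well.
git checkout -q -b a_new_feature
echo "new feature" >> README.md
git commit -q -am "Add a new feature"
git push -q -u origin a_new_feature

# Stand-in for "create and merge the PR": fast-forward the remote's main branch.
git push -q origin a_new_feature:"$branch"

# Update the local main branch from the remote.
git checkout -q "$branch"
git pull -q --ff-only origin "$branch"
cat README.md
```

With a real GitHub.com repo, only the `git remote add origin` URL changes and the stand-in push is replaced by creating and merging the PR in the browser.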

Reference Manager

Reference management software is software for scholars and authors to use for recording and utilizing bibliographic citations (references) as well as managing project references. Once a citation has been recorded, it can be used time and again in generating bibliographies, such as lists of references in scholarly books, articles and essays. The development of reference management packages has been driven by the rapid expansion of scientific literature.

Check out a list of reference managers and a comparison of them.

Zotero

Zotero is, at the most basic level, a reference manager. It is designed to store, manage, and cite bibliographic references, such as books and articles. In Zotero, each of these references constitutes an item. More broadly, Zotero is a powerful tool for collecting and organizing research information and sources.

Note: Zotero is free.


Demonstration: go to Zotero and install it on your computer.

Mendeley

Mendeley is a free reference manager that can help you store, organize, note, share and cite references and research data.

Note: Mendeley is free.


Demonstration: go to Mendeley and install it on your computer.

Paperpile

Paperpile is an online tool for no-fuss reference management on the web. Paperpile enables us to manage our research library right in the browser:

  • Save time with a smart, intuitive interface
  • Access PDFs from anywhere
  • Add citations and bibliographies to Google Docs

Note: Paperpile is not free.


Demonstration: go to Paperpile.