Lesson 8: Workflows

Learning a programming language requires not only learning new syntax but also becoming proficient with new tools, an aspect of programming education that is often overlooked. Having worked only in Pluto notebooks so far, we will now take a look at alternative workflows.

For this purpose, we will first cover the Julia package manager Pkg, which allows us to write reproducible code. We will then move on to a REPL-based workflow that works with all editors. Julia developers commonly have an interactive REPL session running while working on their code.

We will then introduce the structure of a Julia package by generating an empty package with PkgTemplates.jl and showcase the VSCode IDE with the Julia extension. Finally, we will demonstrate DrWatson.jl, a template and "assistant" for scientific experiments, and show how to run Julia programs from the command line.

These workflows should empower you to write homework, projects and even your thesis in Julia!


Note

These notes are designed to accompany a live demonstration in the Julia programming for Machine Learning class at TU Berlin.

The Julia package manager

We have already encountered Julia's package manager Pkg during the installation of Pluto. In the Julia REPL, Pkg can be opened by typing a closing square bracket ].

Depending on your installed version of Julia, the prompt should change from julia> to (@v1.10) pkg>:

julia> # Default Julia-mode. Type ] to enter Pkg-mode.

(@v1.10) pkg> # Prompt changes to indicate Pkg-mode

To exit Pkg-mode, press backspace. The name in parentheses, here @v1.10, is the name of the currently activated environment. @v1.10 is the global environment of our Julia 1.10 installation.

By typing status in Pkg-mode, we can print a list of installed packages:

(@v1.10) pkg> status
Status `~/.julia/environments/v1.10/Project.toml`
  [5fb14364] OhMyREPL v0.5.20
  [295af30f] Revise v3.5.2

In my case, two packages are installed in the @v1.10 environment.

Let's take a look at the indicated folder ~/.julia/environments/v1.10 in a new shell session. It contains two files: a Project.toml and a Manifest.toml.

$ cd ~/.julia/environments/v1.10

$ ls
Manifest.toml Project.toml

Together, these two files define an environment.

Environments

Let's first take a look at the contents of the Project.toml. We can either open it in an editor or print the file contents in our terminal using the command cat Project.toml:

[deps]
OhMyREPL = "5fb14364-9ced-5910-84b2-373655c76a03"
Revise = "295af30f-e4ad-537b-8983-00126c2a3abe"

In the case of our environment, it simply lists the installed packages along with their universally unique identifiers (UUIDs). As we will see in the following sections, the Project.toml contains more information when used in packages.

The Manifest.toml is a much longer file. It lists all packages in the dependency tree, including indirect dependencies. For packages that are not part of Julia's standard library, both the exact version and a Git tree hash are recorded. This makes our environment fully reproducible!

Let's look at ours:

# This file is machine-generated - editing it directly is not advised

julia_version = "1.8.5"
manifest_format = "2.0"
project_hash = "e9cf4d3c4e1f72eba6aa88164f23d06c005b9b9b"

[[deps.ArgTools]]
uuid = "0dad84c5-d112-42e6-8d28-ef12dabb789f"
version = "1.1.1"

[[deps.Artifacts]]
uuid = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"

[[deps.Base64]]
uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"

[[deps.CodeTracking]]
deps = ["InteractiveUtils", "UUIDs"]
git-tree-sha1 = "d730914ef30a06732bdd9f763f6cc32e92ffbff1"
uuid = "da1fd8a2-8d9e-5ec2-8556-3022fb5608a2"
version = "1.3.1"

[[deps.Crayons]]
git-tree-sha1 = "249fe38abf76d48563e2f4556bebd215aa317e15"
uuid = "a8cc5b0e-0ffa-5ad4-8c14-923d3ee1735f"
version = "4.1.1"
...
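To see which entries carry a Git tree hash, here is a small sketch that parses a shortened copy of the manifest above with the TOML standard library. The string manifest_str is only an excerpt pasted in for illustration; it shows that each [[deps.Name]] section is a TOML array of tables, and that in this excerpt only CodeTracking is pinned by a tree hash while the standard libraries are not.

```julia
using TOML

# Shortened copy of the Manifest.toml above, pasted in for illustration
manifest_str = """
julia_version = "1.8.5"
manifest_format = "2.0"

[[deps.ArgTools]]
uuid = "0dad84c5-d112-42e6-8d28-ef12dabb789f"
version = "1.1.1"

[[deps.Artifacts]]
uuid = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"

[[deps.CodeTracking]]
deps = ["InteractiveUtils", "UUIDs"]
git-tree-sha1 = "d730914ef30a06732bdd9f763f6cc32e92ffbff1"
uuid = "da1fd8a2-8d9e-5ec2-8556-3022fb5608a2"
version = "1.3.1"
"""

manifest = TOML.parse(manifest_str)

# `[[deps.Name]]` is a TOML array of tables, so each package maps to a vector of entries
for name in sort(collect(keys(manifest["deps"])))
    entry = only(manifest["deps"][name])
    version = get(entry, "version", "-")
    tree_hash = get(entry, "git-tree-sha1", "-")
    println(rpad(name, 14), rpad(version, 8), tree_hash)
end

# Names of the packages whose exact source tree is pinned by a Git tree hash
hashed = sort(filter(name -> haskey(only(manifest["deps"][name]), "git-tree-sha1"),
                     collect(keys(manifest["deps"]))))
println("Pinned by tree hash: ", join(hashed, ", "))
```

For the real file, replace the string with TOML.parsefile(expanduser("~/.julia/environments/v1.10/Manifest.toml")); note that TOML.parsefile does not expand ~ by itself.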

Each environment we create adds a folder to ~/.julia/environments that contains a Project.toml and a Manifest.toml.

Reproducibility

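Together, the Project.toml and Manifest.toml make our environment fully reproducible, which is important for scientific experiments: anyone who has both files can activate the environment and run instantiate in Pkg-mode to install exactly the recorded package versions.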

Creating a new virtual environment

To create a new environment, enter Pkg-mode in the Julia REPL and type activate followed by the name of your new environment:

(@v1.10) pkg> activate MyTest # create new environment called  "MyTest"
  Activating new project at `~/.julia/environments/v1.10/MyTest`

(@MyTest) pkg> # environment is active

This creates a new folder at ~/.julia/environments/v1.10/MyTest containing a Project.toml and Manifest.toml. Adding packages to this environment will update both of these files:

(@MyTest) pkg> add LinearAlgebra
   Resolving package versions...
    Updating `~/.julia/environments/v1.10/MyTest/Project.toml`
  [37e2e46d] + LinearAlgebra
    Updating `~/.julia/environments/v1.10/MyTest/Manifest.toml`
  [56f22d72] + Artifacts
  [8f399da3] + Libdl
  [37e2e46d] + LinearAlgebra
  [e66e0078] + CompilerSupportLibraries_jll v1.0.1+0
  [4536629a] + OpenBLAS_jll v0.3.20+0
  [8e850b90] + libblastrampoline_jll v5.1.1+0

Temporary environments

If you want to try an interesting new package you've seen on GitHub, the package manager offers a simple way to start a temporary environment.

In your Julia REPL, enter package mode and type activate --temp. This will create an environment with a randomized name in a temporary folder, which is deleted when the Julia process exits.

(@v1.10) pkg> activate --temp
  Activating new project at `/var/folders/74/wcz8c9qs5dzc8wgkk7839k5c0000gn/T/jl_9AGcg1`

(jl_9AGcg1) pkg>
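Pkg-mode is only available in the interactive REPL. In scripts, each Pkg-mode command has a function counterpart in the Pkg API. The following sketch activates a temporary environment and inspects it; the Pkg.add call is commented out because it needs network access, and CSV is only an example package.

```julia
using Pkg

# Equivalent of `pkg> activate --temp`
Pkg.activate(; temp=true)

# Path to the Project.toml of the active (temporary) environment
project_file = Base.active_project()
println("Active project file: ", project_file)

# Equivalent of `pkg> status`: the fresh temporary environment is empty
Pkg.status()

# Equivalent of `pkg> add CSV` (requires network access)
# Pkg.add("CSV")
```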

Environments in Pluto

Pluto notebooks also contain reproducible environments. Let's take a look at the source code of a notebook called empty_pluto.jl that just contains a single cell declaring using LinearAlgebra.

### A Pluto.jl notebook ###
# v0.19.25

using Markdown
using InteractiveUtils

# ╔═╡ 9842a4f5-69d1-4566-b605-49d5c6679b4a
using LinearAlgebra # 💡 the only cell we added! 💡

# ╔═╡ 00000000-0000-0000-0000-000000000001
PLUTO_PROJECT_TOML_CONTENTS = """
[deps]
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
"""
 
# ╔═╡ 00000000-0000-0000-0000-000000000002
PLUTO_MANIFEST_TOML_CONTENTS = """
# This file is machine-generated - editing it directly is not advised

julia_version = "1.8.5"
manifest_format = "2.0"
project_hash = "ac1187e548c6ab173ac57d4e72da1620216bce54"

[[deps.Artifacts]]
uuid = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"

[[deps.CompilerSupportLibraries_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "e66e0078-7015-5450-92f7-15fbd957f2ae"
version = "1.0.1+0"

[[deps.Libdl]]
uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb"

[[deps.LinearAlgebra]]
deps = ["Libdl", "libblastrampoline_jll"]
uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"

[[deps.OpenBLAS_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "Libdl"]
uuid = "4536629a-c528-5b80-bd46-f80d51c5b363"
version = "0.3.20+0"

[[deps.libblastrampoline_jll]]
deps = ["Artifacts", "Libdl", "OpenBLAS_jll"]
uuid = "8e850b90-86db-534c-a0d3-1478176c7d93"
version = "5.1.1+0"
"""

# ╔═╡ Cell order:
# ╠═9842a4f5-69d1-4566-b605-49d5c6679b4a
# ╟─00000000-0000-0000-0000-000000000001
# ╟─00000000-0000-0000-0000-000000000002

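We can see that cells are separated by special comments containing cell IDs, and that the contents of the notebook's Project.toml and Manifest.toml are stored in the strings PLUTO_PROJECT_TOML_CONTENTS and PLUTO_MANIFEST_TOML_CONTENTS.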

Pluto notebooks are therefore fully reproducible and also regular Julia files!

REPL-based workflows

The most basic workflow uses the Julia REPL in combination with your favorite editor.

Loading Julia source code

To load a source file, use the function include. To test this, I have created two almost identical files:

# Contents of foo.jl
foo(x) = x

# Contents of bar.jl
module Bar

  bar(x) = x

  export bar # export function

end # end module

Let's compare the two approaches. The first one loads all contents of the file into the global namespace

julia> include("foo.jl")
foo (generic function with 1 method)

julia> foo(2)
2

whereas the second approach encapsulates everything inside the module Bar. Via using .Bar, we make all functions that are exported in Bar available:

julia> include("bar.jl") # load module Bar
Main.Bar

julia> Bar.bar(2)  # we can access the function in the module...
2

julia> bar(2)      # ...but not directly
ERROR: UndefVarError: bar not defined
Stacktrace:
 [1] top-level scope
   @ REPL[4]:1

julia> using .Bar  # import everything that is exported in module Bar...

julia> bar(2)      # ...so we can use exports without name-spacing Bar
2
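The same behaviour can be reproduced without separate files by defining the module directly. In this sketch, baz is a hypothetical helper that Bar does not export: using .Bar brings only the exported bar into scope, while baz remains reachable through the qualified name Bar.baz.

```julia
module Bar

bar(x) = x
baz(x) = 2x   # hypothetical helper that is NOT exported

export bar

end # end module

using .Bar

println(bar(2))                          # exported, usable directly
println(Bar.baz(2))                      # unexported, needs the module prefix
println(isdefined(@__MODULE__, :baz))    # baz was not brought into scope
println(names(Bar))                      # exported names (plus the module's own name)
```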

Enhancing the REPL experience

Loading packages on startup

If you have code that you want to run every time you start Julia, add it to your startup file located at ~/.julia/config/startup.jl. Note that you might have to create this config folder first.
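If you are unsure where that file lives on your machine, the following sketch computes its path from DEPOT_PATH, whose first entry is the user depot (usually ~/.julia). It only reports whether the file exists; calling mkpath on config_dir would create the missing folder.

```julia
# The first entry of DEPOT_PATH is the user depot, usually ~/.julia
config_dir = joinpath(first(DEPOT_PATH), "config")
startup_file = joinpath(config_dir, "startup.jl")

if isfile(startup_file)
    println("Startup file found at ", startup_file)
else
    # mkpath(config_dir) would create the folder, including any missing parents
    println("No startup file yet. Create ", config_dir, " and add ", basename(startup_file), " there.")
end
```

If you suspect that your startup file is causing problems, start Julia with julia --startup-file=no, which skips it.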

A common use case for startup.jl is loading packages that are crucial for your workflow. Don't add too many packages: they will increase the loading time of your REPL and might pollute the global namespace. There are, however, two packages I personally consider essential additions: Revise.jl and OhMyREPL.jl.

Revise.jl

Revise.jl will keep track of changes in loaded files and reload modified Julia code without having to start a new REPL session.

To load Revise automatically, add the following code to your startup.jl:

# First lines of ~/.julia/config/startup.jl
try
    using Revise
catch e
    @warn "Error initializing Revise in startup.jl" exception=(e, catch_backtrace())
end

Adding using Revise alone is enough, but the try-catch statement will print a helpful warning in case something goes wrong.

OhMyREPL.jl

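OhMyREPL adds many features to your REPL, such as syntax highlighting and bracket highlighting. To load it automatically, add the following code to your startup.jl: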

# Add to ~/.julia/config/startup.jl
atreplinit() do repl
    try
        @eval using OhMyREPL
    catch e
        @warn "Error initializing OhMyREPL in startup.jl" exception=(e, catch_backtrace())
    end
end

VSCode

In combination with the Julia extension, VSCode is the most commonly recommended editor for development in Julia. It provides several features and shortcuts that make package development convenient.

We will demonstrate the extension during the lecture.

Writing packages

In Julia, packages are the natural medium for code that doesn't fit in a simple script. While this might sound excessive at first, it provides many conveniences.

Thanks to templates, setting up the file structure for a Julia package takes seconds.

PkgTemplates.jl

PkgTemplates.jl is a highly configurable package for project templates. In this example, we are going to stick to the defaults:

julia> using PkgTemplates

julia> t = Template()

julia> t("MyPackage")

At the end of the package generation, Julia will inform us that our project has been created in the ~/.julia/dev folder:

[ Info: New package is at ~/.julia/dev/MyPackage

The output folder can be configured in the template. Take a look at the PkgTemplates user guide to create a template customized to your needs.

File structure

Let's take a look at the structure of the files generated by PkgTemplates.jl:

$ cd ~/.julia/dev/MyPackage

$ tree -a -I '.git/' # show folder structure, ignoring the .git folder 
.
├── .github
│   └── workflows
│       ├── CI.yml
│       ├── CompatHelper.yml
│       └── TagBot.yml
├── .gitignore
├── LICENSE
├── Manifest.toml
├── Project.toml
├── README.md
├── src
│   └── MyPackage.jl
└── test
    └── runtests.jl

5 directories, 10 files

Note

In the lecture we will be discussing all files in detail:

  • Project.toml for packages
    • compat entries
    • semantic versioning
  • structure of Julia source code
  • package testing
  • continuous integration (CI)

Activating the package environment

In VSCode

The Julia VSCode extension provides a keyboard shortcut to start a REPL: Alt+j Alt+o (option+j option+o on macOS).

In the REPL

To start a REPL session that directly activates your local project environment, start julia from the package folder with the flag --project:

$ cd ~/.julia/dev/MyPackage

$ julia --project
# Starts Julia REPL session
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.5 (2023-01-08)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> # press ]

(MyPackage) pkg> # project environment is active!

The environment is active right away; there is no need to type activate MyPackage.

Tip

I recommend setting a shell alias for julia --project --banner=no.

Project.toml in packages

Let's add a dependency to our package, for example CSV.jl:

(MyPackage) pkg> add CSV
    Updating registry at `~/.julia/registries/General.toml`
   Resolving package versions...
    Updating `~/.julia/dev/MyPackage/Project.toml`
  [336ed68f] + CSV v0.10.10
    Updating `~/.julia/dev/MyPackage/Manifest.toml`
  [336ed68f] + CSV v0.10.10
  [944b1d66] + CodecZlib v0.7.1
  [9a962f9c] + DataAPI v1.15.0
  [e2d170a0] + DataValueInterfaces v1.0.0
  [48062228] + FilePathsBase v0.9.20
  [842dd82b] + InlineStrings v1.4.0
  [82899510] + IteratorInterfaceExtensions v1.0.0
  [2dfb63ee] + PooledArrays v1.4.2
  [91c51154] + SentinelArrays v1.3.18
  [3783bdb8] + TableTraits v1.0.1
  [bd369af6] + Tables v1.10.1
  [3bb67fe8] + TranscodingStreams v0.9.13
  [ea10d353] + WeakRefStrings v1.4.2
  [76eceee3] + WorkerUtilities v1.6.1
  [9fa8497b] + Future
  [8dfed614] + Test

When adding a package, the Project.toml of our package will automatically be updated. It is always located in the root folder of the package (in our example at ~/.julia/dev/MyPackage/Project.toml).

In our case, the Project.toml contains:

name = "MyPackage"
uuid = "c97c58cb-c2b5-45a4-93b4-32bd8ab523c1"
authors = ["Adrian Hill <git@adrianhill.de> and contributors"]
version = "1.0.0-DEV"

[deps]
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"

[compat]
julia = "1"

[extras]
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["Test"]

Tip

When looking at a new package, checking out its dependencies in the Project.toml is a good starting point.
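Since the Project.toml is plain TOML, its dependencies can also be inspected programmatically. This sketch parses an abbreviated copy of the file above with the TOML standard library and lists the dependencies that still lack a [compat] entry; for a real package you would call TOML.parsefile on the package's Project.toml instead.

```julia
using TOML

# Abbreviated copy of MyPackage's Project.toml from above
project = TOML.parse("""
name = "MyPackage"
uuid = "c97c58cb-c2b5-45a4-93b4-32bd8ab523c1"
version = "1.0.0-DEV"

[deps]
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"

[compat]
julia = "1"
""")

# Version strings parse into Julia's VersionNumber type, including pre-release labels
pkg_version = VersionNumber(project["version"])

deps = sort(collect(keys(project["deps"])))
println(project["name"], " ", pkg_version, " depends on: ", join(deps, ", "))

# Dependencies without a [compat] entry (here: CSV)
missing_compat = filter(dep -> !haskey(project["compat"], dep), deps)
println("No compat entry for: ", join(missing_compat, ", "))
```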

Semantic versioning

It is good practice (and required for package registration) to enter [compat] entries for all dependencies. This allows us to update dependencies without having to worry about our code breaking.

By convention, Julia packages are expected to follow Semantic Versioning to specify version numbers:

Given a version number MAJOR.MINOR.PATCH, increment the:

  1. MAJOR version when you make incompatible API changes

  2. MINOR version when you add functionality in a backward compatible manner

  3. PATCH version when you make backward compatible bug fixes

Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.

Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable.

Let's add a compat entry for CSV.jl. Using status in Pkg-mode, we can inspect the current version:

(MyPackage) pkg> status
Project MyPackage v1.0.0-DEV
Status `~/.julia/dev/MyPackage/Project.toml`
  [336ed68f] CSV v0.10.10

Let's declare in the Project.toml that only versions 0.10.X of CSV.jl are permitted. This will allow us to get updates that patch bugs, but not updates with breaking changes.

[compat]
CSV = "0.10"
julia = "1"
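To make the meaning of "0.10" concrete: Pkg interprets plain compat entries as caret specifiers, which allow upgrades as long as the leftmost nonzero component of the specifier stays the same (for all-zero specifiers like "0.0", the last component given). The functions caret_range and is_compatible below are a hypothetical, simplified reimplementation of that rule for plain specifiers such as "0.10" or "1.2.3"; they are not part of Pkg's API.

```julia
# Hypothetical helper (not part of Pkg's API): the half-open version range [lower, upper)
# allowed by a plain caret-style compat specifier such as "0.10", "1" or "0.0.3"
function caret_range(spec::AbstractString)
    parts = parse.(Int, split(spec, '.'))   # "0.10" -> [0, 10]
    lower = VersionNumber(parts...)          # missing components default to 0
    # Upper bound: increment the first nonzero component (or the last one given
    # if all are zero) and reset everything to its right to 0
    i = something(findfirst(!iszero, parts), length(parts))
    bumped = [k < i ? parts[k] : (k == i ? parts[k] + 1 : 0) for k in 1:3]
    return lower, VersionNumber(bumped...)
end

function is_compatible(v::VersionNumber, spec::AbstractString)
    lower, upper = caret_range(spec)
    return lower <= v < upper
end

for (v, spec) in [(v"0.10.10", "0.10"), (v"0.11.0", "0.10"), (v"1.10.2", "1"), (v"2.0.0", "1")]
    println(v, " satisfies \"", spec, "\"? ", is_compatible(v, spec))
end
```

With this rule, CSV = "0.10" admits 0.10.10 but not 0.11.0, and julia = "1" admits every 1.x release. The sketch ignores pre-release versions as well as Pkg's other specifier forms (tilde, equality, inequality and hyphen ranges).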

Structure of the source folder

By convention, the "main" file of a project has the same name as the project. PkgTemplates already created this file src/MyPackage.jl for us:

# Contents of src/MyPackage.jl
module MyPackage

# Write your package code here.

end

The file defines a module with the same name as our package. Inside this module, you will import dependencies, include other source files and export your functions. Let's look at a toy example:

# Updated contents of src/MyPackage.jl
module MyPackage

# 1.) Import functions you need
using LinearAlgebra: cholesky 

# 2.) Include source files
include("my_source_code_1.jl")
include("my_source_code_2.jl")
include("my_source_code_3.jl")

# 3.) Export functions you defined
export my_function_1, my_function_2

end # end module

Tip

When looking at the source code of a package, this should be the first file you read.
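To try this layout without creating a real package, the following sketch writes a hypothetical ToyPackage with one helper file into a temporary folder and loads it with include. Paths passed to include are resolved relative to the file that calls it, which is why src/MyPackage.jl can simply write include("my_source_code_1.jl").

```julia
# Hypothetical toy package, written to a temporary folder purely for illustration
src_dir = mktempdir()

write(joinpath(src_dir, "geometry.jl"), """
# Helper file, included by ToyPackage.jl
unit_vector(v) = v / norm(v)
_scale(v, a) = a .* v   # internal helper, not exported
""")

write(joinpath(src_dir, "ToyPackage.jl"), """
module ToyPackage

# 1.) Import the functions you need
using LinearAlgebra: norm

# 2.) Include source files (resolved relative to this file)
include("geometry.jl")

# 3.) Export the functions users should call directly
export unit_vector

end # end module
""")

include(joinpath(src_dir, "ToyPackage.jl"))
using .ToyPackage

println(unit_vector([3.0, 4.0]))         # exported
println(ToyPackage._scale([1, 2], 10))   # internal, needs the module prefix
```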

Package tests

By convention, package tests are located in a folder called test/. The main file, which includes all other tests, is called runtests.jl. To run it, enter Pkg-mode and type test:

(MyPackage) pkg> test
     Testing MyPackage
      Status `/private/var/folders/74/wcz8c9qs5dzc8wgkk7839k5c0000gn/T/jl_TcJkwR/Project.toml`

     ... # Julia resolves a temporary environment from scratch
    
     Testing Running tests...
Test Summary: |Time
MyPackage.jl  | None  0.0s
     Testing MyPackage tests passed

Our tests passed since we didn't have any!

Using the Test.jl standard library package and its macros @test and @testset, we can add tests to our package, which will be demonstrated in the lecture.
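As a rough preview, a test/runtests.jl could look like the sketch below. The function double is a hypothetical stand-in; in a real package, the file would start with using MyPackage and test the package's functions instead.

```julia
using Test

# Hypothetical stand-in for a function from src/MyPackage.jl;
# a real runtests.jl would start with `using MyPackage` instead
double(x) = 2x

@testset "MyPackage.jl" begin
    @test double(2) == 4
    @test double(0.5) ≈ 1.0
    # Multiplying a number by a string is not defined, so this throws
    @test_throws MethodError double("a")
end
```

When the file is run via test in Pkg-mode, the Test Summary then reports how many tests passed in each @testset.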

Note

Package tests will be covered in the lecture. Take a look at the unit test documentation.

Continuous integration

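The .github/workflows/ folder contains three files, CI.yml, CompatHelper.yml and TagBot.yml, which specify GitHub Actions workflows: CI.yml handles continuous integration, CompatHelper.yml opens pull requests when a dependency releases a new version outside your [compat] bounds, and TagBot.yml creates a tag and GitHub release whenever a new version of your package is registered.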

These files contain instructions that will be run on GitHub's computers. The most basic use-case is running package tests. GitHub Actions either run on a timed schedule or at specific events, for example when pushing commits and opening pull requests.

Note

GitHub Actions and CI will be showcased in the lecture.

Package registration

If you wrote a high quality, well tested package and want to make it available to all Julia users through the package manager, follow the Registrator.jl instructions.

People will then be able to install your package by writing

(@v1.10) pkg> add MyPackage

Experiments with DrWatson.jl

DrWatson.jl describes itself as "scientific project assistant software". It serves two purposes:

  1. It sets up a project structure that is specialized for scientific experiments, similar to PkgTemplates.

  2. It introduces several useful helper functions, among them boilerplate functions for loading and saving files.

The following two sections are taken directly from the DrWatson documentation, which I recommend reading.

File structure

To initialize a DrWatson project, run:

julia> using DrWatson

julia> initialize_project("MyScientificProject"; authors="Adrian Hill", force=true)

The default setup will initialize a file structure that looks as follows:

│projectdir          <- Project's main folder. It is initialized as a Git
│                       repository with a reasonable .gitignore file.
│
├── _research        <- WIP scripts, code, notes, comments,
│   |                   to-dos and anything in an alpha state.
│   └── tmp          <- Temporary data folder.
│
├── data             <- **Immutable and add-only!**
│   ├── sims         <- Data resulting directly from simulations.
│   ├── exp_pro      <- Data from processing experiments.
│   └── exp_raw      <- Raw experimental data.
│
├── plots            <- Self-explanatory.
├── notebooks        <- Jupyter, Weave or any other mixed media notebooks.
│
├── papers           <- Scientific papers resulting from the project.
│
├── scripts          <- Various scripts, e.g. simulations, plotting, analysis,
│   │                   The scripts use the `src` folder for their base code.
│   └── intro.jl     <- Simple file that uses DrWatson and uses its greeting.
│
├── src              <- Source code for use in this project. Contains functions,
│                       structures and modules that are used throughout
│                       the project and in multiple scripts.
│
├── test             <- Folder containing tests for `src`.
│   └── runtests.jl  <- Main test file, also run via continuous integration.
│
├── README.md        <- Optional top-level README for anyone using this project.
├── .gitignore       <- by default ignores _research, data, plots, videos,
│                       notebooks and latex-compilation related files.
│
├── Manifest.toml    <- Contains full list of exact package versions used currently.
└── Project.toml     <- Main project file, allows activation and installation.
                        Includes DrWatson by default.

Workflow

The DrWatson workflow is best summarized in the following picture from the documentation:

[Figure: DrWatson workflow]

Calling scripts from the command line

Working on compute clusters often requires scheduling "jobs" from the command line. To run a Julia script in the file my_script.jl, run the following command:

$ julia my_script.jl arg1 arg2...

Inside your script, the additional command-line arguments arg1 and arg2 can be used through the global constant ARGS. If my_script.jl contains the code

# Content of my_script.jl
for a in ARGS
  println(a)
end

Calling it with the arguments foo and bar from the command line prints:

$ julia my_script.jl foo bar
foo
bar
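In practice, command-line arguments usually need to be converted from strings and should fall back to sensible defaults. A minimal sketch, assuming a hypothetical script that takes a number of iterations and a learning rate as optional positional arguments:

```julia
# Hypothetical script, e.g. called as: julia my_script.jl 100 0.01
# ARGS is a Vector{String}, so numeric arguments must be parsed explicitly
n_iterations  = length(ARGS) >= 1 ? parse(Int, ARGS[1]) : 10
learning_rate = length(ARGS) >= 2 ? parse(Float64, ARGS[2]) : 0.1

# PROGRAM_FILE holds the name of the script passed to julia
println("Running ", PROGRAM_FILE, " with n_iterations = ", n_iterations,
        " and learning_rate = ", learning_rate)
```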

Command-line switches

Julia provides several command-line switches. For example, for parallel computing, --threads can be used to specify the number of CPU threads and --procs for the number of worker processes.

The following command will run my_script.jl with 8 threads:

$ julia --threads 8 -- my_script.jl arg1 arg2
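Inside the script, the number of available threads can be queried and used, for example with the @threads macro from Base.Threads. In this sketch, squaring is a stand-in for real, independent per-iteration work; each iteration writes only to its own slot of results, so no locking is needed.

```julia
using Base.Threads

println("Running with ", nthreads(), " thread(s)")

results = zeros(Int, 16)

# The iterations are split across the available threads
@threads for i in eachindex(results)
    results[i] = i^2   # stand-in for independent per-iteration work
end

println(sum(results))
```

Without --threads (or the JULIA_NUM_THREADS environment variable), Julia starts with a single thread, and the loop simply runs sequentially.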

Parallel computing

In this lecture, we only covered GPU parallelization (Lecture 7 on Deep Learning).

Refer to the Julia documentation on parallel computing for more information on multi-threading and distributed computing.

External packages

Handling arguments in ARGS can be tedious. Comonicon.jl is a package for building simple command-line interfaces for Julia programs using the macro @main.

Take a look at its documentation for an overview of its features.

Further reading

Additional resources on workflows in Julia can be found on the Modern Julia Workflows website and JuliaNotes.

Last modified: April 15, 2024.
Website, code and notebooks are under MIT License © Adrian Hill.
Built with Franklin.jl, Pluto.jl and the Julia programming language.