Creating nf-core-style Nextflow modules

Bioinformatics Tools & Workflows

A practical guide to writing local Nextflow modules with modern nf-core-style metadata, topic-channel version outputs, stub tests, and nf-test snapshots.

Author

Bhargava Reddy Morampalli

Published

11 June 2026

I have been cleaning up some local modules in one of my Nextflow pipelines, mainly updating how they capture versions using topic channels. While doing that, I realised the same steps might be useful to anyone building local modules for the first time.

When I started writing modules, I treated them as finished once the process ran. I now expect a little more: clear inputs and outputs, a captured tool version, a working stub, and a test I can rerun after a change.

This is the guide I would have wanted when I was learning. It is not a replacement for the nf-core documentation, and the details will move as Nextflow and nf-core evolve. The examples reflect the conventions I was using in June 2026.

The short version is this:

start from the nf-core module template when you can;
keep main.nf boring and explicit;
make meta.yml describe the actual channel structure;
emit tool versions with topic: versions;
test both real execution and stub mode;
open the snapshot and check the version tuples yourself.

I always inspect the snapshot myself, even when the test passes.

Environment assumed

I am assuming a local Nextflow pipeline repository with modules under modules/local/, nf-test available, and a recent Nextflow version. The examples use local modules, but the same habits apply when writing modules for a shared nf-core-style codebase.

Start from the template when possible

If you are creating a brand-new module, do not begin with a blank file unless you have a very good reason. Let nf-core/tools generate the boring parts first:

nf-core modules create fastqc --author @your-github-handle --label process_low --meta

For a tool/subcommand-style module, use the tool path:

nf-core modules create samtools/depth --author @your-github-handle --label process_low --meta

The command creates the module scaffold and prompts for anything it needs that you did not pass on the command line. If it finds a Bioconda entry, it can fill in some software/container information. If it finds a bio.tools entry, it may also suggest inputs, outputs, and EDAM ontology terms.

Do not treat those guesses as truth. Treat them as a useful first draft.

After generation, open main.nf, meta.yml, and tests/main.nf.test together. Check that the command, outputs, metadata, and assertions agree.

The shape of a small module

A simple example from one of my pipelines is a depth module based on samtools depth. It has:

one input tuple: sample metadata plus a BAM file;
one file output: a depth text file;
one version output: the samtools version;
one stub section that creates the expected output file without running the real tool.

Here is the process section by section, followed by the complete file.

Process header

The process name is uppercase, the tag is useful in logs, and the label maps to your pipeline resource configuration:

process SAMTOOLS_DEPTH {
    tag "$meta.id"
    label 'process_low'

For most local modules, tag "$meta.id" is enough. If the module takes a reference or a method name, add that only if it makes the trace easier to read.

Software environment

Pin the tool in Conda and, when you know the image, pin the container too:

    conda "bioconda::samtools=1.17"
    container "quay.io/biocontainers/samtools:1.17--h00cdaf9_0"

Inputs

Most nf-core-style modules pass sample information as a meta map in a tuple:

    input:
    tuple val(meta), path(bam)

That tuple shape matters because the modern meta.yml format mirrors the channel grouping. Each input channel is one item in the input list: a tuple channel becomes a nested list of its elements, while a channel carrying a single value (for example, a plain path(fasta)) is listed directly, without the inner list. It is a small formatting detail, but it makes the metadata much easier to check against main.nf.

Outputs

The normal output gets an emit: name:

    output:
    tuple val(meta), path("*.txt"), emit: depth

Use a name that describes the channel, not the file extension. depth, bam, index, report, plot, and summary are easier to reason about later than out or result.

Version output with topic channels

I want every module to emit the tool version alongside its normal output. For this module, I use a tuple containing the process name, tool name, and version string, then send it to the shared versions topic:

    tuple val("${task.process}"),
          val('samtools'),
          eval('(samtools --version 2>/dev/null || echo unknown) | head -n 1 | cut -d " " -f 2'),
          emit: versions_samtools,
          topic: versions

There are two separate ideas here:

emit: versions_samtools gives the process output a unique name for nf-test snapshots and module metadata.
topic: versions sends the same value into the shared versions topic so the pipeline can collect all tool versions centrally.

In a pipeline, this usually becomes something like:

ch_versions = Channel.topic('versions')

Topic channels are useful because many processes can send values to the same topic without wiring a long chain of mix() calls. The important caveat is that a process should not both consume from a topic and emit to that same topic, because that can create a pipeline that waits forever.

For modules with more than one tool, emit one version channel per tool:

tuple val("${task.process}"), val('python'), eval('(python --version 2>/dev/null || echo unknown) | sed "s/Python //"'), emit: versions_python, topic: versions
tuple val("${task.process}"), val('pandas'), eval('python -c "import pandas; print(pandas.__version__)" 2>/dev/null || echo unknown'), emit: versions_pandas, topic: versions

In my coverage_plot module, for example, one process emits separate version tuples for Python, pandas, Matplotlib, and seaborn.
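The placement of `|| echo unknown` in these expressions matters. Without `pipefail`, a pipeline's exit status is the exit status of its last command, so in `tool --version 2>&1 | sed ... || echo unknown` the fallback depends on `sed`, which nearly always succeeds. The bash sketch below compares the two placements using `not_a_real_tool`, a hypothetical command that is not installed. It assumes bash with `pipefail` off (bash's default); the shell options Nextflow applies to `eval` commands may differ, so test your own expressions.

```bash
#!/usr/bin/env bash
# Compare two placements of the "|| echo unknown" fallback.
# Assumes bash with pipefail OFF (the default), so a pipeline's exit
# status is the exit status of its last command.
# not_a_real_tool is a hypothetical command that is not installed.

fallback_after_pipe() {
    # sed succeeds, so "echo unknown" never runs; the shell's
    # "command not found" message (sent into the pipe by 2>&1) is printed instead
    not_a_real_tool --version 2>&1 | sed "s/Python //" || echo unknown
}

fallback_before_pipe() {
    # the subshell fails, so "unknown" is what flows into sed
    (not_a_real_tool --version 2>/dev/null || echo unknown) | sed "s/Python //"
}

echo "after pipe:  $(fallback_after_pipe)"
echo "before pipe: $(fallback_before_pipe)"
```

If `pipefail` were on, the first function would print both the error message and `unknown`, which is still not a single clean value. Putting the fallback inside the parentheses, as the samtools expression does, gives `unknown` either way.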

Make version commands output clean version strings

The version expression should print one stable value. Some tools print multi-line banners, write versions to stderr, or print extra dependency information. Normalise that in the expression. For samtools, the full samtools --version output is too much for a clean tuple, so the expression keeps only the first line and extracts the version number.
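To see exactly what the samtools expression keeps, the same `head -n 1 | cut -d " " -f 2` step can be tried on a stand-in banner without samtools installed. The banner below is an abbreviated, illustrative version of `samtools --version` output rather than a verbatim copy, and `first_line_second_field` is just a name I am using for this sketch.

```bash
#!/usr/bin/env bash
# Apply the same normalisation as the module's eval expression to text
# supplied on stdin, so it can be tried without samtools installed.
first_line_second_field() {
    head -n 1 | cut -d " " -f 2
}

# Illustrative, abbreviated banner (further build details omitted).
banner='samtools 1.17
Using htslib 1.17'

printf '%s\n' "$banner" | first_line_second_field     # prints 1.17

# The fallback value contains no space, and cut prints lines without the
# delimiter unchanged (unless -s is given), so "unknown" passes through intact.
printf 'unknown\n' | first_line_second_field          # prints unknown
```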

`when:`

Most modules should respect task.ext.when:

    when:
    task.ext.when == null || task.ext.when

This lets the pipeline turn a module on or off with process configuration without adding branching logic inside the module itself.

Script block

The script should be readable from left to right:

    script:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    samtools depth $args -a $bam > ${prefix}.txt
    """

I like using task.ext.args even when the first version of the module does not need extra options. It gives you one obvious place for process-specific arguments later. task.ext.prefix is similarly useful for making filenames predictable while still allowing a workflow to override them.

Stub block

Stub mode should create the same output paths, just without running the real tool:

    stub:
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    touch ${prefix}.txt
    """
}

nf-core module tests require a stub test. It lets you check the process wiring, output names, channels, and version tuples without running the real tool. The stub only needs to create the expected output paths.
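A cheap way to confirm that a stub creates what the output block declares is to run the stub's shell body by hand in an empty directory and test each output glob. This is a sketch, not something nf-test or Nextflow provides: `check_stub_outputs` is a hypothetical helper, and the stub body is written with `prefix` as a shell variable because Nextflow would already have interpolated `${prefix}` before the shell saw it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a stub body in a scratch directory and report any
# output glob from the process's output: block that matches nothing.
check_stub_outputs() {
    local stub_body="$1"; shift
    local workdir pattern status=0
    workdir=$(mktemp -d)
    # run the stub commands inside the scratch directory only
    (cd "$workdir" && eval "$stub_body")
    for pattern in "$@"; do
        # compgen -G succeeds only if the glob matches at least one path
        if ! (cd "$workdir" && compgen -G "$pattern" > /dev/null); then
            echo "no stub file matches output pattern: $pattern"
            status=1
        fi
    done
    rm -rf "$workdir"
    return "$status"
}

# Stub body from SAMTOOLS_DEPTH, with prefix already resolved to test_stub
check_stub_outputs 'prefix=test_stub; touch "${prefix}.txt"' '*.txt' && echo "stub outputs OK"
```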

The whole `main.nf`

Putting the pieces together:

process SAMTOOLS_DEPTH {
    tag "$meta.id"
    label 'process_low'

    conda "bioconda::samtools=1.17"
    container "quay.io/biocontainers/samtools:1.17--h00cdaf9_0"

    input:
    tuple val(meta), path(bam)

    output:
    tuple val(meta), path("*.txt"), emit: depth
    tuple val("${task.process}"), val('samtools'), eval('(samtools --version 2>/dev/null || echo unknown) | head -n 1 | cut -d " " -f 2'), emit: versions_samtools, topic: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    samtools depth $args -a $bam > ${prefix}.txt
    """

    stub:
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    touch ${prefix}.txt
    """
}

Writing `meta.yml`

The modern meta.yml file should describe the channel structure in the module.

Start with the module identity:

name: "samtools_depth"
description: Calculate per-base depth coverage from BAM files
keywords:
  - depth
  - coverage
  - bam
  - samtools

Keep the description factual. Keywords should be specific enough to help someone find the module later.

Tools

Write one entry per real tool. Use homepage, documentation, doi, licence, and tool_dev_url when you know them. Do not invent bio.tools IDs or ontology terms just to make the file look complete.

tools:
  - samtools:
      description: |
        SAMtools is a suite of programs for interacting with high-throughput sequencing data.
      homepage: http://www.htslib.org/
      documentation: https://www.htslib.org/doc/samtools-depth.html
      doi: 10.1093/bioinformatics/btp352
      licence: ["MIT"]

For a Python plotting module, it is fine to list Python and the libraries that matter:

tools:
  - python:
      description: Python programming language
      homepage: https://www.python.org/
      documentation: https://docs.python.org/3/
      tool_dev_url: https://github.com/python/cpython
      licence: ["PSF-2.0"]
  - pandas:
      description: Python library providing data structures and data analysis tools
      homepage: https://pandas.pydata.org/
      documentation: https://pandas.pydata.org/docs/
      tool_dev_url: https://github.com/pandas-dev/pandas
      licence: ["BSD-3-Clause"]

List the software whose versions help explain the output; you do not need to document every imported library.

Inputs in grouped form

This is the modern grouped structure for one tuple channel:

input:
  - - meta:
        type: map
        description: |
          Groovy Map containing sample information
          e.g. `[ id:'sample1', single_end:false ]`
    - bam:
        type: file
        description: BAM file for depth calculation
        pattern: "*.bam"
        ontologies:
          - edam: "http://edamontology.org/format_2572" # BAM

Notice the double list:

input:
  - - meta:
    - bam:

That means “one channel, and this channel is a tuple containing meta and bam.” It looks odd until you get used to it, but it matches the channel shape exactly.

Outputs keyed by `emit:`

Outputs are now keyed by the emit: names from main.nf:

output:
  depth:
    - - meta:
          type: map
          description: |
            Groovy Map containing sample information
            e.g. `[ id:'sample1', single_end:false ]`
      - "*.txt":
          type: file
          description: Per-base depth coverage file
          pattern: "*.txt"
          ontologies:
            - edam: "http://edamontology.org/format_3475" # TSV

This is much easier to review than an ungrouped list of files. You can compare the output block directly against main.nf:

tuple val(meta), path("*.txt"), emit: depth

If the emit: name is depth, the metadata key should be depth.
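Because the rule is mechanical, a rough check is easy to script before reaching for the linter. The sketch below is text matching, not YAML parsing: it assumes one `emit:` per output line in main.nf and the two-space indentation used in this post's meta.yml, and the function names are my own. `nf-core modules lint` remains the real check.

```bash
#!/usr/bin/env bash
# Rough text-based check (not a YAML parser): do the emit: names in main.nf
# match the keys directly under output: in meta.yml?

emit_names() {
    # every "emit: <name>" in main.nf, sorted and de-duplicated
    grep -o 'emit: *[A-Za-z0-9_]*' "$1" | sed 's/emit: *//' | sort -u
}

meta_output_keys() {
    # keys indented by exactly two spaces inside the top-level output: block;
    # any other top-level key (e.g. topics:) ends the block
    awk '
        /^[^ #]/ { in_output = ($0 ~ /^output:/); next }
        in_output && /^  [A-Za-z0-9_]+:/ { sub(/^  /, ""); sub(/:.*/, ""); print }
    ' "$1" | sort -u
}

compare_emits() {
    # diff prints any mismatch and returns non-zero if the lists differ
    diff <(emit_names "$1") <(meta_output_keys "$2")
}

# Usage, with hypothetical paths:
# compare_emits modules/local/samtools_depth/main.nf modules/local/samtools_depth/meta.yml
```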

Version outputs in metadata

Every version output gets its own output: entry:

  versions_samtools:
    - - "${task.process}":
          type: string
          description: The name of the process
      - samtools:
          type: string
          description: The name of the tool
      - "(samtools --version 2>/dev/null || echo unknown) | head -n 1 | cut -d \" \" -f 2":
          type: eval
          description: The expression to obtain the version of the tool

The version-capture command needs to match main.nf exactly. If you change it there, update meta.yml at the same time.

And the same tuple appears under topics.versions:

topics:
  versions:
    - - "${task.process}":
          type: string
          description: The name of the process
      - samtools:
          type: string
          description: The name of the tool
      - "(samtools --version 2>/dev/null || echo unknown) | head -n 1 | cut -d \" \" -f 2":
          type: eval
          description: The expression to obtain the version of the tool

This duplication feels a little fussy, but it is useful. output: documents the named process output. topics: documents what the process contributes to the shared topic channel.
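The same kind of rough check can confirm that each `eval('...')` expression from main.nf appears, quoted, under both `output:` and `topics:` in meta.yml. It assumes the quoting style used in this post (single quotes around the expression in main.nf, double quotes with `\"` escapes in meta.yml); `check_eval_in_meta` is a hypothetical helper, not part of nf-core/tools.

```bash
#!/usr/bin/env bash
# Rough check: does each eval('...') expression in main.nf appear, in YAML
# double-quoted form, at least twice in meta.yml (output: and topics:)?
# Not a YAML parser; assumes the quoting style used in this post.

check_eval_in_meta() {
    local main_nf="$1" meta_yml="$2" expr quoted count status=0
    while IFS= read -r expr; do
        # escape embedded double quotes as \" and add the quotes and trailing colon
        quoted="\"$(printf '%s' "$expr" | sed 's/"/\\"/g')\":"
        # grep -F matches literally, so |, (, ) and \ need no escaping
        count=$(grep -cF -- "$quoted" "$meta_yml" || true)
        if [ "$count" -lt 2 ]; then
            echo "found $count of 2 expected copies in $meta_yml: $expr"
            status=1
        fi
    done < <(sed -n "s/.*eval('\(.*\)').*/\1/p" "$main_nf")
    return "$status"
}

# Usage, with hypothetical paths:
# check_eval_in_meta modules/local/samtools_depth/main.nf modules/local/samtools_depth/meta.yml
```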

For a multi-tool process, repeat this once per tool. In coverage_plot, for example, we had:

output:
  versions_python:
  versions_pandas:
  versions_matplotlib:
  versions_seaborn:

topics:
  versions:
    # python tuple
    # pandas tuple
    # matplotlib tuple
    # seaborn tuple

The snapshot should then show all four keys and all four tool names.

Authors and maintainers

End with both:

authors:
  - "@your-github-handle"
maintainers:
  - "@your-github-handle"

For old local modules, it is common to find authors but not maintainers. Add both.

A full `meta.yml` example

Here is the compact version for the samtools_depth example:

name: "samtools_depth"
description: Calculate per-base depth coverage from BAM files
keywords:
  - depth
  - coverage
  - bam
  - samtools
tools:
  - samtools:
      description: |
        SAMtools is a suite of programs for interacting with high-throughput sequencing data.
      homepage: http://www.htslib.org/
      documentation: https://www.htslib.org/doc/samtools-depth.html
      doi: 10.1093/bioinformatics/btp352
      licence: ["MIT"]

input:
  - - meta:
        type: map
        description: |
          Groovy Map containing sample information
          e.g. `[ id:'sample1', single_end:false ]`
    - bam:
        type: file
        description: BAM file for depth calculation
        pattern: "*.bam"
        ontologies:
          - edam: "http://edamontology.org/format_2572" # BAM

output:
  depth:
    - - meta:
          type: map
          description: |
            Groovy Map containing sample information
            e.g. `[ id:'sample1', single_end:false ]`
      - "*.txt":
          type: file
          description: Per-base depth coverage file
          pattern: "*.txt"
          ontologies:
            - edam: "http://edamontology.org/format_3475" # TSV
  versions_samtools:
    - - "${task.process}":
          type: string
          description: The name of the process
      - samtools:
          type: string
          description: The name of the tool
      - "(samtools --version 2>/dev/null || echo unknown) | head -n 1 | cut -d \" \" -f 2":
          type: eval
          description: The expression to obtain the version of the tool

topics:
  versions:
    - - "${task.process}":
          type: string
          description: The name of the process
      - samtools:
          type: string
          description: The name of the tool
      - "(samtools --version 2>/dev/null || echo unknown) | head -n 1 | cut -d \" \" -f 2":
          type: eval
          description: The expression to obtain the version of the tool

authors:
  - "@your-github-handle"
maintainers:
  - "@your-github-handle"

The real file can have more detail, but this is the skeleton I now look for.

Writing the nf-test

A module test should answer three questions:

Does the process run?
Are the expected outputs present?
Are the version outputs visible in the snapshot?

For this module:

nextflow_process {

    name "Test Process SAMTOOLS_DEPTH"
    script "../main.nf"
    process "SAMTOOLS_DEPTH"

    tag "modules"
    tag "modules_local"
    tag "samtools"
    tag "samtools/depth"

    test("Should calculate per-base depth from BAM file") {

        when {
            process {
                """
                input[0] = [
                    [ id: 'test', single_end: false ],
                    file("${projectDir}/modules/local/samtools_depth/tests/data/test.bam", checkIfExists: true)
                ]
                """
            }
        }

        then {
            assertAll(
                { assert process.success },
                { assert snapshot(process.out).match() },
                { assert process.out.versions_samtools }
            )
        }
    }

    test("Should work in stub mode") {

        options "-stub"

        when {
            process {
                """
                input[0] = [
                    [ id: 'test_stub', single_end: false ],
                    file("${projectDir}/modules/local/samtools_depth/tests/data/test.bam", checkIfExists: true)
                ]
                """
            }
        }

        then {
            assertAll(
                { assert process.success },
                { assert process.out.depth },
                { assert process.out.versions_samtools }
            )
        }
    }
}

I like snapshotting process.out in the real test because it catches channel names, file fingerprints, and version tuples in one place. For stub tests, I often keep the assertions lighter if the real test already snapshots the full output.

When to sanitise snapshots

Snapshots are very good at catching accidental changes, but they can become noisy with unstable outputs:

binary files that differ between runs;
logs with timestamps or absolute paths;
directories containing generated files with variable names;
plots whose metadata changes even when the visual output is effectively the same.

In those cases, snapshot only the stable part:

def report = process.out.report.get(0).get(1)

assert snapshot(
    file(report).getName().toString(),
    process.out.versions_python
).match()

Or collect stable files and unstable filenames separately for directory outputs:

def stableFiles = []
def unstableNames = []

file(process.out.results.get(0).get(1)).eachFileRecurse { f ->
    if (!f.isDirectory() && f.getName().endsWith(".tsv")) {
        stableFiles.add(f)
    }
    if (!f.isDirectory() && f.getName().endsWith(".log")) {
        unstableNames.add(f.getName().toString())
    }
}

assert snapshot(
    stableFiles,
    unstableNames,
    process.out.versions_tool
).match()

If an output is unstable, sanitise only that part of the snapshot. Leave the version tuples intact.

Module-local test config

Sometimes a module needs small test-only configuration. For example, a stub-only test may not need to solve a difficult Conda environment, or a test may need ext.args.

Use a tiny tests/nextflow.config:

process {
    withName: 'SAMTOOLS_DEPTH' {
        ext.args = { params.module_args ?: '' }
        ext.prefix = { "${meta.id}" }
    }
}

Then include it only in the test that needs it:

test("Should calculate per-base depth from BAM file") {

    config "modules/local/samtools_depth/tests/nextflow.config"

    when {
        params {
            module_args = "-q 10"
        }
        process {
            """
            input[0] = [
                [ id: 'test', single_end: false ],
                file("${projectDir}/modules/local/samtools_depth/tests/data/test.bam", checkIfExists: true)
            ]
            """
        }
    }

    then {
        assert process.success
    }
}

Keep this file small. It is for module test configuration, not a second pipeline config.

Run the tests

I normally run the test, update the snapshot when a change is intentional, and then run the test once more without the flag:

# Run the current test
nf-test test modules/local/samtools_depth/tests/main.nf.test

# Accept an intentional snapshot change
nf-test test modules/local/samtools_depth/tests/main.nf.test --update-snapshot

# Confirm that the saved snapshot is stable
nf-test test modules/local/samtools_depth/tests/main.nf.test

I also open tests/main.nf.test.snap and read it. A passing test does not tell me whether the version tuple is the one I expected.

For samtools_depth, the important part looks like this:

{
  "versions_samtools": [
    [
      "SAMTOOLS_DEPTH",
      "samtools",
      "1.17"
    ]
  ]
}

That tells you the emitted channel is named correctly and the tool version tuple is actually reaching the test output.

For a module with four tools, I want to see four keys:

versions_python
versions_pandas
versions_matplotlib
versions_seaborn

and four tool names in the tuples:

python
pandas
matplotlib
seaborn

If the test passes but the snapshot does not contain those, the module still needs fixing before it is really done.
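Reading the snapshot is still the point, but a quick grep can catch a missing key before I open the file. `check_snapshot_versions` is a hypothetical helper that only checks that each `versions_<tool>` key and quoted tool name appears somewhere in the snapshot file; it says nothing about whether the version values are the ones you expected.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm a .nf.test.snap file mentions a versions_<tool>
# key and the quoted tool name for every tool passed in. Plain grep only;
# it does not check the version values themselves.
check_snapshot_versions() {
    local snap="$1"; shift
    local tool status=0
    for tool in "$@"; do
        if ! grep -qF "\"versions_${tool}\"" "$snap"; then
            echo "missing key versions_${tool}"
            status=1
        fi
        # "\"${tool}\"" cannot match inside "versions_${tool}" because of the underscore
        if ! grep -qF "\"${tool}\"" "$snap"; then
            echo "missing tool name ${tool}"
            status=1
        fi
    done
    return "$status"
}

# Usage, with the coverage_plot tools from this post (path is hypothetical):
# check_snapshot_versions modules/local/coverage_plot/tests/main.nf.test.snap \
#     python pandas matplotlib seaborn
```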

Linting

The ideal final check is nf-core module linting:

nf-core modules lint samtools/depth --dir . --local

or for a local module name:

nf-core modules lint samtools_depth --dir . --local

In a fully recognised nf-core pipeline or modules repository, this checks that meta.yml exists, that it validates against the module schema, and that it agrees with the inputs, outputs, and topics declared in main.nf.

The final checklist I use

For one module, I now check this in order:

main.nf
  [ ] process name matches the module name
  [ ] input tuples are explicit
  [ ] every output has a useful emit name
  [ ] every version output has emit: versions_<tool>
  [ ] every version output also has topic: versions
  [ ] task.ext.when is respected
  [ ] script block uses task.ext.prefix, and task.ext.args when useful
  [ ] stub block creates the expected output paths

meta.yml
  [ ] input channels are grouped by channel shape
  [ ] output map is keyed by emit names
  [ ] versions_<tool> outputs are documented
  [ ] topics.versions documents the same version tuples
  [ ] EDAM ontology terms are included where obvious and verified
  [ ] authors and maintainers are present

tests
  [ ] real test succeeds, unless a real blocker is documented
  [ ] stub test succeeds
  [ ] snapshot contains versions_<tool> keys
  [ ] snapshot contains the expected tool names and version values
  [ ] unstable outputs are sanitised only as much as needed
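A couple of the main.nf items are mechanical enough to script. This sketch, with a hypothetical `check_module_basics` helper, checks that main.nf, meta.yml, and tests/main.nf.test exist and that main.nf defines a process named after the module, using the convention in this post (samtools_depth or samtools/depth becomes SAMTOOLS_DEPTH). Everything else on the list still needs a human read.

```bash
#!/usr/bin/env bash
# Hypothetical helper covering two mechanical checklist items:
# the expected files exist, and the process name matches the module name.
check_module_basics() {
    local module_dir="$1" module_name="$2" expected f status=0
    for f in main.nf meta.yml tests/main.nf.test; do
        if [ ! -f "$module_dir/$f" ]; then
            echo "missing $module_dir/$f"
            status=1
        fi
    done
    # samtools/depth and samtools_depth both become SAMTOOLS_DEPTH
    expected=$(printf '%s' "$module_name" | tr '/' '_' | tr '[:lower:]' '[:upper:]')
    if ! grep -q "^process ${expected} *{" "$module_dir/main.nf" 2>/dev/null; then
        echo "main.nf does not define process $expected"
        status=1
    fi
    return "$status"
}

# Usage, with a hypothetical path:
# check_module_basics modules/local/samtools_depth samtools_depth
```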

References

These are the docs I keep open while doing this work: